Computer Science and Engineering Knowledge Center: November 2018

Tuesday, November 20, 2018

Artificial Neural Networks - Introduction

Author: Marek Libra

Posted under creative commons from Knol

An artificial neural network (hereafter NN) is one of the methods and techniques of artificial intelligence. NNs have been applied successfully in a wide range of problem domains, such as finance, engineering, medicine, geology, physics, and control.

Neural networks are especially useful for prediction, classification, and control problems. They are also a good alternative to classical statistical approaches such as regression analysis.

Artificial neural network techniques were developed on the model of biological neural networks, which underlie the functioning of the nervous systems of living organisms. This inspiration is well known and is mentioned in most neural network publications.

An NN is built from a large number of simple processing units called artificial neurons (hereafter simply neurons).

The interface of an artificial neuron consists of n numeric inputs and one numeric output. Some neuron models add one more special input, called the bias. Each input is weighted by its own numeric weight. A neuron can perform two operations: compute and adapt.

The compute operation transforms the inputs into an output. It takes the numeric inputs, computes their weighted sum, and then applies a so-called activation function (a mathematical transformation) to this sum. The result of the activation function becomes the value of the output.
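A minimal Python sketch of the compute operation follows. It assumes real-valued inputs and weights and an optional bias added to the weighted sum; the function names are hypothetical. Three activation functions mentioned in this post are shown: sigmoid, hyperbolic tangent (Python's `math.tanh`), and a discrete threshold.

```python
import math
from typing import Callable, Sequence

def sigmoid(s: float) -> float:
    # Smooth activation that squashes any real sum into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

def threshold(s: float) -> float:
    # Discrete step activation: 1 when the sum is non-negative, else 0.
    return 1.0 if s >= 0.0 else 0.0

def compute(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0,
            activation: Callable[[float], float] = sigmoid) -> float:
    """Compute operation: weighted sum of the inputs (plus bias), then the activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(s)
```

For example, `compute([1.0, 0.5], [0.4, -0.2], bias=0.1, activation=threshold)` forms the sum 0.4 - 0.1 + 0.1 = 0.4 and returns 1.0; passing `activation=math.tanh` selects the hyperbolic tangent instead.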

The adapt operation, given pairs of inputs and expected outputs specified by the user, tunes the weights of an NN so that the computed output better approximates the expected output for the given input.
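The post does not commit to one adaptation algorithm. As one illustration, here is a sketch of the classic perceptron rule (one of the options listed below) for a single threshold neuron with a bias. The function names, the learning rate, and the epoch limit are hypothetical choices for this example.

```python
from typing import List, Sequence, Tuple

def step(s: float) -> int:
    # Threshold activation used by the classic perceptron.
    return 1 if s >= 0.0 else 0

def adapt(samples: Sequence[Tuple[Sequence[float], int]], n_inputs: int,
          rate: float = 1.0, epochs: int = 20) -> Tuple[List[float], float]:
    """Perceptron rule: after each sample, move the weights toward the expected output."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, expected in samples:
            output = step(sum(x * w for x, w in zip(inputs, weights)) + bias)
            error = expected - output  # -1, 0 or +1
            if error != 0:
                errors += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:  # every pair is already reproduced, so stop early
            break
    return weights, bias
```

Trained on the logical AND pairs `[([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]`, this converges to weights `[2.0, 1.0]` and bias `-3.0`, which reproduce all four expected outputs. With 0/1 inputs and `rate=1.0` the arithmetic stays in whole numbers, which keeps the trace easy to follow by hand.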

The neurons in an NN are ordered and numbered with natural numbers (from ℕ) according to that order.

Many NN models are known. They differ from one another in

    • the domain of the numeric inputs, outputs, and weights (real numbers, integers, or a finite set such as {0,1}),

    • the presence of bias (yes or no),

    • the definition of the activation function (sigmoid, hyperbolic tangent, discrete threshold,
      etc.),

    • the topology of interconnected neurons (feed-forward or recurrent),

    • the ability to change the number of neurons or the network topology during the lifetime of
      the network,

    • the order in which the computation flows through the neurons of the network,

    • the simulation time (discrete or continuous) or

    • the adaptation algorithm (none, backpropagation, perceptron rule, genetic algorithms, simulated annealing, etc.).
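To make the feed-forward topology concrete, here is a minimal sketch in which neurons are grouped into layers and every layer's outputs become the next layer's inputs, with no cycles. The layer representation and the hand-picked weights are hypothetical; they wire a hidden layer computing OR and AND into an output neuron that computes XOR, using the threshold activation.

```python
from typing import List, Sequence, Tuple

Layer = List[Tuple[List[float], float]]  # one (weights, bias) pair per neuron

def step(s: float) -> float:
    # Discrete threshold activation.
    return 1.0 if s >= 0.0 else 0.0

def feed_forward(layers: Sequence[Layer], inputs: Sequence[float]) -> List[float]:
    """Propagate values layer by layer; each layer's outputs feed the next layer."""
    values = list(inputs)
    for layer in layers:
        values = [step(sum(v * w for v, w in zip(values, weights)) + bias)
                  for weights, bias in layer]
    return values

# Hand-picked weights: the hidden layer computes OR and AND, the output "OR and not AND".
XOR_NET: List[Layer] = [
    [([1.0, 1.0], -0.5), ([1.0, 1.0], -1.5)],  # hidden layer: OR, AND
    [([1.0, -1.0], -0.5)],                     # output layer: XOR
]
```

For instance, `feed_forward(XOR_NET, [1.0, 0.0])` returns `[1.0]`, while `[1.0, 1.0]` yields `[0.0]`. A recurrent topology would instead feed some outputs back as inputs on the next time step.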

A good taxonomy of NN models can be found, e.g., in [1].

More detailed general descriptions, which are formal yet readable, can be found in [2].

References

[1] Šíma and P. Orponen. General purpose computation with neural

[2] David M Skapura. Building Neural Networks. Addison-Wesley, 1995

Source Knol: /knol.google.com/ marek-libra/artificial-neural-networks/5rqq7q8930m0/12#
Knol Nrao - 5193

Artificial Intelligence: The Basics

Kevin Warwick, Professor of Cybernetics
Routledge, 01-Mar-2013 - COMPUTERS
https://books.google.co.in/books?id=b16pAgAAQBAJ

2012

Artificial Intelligence: A Beginner's Guide

Blay Whitby
Oneworld Publications, 01-Dec-2012 - Computers - 192 pages
https://books.google.co.in/books?id=TKOfhnUhgS4C

2010

Artificial Intelligence: Foundations of Computational Agents

David L. Poole, Alan K. Mackworth
Cambridge University Press, 19-Apr-2010
https://books.google.co.in/books?id=B7khAwAAQBAJ

2008

Fundamentals of the New Artificial Intelligence: Neural, Evolutionary, Fuzzy and More

Toshinori Munakata
Springer Science & Business Media, Jan 1, 2008 - 272 pages

This significantly updated 2nd edition thoroughly covers the most essential and widely employed material pertaining to neural networks, genetic algorithms, fuzzy systems, rough sets, and chaos. The exposition reveals the core principles, concepts, and technologies in a concise, accessible, easy-to-understand manner, and as a result, prerequisites are minimal. Topics and features: retains the well-received features of the first edition, yet clarifies and expands on the topic; features completely new material on simulated annealing, Boltzmann machines, and extended fuzzy if-then rules tables.

https://books.google.co.in/books?id=lei-Zt8UGSQC

Updated 21 November 2018, 26 June 2016, 27 June 2015

Internet of Things (IoT) and Industrial Internet of Things (IIoT) - Research Papers, Books and Articles - Bibliography

Top 5 Data Science Trends for 2020

https://www.datasciencecentral.com/profiles/blogs/top-5-data-science-trends-for-2020

Trend 2. Rapid growth in the IoT

According to a report by IDC, investment in IoT technology was expected to reach $1 trillion by the end of 2020, reflecting exceptional growth in connected devices, many of which are smart devices. We already use many apps and devices that function on the basis of IoT; Google Assistant and Microsoft Cortana, for example, let us automate routine tasks through IoT. Businesses are investing in this technology, especially in smartphone development that uses IoT.

A simple definition of the Internet of Things by Vermesan (2013):

Internet of things is a network of physical objects
(Devayani Kulkarni's MS Thesis Internet of Things in Finnish Metal Industry, March 2018)

100+ Books on Internet of Things - IoT Books

IBM IoT - Products and Systems

Updated 14 March 2020, 20 November 2018

Monday, November 19, 2018

Big Data - Introduction

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target: as of 2012 they ranged from a few dozen terabytes to many petabytes of data in a single data set. To address this difficulty, a new platform of "big data" tools, such as the Apache Hadoop big data platform, has arisen to handle sensemaking over large quantities of data.

In 2012, Gartner updated its definition as follows: "Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

A 2016 definition states that "Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value".

A 2018 definition states "Big data is where parallel computing tools are needed to handle data", and notes, "This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities made by Codd’s relational model."

(Source: http://en.wikipedia.org/wiki/Big_data )

Big Data Repositories

Big data repositories have existed in many forms for years, often built by corporations with a special need for their own use. Commercial vendors have offered parallel database management systems for big data since the 1990s.

In 1984, Teradata Corporation marketed the parallel-processing DBC 1012 system. Teradata systems were the first to store and analyze 1 terabyte of data, in 1992. Hard disk drives held 2.5 GB in 1991, so the definition of big data continuously evolves according to Kryder's law. Teradata installed the first petabyte-class RDBMS-based system in 2007. As of 2017, a few dozen petabyte-class Teradata relational databases were installed, the largest of which exceeds 50 PB. Systems up until 2008 held 100% structured relational data. Since then, Teradata has added unstructured data types including XML, JSON, and Avro.

In 2000, Seisint Inc. (now LexisNexis Group) developed a C++-based distributed file-sharing framework for data storage and query. The system stores and distributes structured, semi-structured, and unstructured data across multiple servers. Users can build queries in a C++ dialect called ECL. In 2004, LexisNexis acquired Seisint Inc., and in 2008 it acquired ChoicePoint, Inc. and their high-speed parallel processing platform. The two platforms were merged into HPCC (High-Performance Computing Cluster) Systems, and in 2011 HPCC was open-sourced under the Apache v2.0 License. The Quantcast File System became available at about the same time.

CERN and other physics experiments have collected big data sets and analyzed them with high-performance computing (supercomputers). The current big data movement, however, relies on map-reduce architectures running on commodity hardware.

In 2004, Google published a paper on a process called MapReduce. The MapReduce concept provides a parallel processing model to process huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered as the output (the Reduce step). An implementation of the MapReduce framework was adopted by an Apache open-source project named Hadoop. Apache Spark was developed in 2012 in response to limitations in the MapReduce paradigm, as it adds the ability to set up many operations (not just map followed by reduce).
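The Map and Reduce steps described above can be sketched with the textbook word-count example. This is a single-machine simulation, not Hadoop's API: worker threads stand in for cluster nodes, each mapping one document, and an explicit shuffle groups the intermediate pairs by key before the reduce step. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

def map_phase(document: str) -> List[Tuple[str, int]]:
    # Map: emit an intermediate (key, value) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Shuffle: group the intermediate values from all mappers by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values collected for one key.
    return key, sum(values)

def word_count(documents: List[str]) -> Dict[str, int]:
    # Worker threads stand in for cluster nodes; each one maps a single document.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

For example, `word_count(["big data", "Big clusters process data"])` returns `{"big": 2, "data": 2, "clusters": 1, "process": 1}`. Because each map call depends only on its own document and each reduce call only on one key's values, both phases can be spread over many machines.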

MIKE2.0 is an open approach to information management that acknowledges the need for revisions due to big data implications identified in an article titled "Big Data Solution Offering". The methodology addresses handling big data in terms of useful permutations of data sources, complexity in interrelationships, and difficulty in deleting (or modifying) individual records.

https://en.wikipedia.org/wiki/Big_data

Big Data - Dimensions

Big data - Four dimensions: Volume, Velocity, Variety, and Veracity (IBM document)
Examples of big data in enterprises

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

Turn the 12 terabytes of Tweets created each day into improved product sentiment analysis
Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

Examples:
Scrutinize 5 million trade events created each day to identify potential fraud
Analyze 500 million daily call detail records in real time to predict customer churn faster

Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

Monitor hundreds of live video feeds from surveillance cameras to target points of interest
Exploit the 80% data growth in images, video and documents to improve customer satisfaction

Veracity: Establishing trust in big data presents a huge challenge as the variety and number of sources grows.

McKinsey Article on Big Data
http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation

28.2.2013

11 Feb 2016

Evolution of Big Data

http://www.ibmbigdatahub.com/infographic/evolution-big-data

https://hbr.org/2013/12/analytics-30

Analytics 1.0—the era of “business intelligence.”

Analytics 1.0 was about gaining an objective, deep understanding of important business phenomena and giving managers the fact-based comprehension to go beyond intuition when making decisions. For the first time, data about production processes, sales, customer interactions, and more were recorded, aggregated, and analyzed.

Updated 20 November 2018, 11 Feb 2016, 28 Feb 2013

Big Data - Analysis - Articles, Books and Research Papers - Bibliography

"Big data is where parallel computing tools are needed to handle data" - 2018 definition.

Big Data - Introduction

Big Data - Wikipedia Article

http://www.bigdata-madesimple.com/research-papers-that-changed-the-world-of-big-data/

This is a collection of influential research papers in the area of big data.

MapReduce: Simplified Data Processing on Large Clusters

This paper presents MapReduce, a programming model and its implementation for large-scale distributed clusters. The main idea is to have a general execution model for code that needs to process a large amount of data over hundreds of machines.

The Google File System

This paper presents the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware and delivers high aggregate performance to a large number of clients.

Bigtable: A Distributed Storage System for Structured Data

This paper presents the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable.

Dynamo: Amazon’s Highly Available Key-value Store

This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience.

The Chubby lock service for loosely-coupled distributed systems

Chubby is a distributed lock service; it does a lot of the hard parts of building distributed systems and provides its users with a familiar interface (writing files, taking a lock, file permissions). The paper describes it, focusing on the API rather than the implementation details.

Chukwa: A large-scale monitoring system

This paper describes the design and initial implementation of Chukwa, a data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of Hadoop, an open source distributed filesystem and MapReduce implementation, and inherits Hadoop’s scalability and robustness.

Cassandra - A Decentralized Structured Storage System

Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.

HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

There are two schools of thought regarding what technology to use for data analysis. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. This paper explores the feasibility of building a hybrid system.

S4: Distributed Stream Computing Platform.

This paper outlines the S4 architecture in detail and describes various applications, including real-life deployments, to show that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware.

Dremel: Interactive Analysis of Web-Scale Datasets

This paper describes the architecture and implementation of Dremel, a scalable, interactive ad-hoc query system for analysis of read-only nested data, and explains how it complements MapReduce-based computing.

Large-scale Incremental Processing Using Distributed Transactions and Notifications

This paper presents Percolator, a system for incrementally processing updates to a large data set, which Google deployed to create its web search index. This indexing system, based on incremental processing, replaced Google's batch-based indexing system.

Pregel: A System for Large-Scale Graph Processing

This paper presents a computational model suitable for solving many practical computing problems that concern large graphs.

Spanner: Google’s Globally-Distributed Database

This paper describes Spanner, Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions.

Shark: Fast Data Analysis Using Coarse-grained Distributed Memory

Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data.

The PageRank Citation Ranking: Bringing Order to the Web

This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.
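The idea behind PageRank can be sketched with power iteration over the commonly used normalized formula PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q) over all pages q linking to p), where N is the number of pages, L(q) the number of outgoing links of q, and d the damping factor. The value d = 0.85, the even redistribution of rank from pages without outgoing links, and the convergence tolerance are assumptions of this sketch rather than details taken from the summary above.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 200) -> Dict[str, float]:
    """Power iteration for PageRank; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank
```

On the graph `{"a": ["b"], "b": ["a"], "c": ["a"]}`, page `c` has no incoming links and keeps only the baseline (1 - d)/N = 0.05, while `a`, linked from both other pages, ends up with the highest rank. The ranks always sum to 1.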

A Few Useful Things to Know about Machine Learning

This paper summarizes twelve key lessons that machine learning researchers and practitioners have learned, which include pitfalls to avoid, important issues to focus on, and answers to common questions.

Random Forests

This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure, combined with randomized node optimization and bagging. In addition, it combines several ingredients that form the basis of the modern practice of random forests.

A Relational Model of Data for Large Shared Data Banks

Written by E. F. Codd in 1970, this paper was a breakthrough for relational database systems. Codd was the first to conceive of the relational model for database management.

Map-Reduce for Machine Learning on Multicore

The paper focuses on developing a general and exact technique for parallel programming of a large class of machine learning algorithms for multicore processors. The central idea is to allow a future programmer or user to speed up machine learning applications by "throwing more cores" at the problem rather than search for specialized optimizations.
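As I understand the paper, its central observation is that many learning algorithms can be written in a "summation form": each core computes sums over its own slice of the data, and a reduce step adds the partial sums. A minimal sketch of that idea for fitting a straight line by least squares follows. The chunking scheme, the worker count, and the function names are hypothetical, and threads stand in for cores here; `ProcessPoolExecutor` would give true CPU parallelism in CPython.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Stats = Tuple[float, float, float, float, float]  # n, sum x, sum y, sum x*x, sum x*y

def partial_sums(chunk: Sequence[Tuple[float, float]]) -> Stats:
    # "Map": each worker summarizes only its own slice of the data.
    n = float(len(chunk))
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy

def fit_line(data: List[Tuple[float, float]], workers: int = 4) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept from per-chunk partial sums."""
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_sums, chunks))
    # "Reduce": the statistics are plain sums, so adding the partial results gives
    # the same totals as a single pass (up to floating-point rounding).
    n, sx, sy, sxx, sxy = (sum(values) for values in zip(*parts))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

For points taken from y = 2x + 1 at x = 0, 1, ..., 9, `fit_line` recovers a slope of 2 and an intercept of 1, whatever the number of workers. Adding cores only changes how the sums are split, not the result.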

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

This paper describes Megastore, a storage system developed to blend the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way.

Finding a needle in Haystack: Facebook’s photo storage

This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data.

Spark: Cluster Computing with Working Sets

This paper focuses on applications that reuse a working set of data across multiple parallel operations and proposes a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce.

The Unified Logging Infrastructure for Data Analytics at Twitter

This paper presents Twitter’s production logging infrastructure and its evolution from application-specific logging to a unified “client events” log format, where messages are captured in common, well-formatted, flexible Thrift messages.

F1: A Distributed SQL Database That Scales

F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases.

MLbase: A Distributed Machine-learning System

This paper presents MLbase, a novel system harnessing the power of machine learning for both end-users and ML researchers.

Scalable Progressive Analytics on Big Data in the Cloud

This paper presents a new approach that gives more control to data scientists to carefully choose from a huge variety of sampling strategies in a domain-specific manner.

Big data: The next frontier for innovation, competition, and productivity

This paper is one of the most referenced documents in the world of big data. It describes current and potential applications of big data.

The Promise and Peril of Big Data

This paper summarizes the insights of the Eighteenth Annual Roundtable on Information Technology, which sought to understand the implications of the emergence of “Big Data” and new techniques of inferential analysis.

TDWI Checklist Report: Big Data Analytics

This paper provides six guidelines on implementing Big Data Analytics. It helps you take the first steps toward achieving a lasting competitive edge with analytics.

http://www.bigdata-madesimple.com/research-papers-that-changed-the-world-of-big-data/

Updated 20 November 2018, 3 February 2015

Tuesday, November 13, 2018

Embedded Systems

E-Book

http://users.ece.utexas.edu/~valvano/Volume1/E-Book/

The authors provide the material from their edX course, in a pre-edited form, on the above website for wider reading.

The contents are:


Chapter 1: Introduction
Chapter 2: Fundamental Concepts
Chapter 3: Electronics
Chapter 4: Digital Logic
Chapter 5: Introduction to C
Chapter 6: Microcontroller Ports
Chapter 7: Design and Development Process
Chapter 8: Switches and LEDs
Chapter 9: Arrays and Functional Debugging
Chapter 10: Finite State Machines
Chapter 11: UART - The Serial Interface
Chapter 12: Interrupts
Chapter 13: DAC and Sound
Chapter 14: ADC and Data Acquisition
Chapter 15: Systems Approach to Game Design
Chapter 16: The Internet of Things
Appendix: Reference Material
Video links: Web links to videos (All chapters 1 to 16)
Closed caption files: Closed caption srt files
Index: Index of terms and concepts


An embedded system combines mechanical, electrical, and chemical components along with a computer, hidden inside, to perform a single dedicated purpose.

The capabilities of the microcontrollers embedded in devices have increased over time.

The ARM® Cortex™-M family represents a new class of microcontrollers much more powerful than the devices available ten years ago.

A digital multimeter is a typical embedded system.

There are two ways to develop embedded systems. The first technique uses a microcontroller, such as the ARM Cortex M-series. In general, such a system has no operating system, so the developer writes the entire software system. These devices are suitable for low-cost, low-performance systems. Alternatively, one can develop a high-performance embedded system around a more powerful processor such as the ARM Cortex A-series. These systems typically employ an operating system and are first designed on a development platform; the software and hardware are then migrated to a stand-alone embedded platform.
