Friday, October 28, 2016

Network Analysis and Social Network Analysis - Bibliography and Videos








http://snap.stanford.edu/

http://snap.stanford.edu/class/cs224w-2015/projects.html

https://web.stanford.edu/class/cs224w/

https://web.stanford.edu/class/cs224w/intro_handout/intro_handout.pdf

http://historicalnetworkresearch.org/resources/first-steps/

Network Structure Inference, A Survey: Motivations, Methods, and
Applications
Ivan Brugere, University of Illinois at Chicago
Brian Gallagher, Lawrence Livermore National Laboratory
Tanya Y. Berger-Wolf, University of Illinois at Chicago
https://arxiv.org/pdf/1610.00782.pdf


Social Network Analysis: Methods and Applications

Stanley Wasserman, Katherine Faust
Cambridge University Press, 25-Nov-1994 - Social Science - 825 pages


Social network analysis, which focuses on relationships among social entities, is used widely in the social and behavioral sciences, as well as in economics, marketing, and industrial engineering. Social Network Analysis: Methods and Applications reviews and discusses methods for the analysis of social networks with a focus on applications of these methods to many substantive examples. As the first book to provide a comprehensive coverage of the methodology and applications of the field, this study is both a reference book and a textbook.
https://books.google.co.in/books?hl=en&lr=&id=CAm2DpIqRUIC

The structure and function of complex networks
M. E. J. Newman
Department of Physics, University of Michigan, Ann Arbor, MI 48109, U.S.A. and
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, U.S.A.



Analyzing Participation of Students in Online Courses Using Social
Network Analysis Techniques
Reihaneh Rabbany k., Mansoureh Takaffoli and Osmar R. Zaïane,
Department of Computing Science, University of Alberta, Canada
rabbanyk,takaffol,zaiane@ualberta.ca

Social Network Analysis and Mining to Support the
Assessment of On-line Student Participation


Big Data over Networks

Shuguang Cui, Alfred O. Hero, III, Zhi-Quan Luo
Cambridge University Press, 14-Jan-2016 - Computers - 457 pages


Utilising both key mathematical tools and state-of-the-art research results, this text explores the principles underpinning large-scale information processing over networks and examines the crucial interaction between big data and its associated communication, social and biological networks. Written by experts in the diverse fields of machine learning, optimisation, statistics, signal processing, networking, communications, sociology and biology, this book employs two complementary approaches: first analysing how the underlying network constrains the upper-layer of collaborative big data processing, and second, examining how big data processing may boost performance in various networks. Unifying the broad scope of the book is the rigorous mathematical treatment of the subjects, which is enriched by in-depth discussion of future directions and numerous open-ended problems that conclude each chapter. Readers will be able to master the fundamental principles for dealing with big data over large systems, making it essential reading for graduate students, scientific researchers and industry practitioners alike.


Networks, Crowds, and Markets: Reasoning about a Highly Connected World
Full Book
David Easley
Dept. of Economics
Cornell University

Jon Kleinberg
Dept. of Computer Science
Cornell University
Cambridge University Press, 2010
Draft version: June 10, 2010.

Fundamentals of Predictive Text Mining

Sholom M. Weiss, Nitin Indurkhya, Tong Zhang
Springer, 07-Sep-2015 - Computers - 239 pages


This successful textbook on predictive text mining offers a unified perspective on a rapidly evolving field, integrating topics spanning the varied disciplines of data science, machine learning, databases, and computational linguistics. Serving also as a practical guide, this unique book provides helpful advice illustrated by examples and case studies.

This highly anticipated second edition has been thoroughly revised and expanded with new material on deep learning, graph models, mining social media, errors and pitfalls in big data evaluation, Twitter sentiment analysis, and dependency parsing discussion. The fully updated content also features in-depth discussions on issues of document classification, information retrieval, clustering and organizing documents, information extraction, web-based data-sourcing, and prediction and evaluation.

Topics and features: presents a comprehensive, practical and easy-to-read introduction to text mining; includes chapter summaries, useful historical and bibliographic remarks, and classroom-tested exercises for each chapter; explores the application and utility of each method, as well as the optimum techniques for specific scenarios; provides several descriptive case studies that take readers from problem description to systems deployment in the real world; describes methods that rely on basic statistical techniques, thus allowing for relevance to all languages (not just English); contains links to free downloadable industrial-quality text-mining software and other supplementary instruction material.

Fundamentals of Predictive Text Mining is an essential resource for IT professionals and managers, as well as a key text for advanced undergraduate computer science students and beginning graduate students.

http://www.leonidzhukov.net/hse/2016/sna/


Advanced Database Marketing: Innovative Methodologies and Applications for Managing Customer Relationships

Koen W. De Bock
Routledge, Mar 23, 2016 - 348 pages
https://books.google.co.in/books?id=4hHPCwAAQBAJ



Sunday, October 2, 2016

IBM Bluemix - Cloud Computing Platform




How IBM's Bluemix Garages Woo Enterprises And Startups To The Big Blue Cloud
The locations let IBM teach both startups and big companies how to harness its cloud services.

IBM started its Bluemix Garages to get close to startup entrepreneurs. Bluemix Garages are IBM establishments typically embedded within incubator or coworking spaces popular with startups, where developers from startup firms can get assistance from IBM engineers in exploring the Bluemix cloud platform. The first Bluemix Garage opened in 2014 at the San Francisco branch of Galvanize, a company offering workspace and tech training at locations across the country.

https://www.fastcompany.com/3061738/mind-and-machine/how-ibms-bluemix-garages-woo-enterprises-and-startups-to-the-big-blue-cloud

Sunday, July 24, 2016

Senior Analytics Scientist - Risk Analytics - Job Specification

An analytics-driven e-commerce company is looking for:

Job Title: Senior Analytics Scientist - Risk Analytics


Role Outline
The Senior Analytics Scientist - Risk Analytics reports to the Sr. Manager / Director leading the team.

The key requirement for the role is the ability to understand the business, develop data driven solutions to address business problems and provide analytic support to the risk analytics group. The individual will possess the ability to work in teams and display a proactive learning attitude.

Job Description
Job Title : Senior Analytics Scientist - Risk Analytics
Department : Risk Analytics
Reports To : Sr. Manager / Director


Key responsibilities
* Key responsibilities include
o Building models to predict risk and other key metrics
o Coming up with data driven solutions to control risk
o Finding opportunities to acquire more customers by modifying/optimizing existing rules
o Doing periodic upgrades of the underwriting strategy based on business requirements
o Evaluating 3rd party solutions for predicting/controlling risk of the portfolio
o Running periodic controlled tests to optimize underwriting
o Monitoring key portfolio metrics and taking data driven actions based on performance
* Business Knowledge: Develop an understanding of the domain/function. Manage business process (es) in the work area. The individual is expected to develop domain expertise in his/her work area.
* Teamwork: Develop cross site relationships to enhance leverage of ideas. Set and manage partner expectations. Drive implementation of projects with Engineering team while partnering seamlessly with cross site team members.
* Communication: Responsibly perform end to end project communication across the various levels in the organization.

Candidate Specification:
Skills:
* Should have a solid understanding of probability and statistics: Bayesian methods, probability distributions, the central limit theorem, etc.
* Should be familiar with some of the following: GLM, logistic regression, random forests, gradient boosted trees, CART, Naïve Bayes, linear programming, mixed integer programming, etc.
* Knowledge of analytical tools such as R/Python/SAS/SQL
* Experience in handling complex data sources
* Dexterity with MySQL, MS Excel
* Strong Analytical aptitude and logical reasoning ability
* Strong presentation and communication skills.
* Strong process/project management skill

Preferred:
* 3 - 5 years of experience in Financial Services/Analytics Industry/ecommerce
* Understanding of the financial services business

Tuesday, July 19, 2016

Simplifying IT Systems to Accelerate Digital Transformation - BCG Perspective





Simplify IT’s Six Levers
The following are the six levers for simplifying IT:
1. Intelligent Demand Management.
2. Application and Data Simplification.
3. Infrastructure-Technology-Pattern Reduction.
4. Simplified IT Organization and an Enabled IT Workforce.
5. Effective Governance and Simplified Processes.
6. A Shared-Services Model and Optimized Sourcing.

https://www.bcgperspectives.com/content/articles/technology-digital-simplifying-it-accelerate-digital-transformation/



Related Article - Especially important in IoT world

How Hardware Makers Can Win in the Software World

Saturday, July 16, 2016

Hadoop Summit San Jose 2016



Silicon Angle has captured the views of experts who attended the summit in articles and videos.


Chuck Yarbrough, Pentaho - Hadoop Summit 2016 San Jose - #HS16SJ - #theCUBE
(Video: SiliconAngle upload)

Do you spend more time solving vendor problems than business problems? | #HS16SJ
http://siliconangle.com/blog/2016/06/30/do-you-spend-more-time-solving-vendor-problems-than-business-problems-hs16sj/

Thursday, June 23, 2016

Essential Skills for Data Science and Data Analytics Careers



1. Statistics/Mathematics

2. Data Mining/Machine Learning/Artificial Intelligence


Data Analytics and Data Mining - Difference Explained


Machine Learning - Blog

3. Data Frameworks like Hadoop


What is Hadoop? Text and Video Lectures

4. Special programming languages for statistics  - R and Python

5. Database and SQL skills for Structured data.

6. NoSQL skills for unstructured social media data.


NoSQL Databases

http://www.kdnuggets.com/2016/05/10-must-have-skills-data-scientist.html

http://www.kdnuggets.com/2014/11/9-must-have-skills-data-scientist.html

Updated 26 June 2016

Tuesday, June 21, 2016

NoSQL Databases


The term came into use in 2009.

10 things you should know about NoSQL databases
By Guy Harrison, August 26, 2010
http://www.techrepublic.com/blog/10-things/10-things-you-should-know-about-nosql-databases/


Introduction - Presentation
http://martinfowler.com/articles/nosql-intro-original.pdf


2013  Introduction to NoSQL - Video Presentation by Martin Fowler

(Video: GOTO Conferences)

http://martinfowler.com/nosql.html

https://www.mongodb.com/

http://www.nosqlweekly.com/  - Subscribe to their weekly. They provide a python weekly also.


Wednesday, June 15, 2016

Best Business Analytics and Intelligence Blogs - Books

Business intelligence is a broad set of information technology (IT) solutions
that includes tools for gathering, analyzing, and reporting information to the
users about performance of the organization and its environment. These IT
solutions are among the most highly prioritized solutions for investment.


Data Analytics Made Accessible
Anil K. Maheshwari, Ph.D.
2015


10 Great Business Intelligence (BI) Blogs To Follow

Business intelligence (BI) is a term that “includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information” (Gartner). As such, it covers a wide breadth of topics, from data security to big data to data analysis. 

1. Datanami
Run by: Tabor Communications Website link: Datanami.com 

2. Business Intelligence and Analytics
Run by: Jen Underwood Website link: JenUnderwood.com 

3. Big Data Made Simple
Run by: Crayon Data Website link: bigdata-madesimple.com/ 

4. MIT Technology Review
Run by: MIT Website link: Technologyreview.com 

5. Gartner
Run by: Gartner Website link: Gartner.com/blog 


6. Forrester’s Business Intelligence Blog
Run by: Forrester Website link: go.forrester.com/blogs 

7. Capterra
Run by: Capterra Website link: blog.capterra.com 

8. The Register
Run by: Situation Publishing Website link: theregister.co.uk 

9. InformationWeek
Run by: InformationWeek Website link: Informationweek.com/big-data-analytics.asp 

10. Tableau Blog
Run by: Tableau Website link: Tableau.com/blog 
Besides providing insight into our constantly updated product offerings and company culture, the Tableau blog hosts a wealth of examples, inspiration, tips & tricks, community highlights, and stories about the social impact of data. Be sure to catch the weekly “Best of the Tableau Web” column for Tableau-specific data visualizations, tips, and stories.


The Top 9 Business Intelligence Blogs You Should Be Reading
https://www.maptive.com/top-9-business-intelligence-blogs-reading/



Analytics at Work: Smarter Decisions, Better Results
De Thomas H. Davenport, Jeanne G. Harris, Robert Morison
https://books.google.com.gi/books?id=Jy-8bBjjC38C                        

9 great books about business intelligence (BI)
Business Intelligence (BI) is rapidly becoming more accessible and important for interpreting big data. With the right tools, anyone can use BI to learn more about their business and what will help them succeed. These books cover an array of expertise for every reader: from beginners who only know the term “Business Intelligence” and want to learn more, to business managers who want to make their BI analysis more effective. And if you want real-life examples of business intelligence in action, there's an article for that.

What is business intelligence?

BI refers to an infrastructure that contains an assortment of processes and tools, all with the goal of providing businesses with complete and actionable data to aid in their decision-making processes. Some of the things included in BI are data visualization, data infrastructure and tools, business analytics, data preparation, benchmarking, statistical analysis, and more.

Learn more about business intelligence. 

1. “Business Intelligence Guidebook: From Data Integration to Analytics”
Author: Rick Sherman 





2. “Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results”
Author: Bernard Marr 


This book dives into big data and analytics to show Business Intelligence in action through real-world examples.

3. “Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications” 
Authors: Larissa T. Moss and Shaku Atr 



The book walks readers through every step of getting a BI application in place.

4. “Business Intelligence For Dummies” 
Author: Swain Scheps 



“Business Intelligence For Dummies” is an introduction to BI for anyone with no prior experience. 

5. “Hyper: Changing the way you think about, plan, and execute business intelligence for real results, real fast!” 
Author: Gregory P. Steffine 

He presents major challenges in the BI process and how to overcome them, as well as proven methods to assist planning. The book is also filled with practical advice and tips for analytical projects.

6. “Learning Tableau 10 - Second Edition: Business Intelligence and data visualization that brings your business into focus” 
Author: Joshua N. Milligan 

A lot of Business Intelligence is finding the answers to questions and making better decisions, and part of that is achieved through data visualization. Displaying data through storytelling helps bring the pertinent information to the forefront to show the answers to business questions. In this book, author Joshua Milligan talks about how to create stunning visualizations and present data in Tableau, so that you can spend more time acting on the data than putting it together. The book teaches readers how to present data effectively, as well as how to dig deeper into data and analyze trends using different models.

7. “Business Intelligence: The Savvy Manager's Guide”
Author: David Loshin 



 Loshin guides managers through the process of developing BI and how it aids success. Old and new technologies are introduced, both as a way to present BI’s background and also how the technology is evolving to accommodate digital data needs. 

8. “Business Intelligence in Plain Language: A practical guide to Data Mining and Business Analytics”
Author: Jeremy Kolb 



This book is a good primer for getting introduced to the concepts of Business Intelligence and why it’s useful for every business. 

9. “Successful Business Intelligence: Unlock the Value of BI & Big Data”
Author: Cindi Howson 

Author Cindi Howson, an experienced Business Intelligence analyst, details how to successfully integrate BI in business in this book. Rather than simply discussing the theories behind BI, she dives into the best strategies used by successful organizations. 






Ud. 16.11.2023
Pub. 16.6.2016




Friday, June 3, 2016

Data Science Competitions



https://www.kaggle.com/competitions



Active Competitions on 3 June 2016

Friday, May 13, 2016

R Language - Matrix and Array Related Commands



An array has dimensions. A vector is an array with only one dimension, and an array with two dimensions is a matrix. Anything with more than two dimensions is simply called an array.

Technically, a vector has no dimensions at all in R. If you use the functions dim(), nrow(), or ncol() with a vector as the argument, R returns NULL as the result.
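For example (a minimal sketch; the object names are arbitrary):

v <- c(4, 7, 1)
dim(v)                      # NULL: a vector has no dim attribute
nrow(v)                     # NULL
m <- matrix(1:6, nrow = 2)  # a 2 x 3 matrix
dim(m)                      # 2 3
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                      # 2 3 4: more than two dimensions, so an array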



Creating a matrix

Use the matrix() function.

The matrix() function has arguments to specify the matrix:
data is a vector of the values you want in the matrix.
ncol takes a single number that tells R how many columns you want.
nrow takes a single number that tells R how many rows you want.
byrow takes a logical value that tells R whether to fill the matrix row-wise (TRUE) or column-wise (FALSE). Column-wise is the default.

You don’t have to specify both ncol and nrow. If you specify one, R will know automatically what the other needs to be.

You can look at the structure of an object using the str() function.

If you want the number of rows and columns without looking at the structure, you can use the
dim() function.

You can find the total number of values in a matrix exactly the same way as
you do with a vector, using the length() function:

To see all the attributes of an object, you can use the attributes() function

You can combine two vectors as the rows of a matrix with the rbind() function.

The cbind() function does something similar: it binds the vectors as the columns of a matrix.

To name rows and columns, you have the functions rownames() and colnames(). Both work much like the names() function you use when naming vector values.
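A short sketch pulling these functions together (names like first.matrix are just examples):

first.matrix <- matrix(data = 1:12, ncol = 4)                 # filled column-wise (the default)
second.matrix <- matrix(data = 1:12, ncol = 4, byrow = TRUE)  # filled row-wise
str(first.matrix)          # int [1:3, 1:4] 1 2 3 4 5 6 ...
dim(first.matrix)          # 3 4: R derived nrow = 3 from ncol = 4
length(first.matrix)       # 12 values in total
attributes(first.matrix)   # $dim: 3 4

x <- 1:3
y <- 4:6
rbind(x, y)                # x and y become the two rows of a 2 x 3 matrix
m <- cbind(x, y)           # x and y become the two columns of a 3 x 2 matrix
rownames(m) <- c("r1", "r2", "r3")
colnames(m)                # "x" "y": taken from the vector names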



Calculating with Matrices


You add a scalar to a matrix simply by using the addition operator, +.

With the addition operator you can also add two matrices together. If the dimensions of the two matrices are not the same, R will complain and refuse to carry out the operation.

By default, R fills matrices column-wise. Whenever R reads a matrix, it also reads it column-wise.

Transposing a matrix
The t() function (which stands for transpose) does the work.

To invert a matrix, you use the solve() function.

The multiplication operator (*) works element-wise on matrices. To calculate the inner (matrix) product of two matrices, use the special operator %*%.
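For example (A and B are arbitrary 2 x 2 matrices):

A <- matrix(1:4, ncol = 2)
B <- matrix(c(2, 0, 0, 2), ncol = 2)
A + 10          # adds the scalar 10 to every element
A + B           # element-wise sum; dimensions must match
t(A)            # transpose of A
solve(B)        # inverse of B
A * B           # element-wise product
A %*% B         # matrix (inner) product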



Reference

R for Dummies by Vries and Meys
chap 7

R Language - Date and Time Related Commands

Working with dates in R

R has a range of functions that allow you to work with dates and times. The
easiest way of creating a date is to use the as.Date( ) function.

The default format for dates in as.Date( ) is YYYY-MM-DD
— four digits for year, and two digits for month and day, separated by a
hyphen.

To find out what day of the week this is, use weekdays():

You can add or subtract numbers from dates to create new dates.

Use the seq() function to create sequences of dates in a far more flexible way. As with numeric vectors, you have to specify at least three of the arguments (from, to, by, and length.out).

In addition to weekdays( ), you also can get R to report on months() and
quarters( ):
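For example (the date value is arbitrary):

d <- as.Date("2016-05-13")
weekdays(d)                                   # "Friday"
d + 7                                         # "2016-05-20"
seq(from = d, by = "month", length.out = 3)   # three dates, one month apart
months(d)                                     # "May"
quarters(d)                                   # "Q2"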

Functions with Dates

as.Date()     Converts a character string to a Date
weekdays()    Full weekday name in the current locale (for example, Sunday, Monday, Tuesday)
months()      Full month name in the current locale (for example, January, February, March)
quarters()    Quarter numbers (Q1, Q2, Q3, or Q4)
seq()         Generates date sequences if you pass it a Date object as its first argument


Objects that represent time series data.

ts: In R, you use the ts() function to create time series objects. These
are vector or matrix objects that contain information about the observations,
together with information about the start, frequency, and end of each
observation period. With ts class data you can use powerful R functions to do
modeling and forecasting — for example, arima() is a general model for time
series data.
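A minimal sketch, assuming some made-up quarterly sales figures:

# eight quarterly observations starting in the first quarter of 2014 (values are invented)
sales <- ts(c(12, 15, 14, 18, 20, 23, 21, 26),
            start = c(2014, 1), frequency = 4)
start(sales)       # 2014 1
end(sales)         # 2015 4
frequency(sales)   # 4
fit <- arima(sales, order = c(1, 0, 0))   # fit a simple AR(1) model
predict(fit, n.ahead = 4)                 # forecast the next four quarters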


Reference Book

R for Dummies, Vries and Meys
Chap 6

Thursday, May 12, 2016

R Language - Mathematical Commands



abs(x)              Takes the absolute value of x
log(x,base=y)       Takes the logarithm of x with base y; if base is not specified, returns the natural logarithm
exp(x)              Returns the exponential of x
sqrt(x)             Returns the square root of x
factorial(x)        Returns the factorial of x (x!)
choose(x,y)         Returns the number of possible combinations when drawing y elements at a time from x possibilities

round(x,digits=2)   Rounds x to 2 decimal places

signif(x,digits=4)  Rounds x to 4 significant digits

cos(120)
R always works with angles in radians, not in degrees. Pay attention to this fact.

So the correct way to take the cosine of 120 degrees is to write cos(120*pi/180).
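A few of these functions in action (the input values are arbitrary):

abs(-3.5)                     # 3.5
log(8, base = 2)              # 3
exp(1)                        # 2.718282
sqrt(16)                      # 4
factorial(5)                  # 120
choose(5, 2)                  # 10: ways to draw 2 elements from 5
round(123.456, digits = 2)    # 123.46
signif(123.456, digits = 4)   # 123.5
cos(120 * pi / 180)           # -0.5: cosine of 120 degrees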

The str() function gives you the type and structure of the object.

If you want to know only how long a vector is, you can simply use the length() function.

Creating vectors
seq(from = 4.5, to = 2.5, by = -0.5)

The c() function stands for concatenate. It doesn’t create vectors — it combines them.

To repeat the vector c(0, 0, 7) three times, use this code:  rep(c(0, 0, 7), times = 3)

You also can repeat every value by specifying the argument each in place of times.

rep(1:3,length.out=7)  says repeat the vector till length is 7. The last repetition may be incomplete.

The bracket [] represents a function that you can use to extract a value from a vector. You can get the fifth value of the vector numbers by giving the command:
numbers[5]
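A short sketch of these vector commands (the values are arbitrary):

seq(from = 4.5, to = 2.5, by = -0.5)   # 4.5 4.0 3.5 3.0 2.5
c(1:3, 7)                              # combines into 1 2 3 7
rep(c(0, 0, 7), times = 3)             # 0 0 7 0 0 7 0 0 7
rep(c(1, 4), each = 2)                 # 1 1 4 4
rep(1:3, length.out = 7)               # 1 2 3 1 2 3 1: last repetition incomplete
numbers <- seq(50, 10, by = -10)       # 50 40 30 20 10
numbers[5]                             # 10: the fifth value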





Special Symbols - Copy Paste



⅛  ⅜  ⅓ ⅔  ¼  ½  ¾

≥  

→  ←  ↑ ↓ 





∛ ∜  ∝ ∟  ∠ ∡ ∢ ∣ ∤ ∥ ∦  ∧  ∨  ∩  ∪    ∫    ∬    ∭    ∮ 






http://www.copypastecharacter.com/mathematical  - Other symbol sets are also available

R Language Commands - Input and Display



# is for comments
R ignores everything that appears after the hash (#).
The assignment operator is <- (a less-than sign followed by a minus sign).
#read files with labels in first row
read.table(filename,header=TRUE)           #read a tab or space delimited file
read.table(filename,header=TRUE,sep=',')   #read csv files

x <- c(...)                                #create a data vector with specified elements
y <- c(1:10)                               #create a data vector with elements 1 to 10
n <- 10
x1 <- rnorm(n)                             #create a vector of n random normal deviates
y1 <- runif(n)+n                           #create a vector of n random uniform deviates with n added to each
z <- rbinom(n,size,prob)                   #create n samples of size "size" with probability prob from the binomial
vect <- c(x,y)                             #combine vectors x and y into one vector of length 2n
mat <- cbind(x,y)                          #combine x and y into an n x 2 matrix

mat[4,2]                                   #display the 4th row and the 2nd column
mat[3,]                                    #display the 3rd row
mat[,2]                                    #display the 2nd column


subset(dataset,logical)                    #those objects meeting a logical criterion
subset(data.df,select=variables,logical)   #get those objects from a data frame that meet a criterion
data.df[data.df=logical]                   #yet another way to get a subset
x[order(x$B),]                             #sort a dataframe by the order of the elements in B
x[rev(order(x$B)),]                        #sort the dataframe in reverse order

browse.workspace                           #a Mac menu command that creates a window with information about all variables in the workspace


Details




Input and display
#read files with labels in first row
read.table(filename,header=TRUE)           #read a tab or space delimited file
read.table(filename,header=TRUE,sep=',')   #read csv files

x <- c(...)                                #create a data vector with specified elements
To construct a vector:
 > c(1,2,3,4,5)
[1] 1 2 3 4 5

y <- c(1:10)                               #create a data vector with elements 1 to 10

> x <- 1:5
> x
[1] 1 2 3 4 5

assign the values 1:5 to a vector named x:
> x <- 1:5
> x
[1] 1 2 3 4 5

n <- 10
x1 <- rnorm(n)                             #create a vector of n random normal deviates
y1 <- runif(n)+n                           #create a vector of n random uniform deviates with n added to each
z <- rbinom(n,size,prob)                   #create n samples of size "size" with probability prob from the binomial
vect <- c(x,y)                             #combine vectors x and y into one vector of length 2n

mat <- cbind(x,y)                          #combine x and y into an n x 2 matrix
mat[4,2]                                   #display the 4th row and the 2nd column
mat[3,]                                    #display the 3rd row
mat[,2]                                    #display the 2nd column


subset(dataset,logical)                    #those objects meeting a logical criterion
subset(data.df,select=variables,logical)   #get those objects from a data frame that meet a criterion
data.df[data.df=logical]                   #yet another way to get a subset

x[order(x$B),]                             #sort a dataframe by the order of the elements in B
x[rev(order(x$B)),]                        #sort the dataframe in reverse order

browse.workspace                           #a Mac menu command that creates a window with information about all variables in the workspace


Rules of Names of Variables, Vectors and Matrices in R

Names must start with a letter or a dot. If you start a name with a dot, the
second character can’t be a digit.

Names should contain only letters, numbers, underscore characters (_),
and dots (.). Although you can force R to accept other characters in names, you
shouldn’t, because these characters often have a special meaning in R.
You can’t use the following special keywords as names:
• break
• else
• FALSE
• for
• function
• if
• Inf
• NA
• NaN
• next
• repeat
• return
• TRUE
• while
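A few illustrations of these rules (the names are arbitrary examples):

x.1 <- 5           # valid: starts with a letter; dots are allowed
.x <- 5            # valid: starts with a dot not followed by a digit
my_var <- 5        # valid: underscores are allowed
# 1x <- 5          # invalid: a name cannot start with a digit
# .1x <- 5         # invalid: the dot cannot be followed by a digit
# TRUE <- 5        # invalid: reserved keyword
make.names("1x")   # "X1x": coerces a string into a syntactically valid name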


R Language Commands - A Brief List

# is for comments


Input and display
#read files with labels in first row
read.table(filename,header=TRUE)           #read a tab or space delimited file
read.table(filename,header=TRUE,sep=',')   #read csv files

x <- c(...)                                #create a data vector with specified elements
y <- c(1:10)                               #create a data vector with elements 1 to 10
n <- 10
x1 <- rnorm(n)                             #create a vector of n random normal deviates
y1 <- runif(n)+n                           #create a vector of n random uniform deviates with n added to each
z <- rbinom(n,size,prob)                   #create n samples of size "size" with probability prob from the binomial
vect <- c(x,y)                             #combine vectors x and y into one vector of length 2n
mat <- cbind(x,y)                          #combine x and y into an n x 2 matrix
mat[4,2]                                   #display the 4th row and the 2nd column
mat[3,]                                    #display the 3rd row
mat[,2]                                    #display the 2nd column
subset(dataset,logical)                    #those objects meeting a logical criterion
subset(data.df,select=variables,logical)   #get those objects from a data frame that meet a criterion
data.df[data.df=logical]                   #yet another way to get a subset
x[order(x$B),]                             #sort a dataframe by the order of the elements in B
x[rev(order(x$B)),]                        #sort the dataframe in reverse order

browse.workspace                           #a Mac menu command that creates a window with information about all variables in the workspace



Moving around


ls()                                      #list the variables in the workspace
rm(x)                                     #remove x from the workspace
rm(list=ls())                             #remove all the variables from the workspace
attach(mat)                               #make the names of the variables in the matrix or data frame available in the workspace
detach(mat)                               #releases the names (remember to do this each time you attach something)
with(mat, .... )                          #a preferred alternative to attach ... detach
new <- old[,-n]                            #drop the nth column
new <- old[-n,]                            #drop the nth row
new <- old[,-c(i,j)]                       #drop the ith and jth columns
new <- subset(old,logical)                 #select those cases that meet the logical condition
complete <- subset(data.df,complete.cases(data.df))  #find those cases with no missing values
new <- old[n1:n2,n3:n4]                    #select the n1 through n2 rows of variables n3 through n4


Distributions

beta(a, b)
gamma(x)
choose(n, k)
factorial(x)

dnorm(x, mean=0, sd=1, log = FALSE)      #normal distribution
pnorm(q, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean=0, sd=1)


dunif(x, min=0, max=1, log = FALSE)      #uniform distribution
punif(q, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
runif(n, min=0, max=1)


Data manipulation

replace(x, list, values)                   #remember to assign this to some object, i.e., x <- replace(x, x==-9, NA)
                                           #similar to the operation x[x==-9] <- NA
scrub(x, where, min, max, isvalue, newvalue)  #a convenient way to change particular values (in psych package)

cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3, ...)

x.df <- data.frame(x1,x2,x3 ...)           #combine different kinds of data into a data frame
    as.data.frame()
    is.data.frame()
x <- as.matrix(x)
scale()                                   #converts a data frame to standardized scores

round(x,n)                                #rounds the values of x to n decimal places
ceiling(x)                                #vector of smallest integers >= x
floor(x)                                  #vector of largest integers <= x
as.integer(x)                             #truncates real x to integers (compare to round(x,0))
as.integer(x < cutpoint)                  #vector of 1 if x is less than cutpoint, 0 otherwise
factor(ifelse(a < cutpoint, "Neg", "Pos"))  #another way to dichotomize and to make a factor for analysis
transform(data.df,variable names = some operation) #can be part of a set up for a data set

x%in%y                     #tests each element of x for membership in y
y%in%x                     #tests each element of y for membership in x
all(x%in%y)                #true if x is a subset of y
all(x)                     # for a vector of logical values, are they all true?
any(x)                     #for a vector of logical values, is at least one true?
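For example (x and y are arbitrary vectors):

x <- c(1, 2, 3)
y <- c(2, 3, 4, 5)
x %in% y                      # FALSE TRUE TRUE
all(x %in% y)                 # FALSE: x is not a subset of y
any(x %in% y)                 # TRUE: at least one element of x is in y
all(c(TRUE, TRUE, FALSE))     # FALSE
any(c(FALSE, FALSE, TRUE))    # TRUE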



Statistics and transformations

max(x, na.rm=TRUE)     #Find the maximum value in the vector x, exclude missing values
min(x, na.rm=TRUE)
mean(x, na.rm=TRUE)
median(x, na.rm=TRUE)
sum(x, na.rm=TRUE)
var(x, na.rm=TRUE)     #produces the variance covariance matrix
sd(x, na.rm=TRUE)      #standard deviation
mad(x, na.rm=TRUE)    #(median absolute deviation)
fivenum(x, na.rm=TRUE) #Tukey's five numbers: min, lower hinge, median, upper hinge, max
table(x)    #frequency counts of entries, ideally the entries are factors(although it works with integers or even reals)
scale(data,scale=FALSE)   #centers around the mean but does not scale by the sd
cumsum(x)                #cumulative sum, etc.
cumprod(x)
cummax(x)
cummin(x)
rev(x)      #reverse the order of values in x

cor(x,y,use="pair")   #correlation matrix for pairwise complete data, use="complete" for complete cases

aov(x~y,data=datafile)  #where x and y can be matrices
aov.ex1 = aov(DV~IV,data=data.ex1)  #do the analysis of variance or
aov.ex2 = aov(DV~IV1*IV2,data=data.ex2)         #do a two way analysis of variance
summary(aov.ex1)                                    #show the summary table
print(model.tables(aov.ex1,"means"),digits=3)       #report the means and the number of subjects/cell
boxplot(DV~IV,data=data.ex1)        #graphical summary appears in graphics window

lm(x~y,data=dataset)                      #basic linear model where x and y can be matrices  (see plot.lm for plotting options)
t.test(x,g)
pairwise.t.test(x,g)
power.anova.test(groups = NULL, n = NULL, between.var = NULL,
                 within.var = NULL, sig.level = 0.05, power = NULL)
power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05,
             power = NULL, type = c("two.sample", "one.sample", "paired"),
             alternative = c("two.sided", "one.sided"),strict = FALSE)




Regression, the linear model, factor analysis and principal components analysis (PCA)


matrices
t(X)                                     #transpose of X
X %*% Y                                  #matrix multiply X by Y
solve(A)                                 #inverse of A
solve(A,B)                               #inverse of A * B    (may be used for linear regression)

data frames are needed for regression
lm(Y~X1+X2)
lm(Y~X|W)                            


factanal()    (see also fa in the psych package)
princomp()     (see principal in the psych package)


Useful additional commands


colSums (x, na.rm = FALSE, dims = 1)
rowSums (x, na.rm = FALSE, dims = 1)
colMeans(x, na.rm = FALSE, dims = 1)
rowMeans(x, na.rm = FALSE, dims = 1)
rowsum(x, group, reorder = TRUE, ...)         #finds row sums for each level of a grouping variable
apply(X, MARGIN, FUN, ...)                    #applies the function (FUN) to either rows (1) or columns (2) on object X
apply(x,1,min)                             #finds the minimum for each row
apply(x,2,max)                            #finds the maximum for each column
col.max(x)                                   #another way to find which column has the maximum value for each row
which.min(x)
which.max(x)
z=apply(x,1,which.min)               #for each row, the column with the minimum value


Graphics

par(mfrow=c(nrow,mcol))                   #number of rows and columns to graph
par(ask=TRUE)                             #ask for user input before drawing a new graph
par(omi=c(0,0,1,0) )                      #set the size of the outer margins
mtext("some global title",3,outer=TRUE,line=1,cex=1.5)    #note that we seem to need to add the global title last
                     #cex = character expansion factor

boxplot(x,main="title")                  #boxplot (box and whiskers)


title( "some title")                          #add a title to the first graph


hist()                                   #histogram
plot()
plot(x,y,xlim=range(-1,1),ylim=range(-1,1),main=title)
par(mfrow=c(1,1))     #change the graph window back to one figure
symb=c(19,25,3,23)
colors=c("black","red","green","blue")
charact=c("S","T","N","H")
plot(PA,NAF,pch=symb[group],col=colors[group],bg=colors[condit],cex=1.5,main="Positive vs. Negative Affect by Film condition")
points(mPA,mNA,pch=symb[condit],cex=4.5,col=colors[condit],bg=colors[condit])

curve()
abline(a,b)
abline(a, b, untf = FALSE, ...)
     abline(h=, untf = FALSE, ...)
     abline(v=, untf = FALSE, ...)
     abline(coef=, untf = FALSE, ...)
     abline(reg=, untf = FALSE, ...)

identify()
plot(eatar,eanta,xlim=range(-1,1),ylim=range(-1,1),main=title)
identify(eatar,eanta,labels=labels(energysR[,1])  )       #dynamically puts names on the plots
locator()

legend()
pairs()                                  #SPLOM (scatter plot Matrix)
pairs.panels ()    #SPLOM on lower off diagonal, histograms on diagonal, correlations on upper off diagonal
                   #not standard R, but in the psych package
matplot ()
biplot()
plot(table(x))                           #plot the frequencies of levels in x

x= recordPlot()                     #save the current plot device output in the object x
replayPlot(x)                       #replot object x
dev.control                         #various control functions for printing/saving graphic files
pdf(height=6, width=6)              #create a pdf file for output
dev.off()                           #close the pdf file created with pdf()
layout(mat)                         #specify where multiple graphs go on the page
                                    #experiment with the magic code from Paul Murrell to do fancy graphic location
layout(rbind(c(1, 1, 2, 2, 3, 3),
             c(0, 4, 4, 5, 5, 0)))  
for (i in 1:5) {
  plot(i, type="n")
  text(1, i, paste("Plot", i), cex=4)
}



Distributions

To generate random samples from a variety of distributions
rnorm(n,mean,sd)
rbinom(n,size,p)
sample(x, size, replace = FALSE, prob = NULL)      #samples with or without replacement
Working with Dates
date <- strptime(as.character(date), "%m/%d/%y")  #change the date field to an internal form for time
                                                  #see ?formats and ?POSIXlt
as.Date()
month <- months(date)               #see also weekdays(), julian()