Data Warehousing and Mining; Business Intelligence - Mumbai university Syllabus and Related Knols
Authors: Narayana Rao
Original URL: http://knol.google.com/k/-/-/2utb2lsm2k7a/5736
Prerequisite: Data Base Management System
Objective: Today is an era characterized by information overload and minimal
knowledge. Every business must rely extensively on data analysis to increase
productivity and survive competition. This course provides a comprehensive
introduction to data mining problems and concepts, with particular emphasis on
business intelligence applications.
The three main goals of the course are to enable students to:
1. Approach business problems data-analytically by identifying opportunities to
derive business value from data.
2. Know the basics of data mining techniques and how they can be applied to extract
relevant business intelligence.
1. Introduction to Data Mining: Motivation for Data Mining, Data Mining-Definition
& Functionalities, Classification of DM systems, DM task primitives, Integration of a
Data Mining system with a Database or a Data Warehouse, Major issues in Data
Mining.
2. Data Warehousing – (Overview Only): Overview of concepts like star schema, fact
and dimension tables, OLAP operations, From OLAP to Data Mining.
3. Data Preprocessing: Why? Descriptive Data Summarization, Data Cleaning:
Missing Values, Noisy Data, Data Integration and Transformation. Data Reduction:
Data Cube Aggregation, Dimensionality Reduction, Data Compression, Numerosity
Reduction, Data Discretization and Concept Hierarchy Generation for numerical and
categorical data.
4. Mining Frequent Patterns, Associations, and Correlations: Market Basket
Analysis, Frequent Itemsets, Closed Itemsets, and Association Rules, Frequent
Pattern Mining, Efficient and Scalable Frequent Itemset Mining Methods, The
Apriori Algorithm for finding Frequent Itemsets Using Candidate Generation,
Generating Association Rules from Frequent Itemsets, Improving the Efficiency of
Apriori, Frequent Itemsets without Candidate Generation using FP Tree, Mining
Multilevel Association Rules, Mining Multidimensional Association Rules, From
Association Mining to Correlation Analysis, Constraint-Based Association Mining.
5. Classification & Prediction: What is it? Issues regarding Classification and
Prediction.
• Classification methods: Decision tree, Bayesian Classification, Rule-based
• Prediction: Linear and nonlinear regression
Accuracy and Error measures, Evaluating the accuracy of a Classifier or Predictor.
6. Cluster Analysis: What is it? Types of Data in cluster analysis, Categories of
clustering methods, Partitioning methods – K-Means, K-Medoids. Hierarchical
Clustering – Agglomerative and Divisive Clustering, BIRCH and ROCK methods,
DBSCAN, Outlier Analysis.
7. Mining Stream and Sequence Data: What is stream data? Classification, Clustering,
and Association Mining in stream data. Mining Sequence Patterns in Transactional
Databases.
8. Spatial Data and Text Mining: Spatial Data Cube Construction and Spatial OLAP,
Mining Spatial Association and Co-location Patterns, Spatial Clustering Methods,
Spatial Classification and Spatial Trend Analysis. Text Mining: Text Data Analysis
and Information Retrieval, Dimensionality Reduction for Text, Text Mining
Approaches.
9. Web Mining: Web mining introduction, Web Content Mining, Web Structure
Mining, Web Usage Mining, Automatic Classification of Web Documents.
10. Data Mining for Business Intelligence Applications: Data mining for business
Applications like Balanced Scorecard, Fraud Detection, Clickstream Mining, Market
Segmentation, retail industry, telecommunications industry, banking & finance and
Text Books:
1. Han, Kamber, “Data Mining Concepts and Techniques”, Morgan Kaufmann, 2nd
Edition. Notes for the chapters of this book are available below.
2. P. N. Tan, M. Steinbach, Vipin Kumar, “Introduction to Data Mining”, Pearson
Reference Books:
1. MacLennan Jamie, Tang ZhaoHui and Crivat Bogdan, “Data Mining with Microsoft
SQL Server 2008”, Wiley India Edition.
2. G. Shmueli, N.R. Patel, P.C. Bruce, “Data Mining for Business Intelligence:
Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner”,
3. Michael Berry and Gordon Linoff, “Data Mining Techniques”, 2nd Edition, Wiley
4. Alex Berson and Smith, “Data Mining and Data Warehousing and OLAP”, McGraw
Hill.
5. E. G. Mallach, “Decision Support and Data Warehouse Systems”, Tata McGraw Hill.
6. Michael Berry and Gordon Linoff, “Mastering Data Mining: Art & Science of CRM”,
Wiley Student Edition
7. Arijay Chaudhry & P. S. Deshpande, “Multidimensional Data Analysis and Data
Mining”, Dreamtech Press
8. Vikram Pudi & Radha Krishna, “Data Mining”, Oxford Higher Education.
9. Chakrabarti, S., “Mining the Web: Discovering knowledge from hypertext data”,
10. M. Jarke, M. Lenzerini, Y. Vassiliou, P. Vassiliadis (ed.), “Fundamentals of Data
Warehouses”, Springer-Verlag, 1999.
Term work shall consist of at least 10 experiments covering all topics. Term work should
consist of at least 6 programming assignments, one mini project in Business
Intelligence, and two assignments covering the topics of the syllabus. One written test is
also to be conducted.
Distribution of marks for term work shall be as follows:
1. Laboratory work (Experiments and Journal) 15 Marks
2. Test (at least one) 10 Marks
The final certification and acceptance of term work (TW) ensures satisfactory
performance of laboratory work and minimum passing marks in the term work.
Suggested Experiment List
1. Students can learn to use the WEKA open-source data mining tool and run data mining
algorithms on datasets.
2. Program for Classification – Decision tree, Naïve Bayes using languages like JAVA
3. Program for Clustering – K-means, Agglomerative, Divisive using languages like
JAVA
4. Program for Association Mining using languages like JAVA
5. Web mining
6. BI projects: any one of Balanced Scorecard, Fraud Detection, Market Segmentation
7. Using any commercial BI tool like SQL Server 2008, Oracle BI, SPSS, Clementine,
XLMiner, etc.
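As one possible starting point for experiment 4 (association mining), the sketch below finds frequent itemsets level by level in the style of the Apriori algorithm named in syllabus topic 4, then computes the confidence of a rule. It is written in Python rather than the Java the list suggests, purely for brevity. The function names apriori and confidence, the toy baskets data, and the minimum support count of 3 are illustrative assumptions, not part of the syllabus.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count of transactions (an assumption of this
    sketch; textbooks often use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the transactions per candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent; both sides must be frequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return frequent[both] / frequent[a]


# Hypothetical market-basket data, made up for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("confidence(diapers -> beer):",
      confidence(frequent, {"diapers"}, {"beer"}))
```

The same structure (join, prune, count) carries over to a Java version; only the set and map types change. A fuller experiment would also enumerate all rules above a minimum confidence rather than checking a single rule by hand.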
Notes - Data Mining
Data Mining Concepts and Techniques
Jiawei Han
University of Illinois at Urbana–Champaign
Micheline Kamber, Jian Pei
Simon Fraser University
Introduction to Data Mining
Data Preprocessing for Data Mining
Data Warehousing and Online Analytical Processing - Chapter of Data Mining
Data Cube Technologies for Data Mining
Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
Advanced Patterns - Data Mining
Data Mining - Classification: Basic Concepts
Data Mining - Classification: Advanced Methods
Data Mining Recent Trends and Research Frontiers
Updated 14 Apr 2016
First Published 2011