FRE xxx3: Data Mining in Business and Finance

FRE 7851 Topics in Financial & Risk Engineering: Data Mining

Lecture periods: 2.5 hours (for 7 weeks)

Laboratory periods: 0 hours

Recitation periods: 0 hours

Credits: 1.5

Data are accumulating at an incredible rate in almost every sector of our lives due to technological advances in areas such as the internet, wireless telecommunication, point-of-sale devices, and data storage. A wealth of useful information is hidden in this vast amount of data. Meaningful correlations, patterns, and trends can be discovered by using a variety of data mining techniques to sift through large amounts of data stored in repositories and data warehouses. Proven applications of data mining in finance include forecasting stock markets, currency exchange rates, and bank bankruptcies; understanding and managing financial risk; trading futures; credit rating; loan management; bank customer profiling; and money laundering analysis.

Data mining techniques covered in this course may include, for example, k-Nearest Neighbor algorithms, Classification and Regression Trees, Discriminant Analysis, Logistic Regression, Artificial Neural Networks, Multiple Linear Regression, k-Means Clustering, Hierarchical Clustering, Principal Components Analysis, Association Rules, Collaborative Filtering, Genetic and Evolutionary Algorithms, and Support Vector Machines and other Kernel-Based Learning Methods. The relative merits and shortcomings of the various methods will also be discussed.

Prerequisites: FRE 6083 or permission of program/course director.

Grading:

5% Classroom participation

50% Homework

45% Final Project and/or Examination

Text:

Mehmed Kantardzic, “Data Mining: Concepts, Models, Methods, and Algorithms”, Wiley-IEEE, 2002, ISBN 0471228524.

Topics:

Week	Topic
1	Data Mining Concepts and Applications
2	Data Preprocessing and Data Reduction
3	Statistical Methods: Naïve Bayesian Classifier and Logistic Regression
4	Contingency Table and Linear Discriminant Analysis
5	Cluster Analysis, Similarity Measures, Agglomerative and Partitional Clustering
6	Decision Trees and Decision Rules
7	Association Rules

Lecture Notes:

Introduction and Motivations: introDataMining1.pdf

Preprocessing the Data: preparingData.pdf

k-Nearest Neighbor Classification: k-NearestNeighbor.pdf

Decision Trees and Decision Rules: decisionTrees.pdf

Naïve Bayesian Classifier: naiveBayesianClassifier.pdf

Regression: regression.pdf

Association Rules: AssociationRules3.pdf

Error Estimation and Reduction: AccuracyErrors.pdf

Assignments:

Chapter 2: #6, #9, #10, and #11 (due Nov. 13)
Chapter 6: #9 (due Nov. 27)
Chapter 7: #4 (due Dec. 4)
Chapter 5: #3 and #4. For #3, use the training set given in Table 5.1 and classify the following two samples instead of the ones given in the book: a) {1, 2, 1} and b) {1, 2, 2}. For #4, the statement for part (c) refers to Figure 4.5; it should refer to Table 5.3. (due Dec. 11)
Chapter 8: #5. Note that the items in a transaction (and therefore in an itemset) should be sorted lexicographically. (due Dec. 18)
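
As an illustration of the sorting convention noted for Chapter 8, a minimal Python sketch follows. The function names and the sample transaction are hypothetical and are not taken from the textbook problem; the sketch only shows that sorting a transaction first makes every itemset drawn from it come out in lexicographic order.

```python
from itertools import combinations


def normalize(transaction):
    """Return the distinct items of a transaction as a lexicographically sorted tuple."""
    return tuple(sorted(set(transaction)))


def candidate_itemsets(transaction, k):
    """List the k-itemsets contained in a transaction.

    Because the input is sorted first, each itemset is itself sorted,
    and the itemsets are produced in lexicographic order.
    """
    return list(combinations(normalize(transaction), k))


# Hypothetical transaction (assumption for illustration only).
example = ["milk", "bread", "eggs", "bread"]
pairs = candidate_itemsets(example, 2)
print(normalize(example))
print(pairs)
```

Keeping every itemset in one canonical order means that two itemsets with the same items always compare equal, which is convenient when counting support.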