BIA 656 Statistical Learning and Analytics

Semester: Fall 2012
Day of Week/Time: Wed 06:15-08:45 PM
Location: Babbio 110 and Hanlon Finance Lab
Instructor: Germán Creamer, Babbio 637, gcreamer@stevens.edu
Office Hours: Th. 10:30 AM-12:00 PM
Class Website: Moodle

Overview

The significant amount of corporate information available requires a systematic and analytical approach to select the most important information and anticipate major events. Statistical learning algorithms facilitate this process by helping analysts understand, model, and forecast the behavior of major corporate variables.

This course introduces time series models and statistical and graphical models used for inference and prediction. The emphasis of the course is on the learning capabilities of the algorithms and their application to several business areas. Students should have a basic knowledge of probability theory and linear algebra.

Prerequisites: a basic course in probability and statistics at the level of MGT 620; BIA 654 Multivariate Data Analytics.

Course Objectives

Students will:

• Learn the fundamental concepts of time series analysis and statistical learning algorithms.
• Explore existing and new applications of time series and statistical learning methods to finance, marketing, and operations problems.

Relationship of Course to Rest of Curriculum

Students will have the opportunity to explore the main concepts of time series and statistical learning that will be used in the applied modules of this program.

Learning Goals

By the end of this course, students will be able to:
1. Understand the foundations of time series analysis and statistical learning algorithms.
2. Apply statistical models and analytical methods to several business domains using a statistical language.
3. Recognize both the value and the limits of time series and statistical learning algorithms for solving business problems.
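As a small illustration of goal 2, the sketch below fits a classifier to synthetic data and reports its held-out accuracy. It is a hypothetical example in Python with scikit-learn; the specific statistical language and tooling used in class may differ, and the "churn" data and features are invented for illustration.

```python
# Minimal sketch: fitting a statistical learning model to business-style data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "customer churn" data: two numeric features and a binary label
# generated from a noisy linear rule.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out a test set so the reported accuracy estimates generalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out split matters: evaluating on the training data would overstate how well the model generalizes, one of the "limits" issues raised in goal 3.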

Pedagogy

The class will combine class presentations, discussions, exercises and case analysis to motivate students and train them in the appropriate use of statistical and econometric techniques.

Required Texts

Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning. Springer-Verlag, New York, 2010 (downloadable at http://www-stat.stanford.edu/~tibs/ElemStatLearn/).

Ruey S. Tsay, Analysis of Financial Time Series, 3rd ed., John Wiley, 2010. (The 2nd edition can be accessed through the library website. Only chapters 2 and 12.1-12.3 are required.)

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008 (downloadable at http://nlp.stanford.edu/IR-book).

Optional Texts

John Brocklebank and David Dickey, SAS for Forecasting Time Series, 2nd ed., Wiley, 2009.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, John Wiley & Sons, 2001.

Tony Jebara, Machine Learning: Discriminative and Generative, Kluwer, Boston, MA, 2004.

Tom M. Mitchell, Machine Learning, McGraw-Hill Series in Computer Science, 1997.

Vasant Dhar and Roger Stein. Seven methods for transforming corporate data into business intelligence. Upper Saddle River: Prentice Hall. 1997.

Jiuyong Li, Ada Wai-chee Fu, Hongxing He, Jie Chen, Huidong Jin, Damien McAullay, Graham Williams, Ross Sparks, and Chris Kelman. Mining risk patterns in medical data. In Proceedings of the eleventh ACM SIGKDD International conference on Knowledge discovery in data mining (KDD '05). ACM, New York, NY, USA, 2005: 770-775.

Riccardo Bellazzi and Blaz Zupan. Predictive data mining in clinical medicine: Current issues and guidelines. International Journal of Medical Informatics 77, 2008: 81–97.

Germán Creamer. Model calibration and automated trading agent for Euro futures. Quantitative Finance, 12 (4), 2012, 531-545.
Paul Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. More than words: Quantifying language to measure firms’ fundamentals. Journal of Finance 63, 2008: 1437-1467.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1), 1997: 119-139.
Leo Breiman, Bagging predictors. Machine Learning 24, 2, 1996: 123–140.
Leo Breiman, Random forests. Machine Learning 45, 1, 2001: 5-32.

Assignments

The course will have a main project and six data-analysis assignments. Assignments are due at 10 PM on the due date and must be submitted electronically through the course website. Each student must submit his/her own solution. Neither e-mail nor late submissions will be accepted.
Each assignment is worth 2 points, awarded on completion. Solutions will be circulated after the due date so that each student has the opportunity to review his/her answers.

Project

The project requires that participants build a decision support system (DSS) based on one of the methods explored in this course. Each project must be developed by a group of three students, and each group should present a project proposal.
The final report must include at least the following sections:

• Organization or area of application
• Problem
• Solution
• Results
• Lessons learned

This final report may follow one of the two formats below, according to the students’ orientation:

• Industry (biomedical, finance, etc.): a case study or “consulting-like” project in the students’ corporate area of interest (see Appendix D of Dhar and Stein, Seven methods for transforming corporate data into business intelligence, listed under Optional Texts).
• Academic: an academic paper of the kind presented at conferences such as the Innovative Applications of Artificial Intelligence Conference. The paper should be based on a theoretical or applied exploration of one of the methods studied in this course, or of any other method approved by the instructor, and should follow (in general terms) a standard academic paper format.

The following paper describes statistical tests that can be used to compare different learning algorithms: Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10 (7): 1895-1924.
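One of the tests Dietterich (1998) analyzes for comparing two classifiers evaluated on the same test set is McNemar’s test. The sketch below computes the continuity-corrected version in Python; the discordant-pair counts are hypothetical, chosen only to illustrate the calculation.

```python
# Minimal sketch of McNemar's test for comparing two classifiers A and B
# evaluated on the same test examples. Counts below are hypothetical.
from scipy.stats import chi2

# Discordant pairs: b = examples A classified correctly and B incorrectly;
# c = examples B classified correctly and A incorrectly.
b, c = 30, 14

# Continuity-corrected chi-square statistic with 1 degree of freedom.
stat = (abs(b - c) - 1) ** 2 / (b + c)
p_value = chi2.sf(stat, df=1)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

A small p-value indicates that the two classifiers’ error rates differ by more than chance; only the discordant pairs enter the statistic, since examples both classifiers got right (or wrong) carry no information about their difference.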