Introduction to Statistical Tools Used for Data Mining

Who Should Attend

This course is intended for managers, scientists, engineers, and Six Sigma practitioners who want to understand how to explore and exploit the information contained in data sets of any size. Those interested in answers to the following questions should attend: What are the most useful methods for mining data? What kind of software do I need, and how do I run it? How do I set up, control, and interpret the models? What are proven techniques for ensuring model validity and credibility? How do I compare methods? Very little prior knowledge of statistics is needed for this course.

What to Expect

  • This 3-day course takes a hands-on approach to learning. Participants will learn how to apply the methods through a combination of lecture and worked examples using software.
  • While data management is an important subject, this course will focus on the statistical models used in data mining.
  • The course will cover the most useful tools, such as data visualization, linear regression, logistic regression, classification and regression trees, neural networks, clustering, and nearest neighbors. These data mining tools may sound exotic and difficult to understand, but they can be explained simply, and modern software makes them easy to use.
  • The instructor will make time for one-on-one consulting, so participants are encouraged to bring their own data sets to class.

Course Outline

  • Module 1: Introduction to Data Mining
  • Module 2: Exploring and Preparing Data
  • Module 3: Introduction to Modeling and Validation
  • Module 4: Multiple Linear Regression
  • Module 5: Logistic Regression
  • Module 6: Model Validation
  • Module 7: Neural Networks
  • Module 8: Classification and Regression Trees
  • Module 9: Combining Models (bagging and boosting)
  • Module 10: Cluster Analysis