CIS 690: Implementation of High-Performance Data Mining Systems
Summer 2001
Hours: 3
Prerequisite: CIS 300 (Algorithms and Data Structures) and instructor permission, or CIS 500 (Analysis of Algorithms and Data Structures); basic courses in probability and statistics and in databases are recommended
Textbook: none (course notes)
Venue: Monday-Friday 15:30-17:30 (3:30pm - 5:30pm), 236 Nichols Hall (lecture) and 128 Nichols Hall (lab)
Instructor: William H. Hsu, Department of Computing and Information Sciences
Office hours: after class; 1-3pm Monday; by appointment
Class web page: http://ringil.cis.ksu.edu/Courses/Summer-2000/CIS690/
Course Description
This is an implementation practicum and basic tutorial on knowledge discovery in databases (KDD) for students interested in applications of pattern recognition and machine learning such as data mining, classification, expert systems, and planning and design automation. No prior background in artificial intelligence, machine learning, or knowledge-based systems is assumed or required, but preliminary coursework in probability and database systems is recommended. The course will introduce the following basic algorithms and models: decision trees, simple (naïve) Bayes, feedforward artificial neural networks (specifically, multilayer perceptrons), and the simple genetic algorithm. It will focus on implementing some of these basic algorithms and on configuring, modifying, and augmenting existing code for machine learning and KDD.
Half of the course will be spent in lecture and discussion (60 minutes per day, 5 days per week); the other half, in the laboratory.
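To give a flavor of the kind of implementation work involved, the following is a minimal, self-contained sketch of a simple (naïve) Bayes classifier for discrete attributes with add-one (Laplace) smoothing, written in plain Python. It is illustrative only, not the MLC++ inducer used in the lab, and the function names and toy data set are made up for this example.

# Illustrative sketch only (not the MLC++ inducer used in lab): a simple
# (naive) Bayes classifier for discrete attributes with add-one smoothing.

import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Estimate class priors and per-attribute value counts from training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(int)      # (attr_index, value, label) -> count
    values_per_attr = defaultdict(set)   # attr_index -> set of observed values
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            value_counts[(i, v, y)] += 1
            values_per_attr[i].add(v)
    return class_counts, value_counts, values_per_attr, len(labels)

def classify(x, model):
    """Return the class maximizing P(class) * prod_i P(x_i | class)."""
    class_counts, value_counts, values_per_attr, n = model
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / n)      # log prior; log space avoids underflow
        for i, v in enumerate(x):
            k = len(values_per_attr[i]) or 1
            # Add-one (Laplace) smoothed estimate of P(attribute value | class)
            score += math.log((value_counts[(i, v, label)] + 1) / (count + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    # Toy (hypothetical) data: (outlook, windy) -> play tennis?
    X = [("sunny", "no"), ("sunny", "yes"), ("rain", "yes"), ("overcast", "no")]
    y = ["yes", "yes", "no", "yes"]
    model = train_naive_bayes(X, y)
    print(classify(("rain", "yes"), model))   # prints "no" for this toy data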
Course Requirements
Selected reading (on reserve in K-State CIS Library):
Additional bibliography (excerpted in course notes and handouts):
Syllabus
| Lecture | Date | Topic | Lab | Source |
| 0 | May 14 | Administrivia; overview of KDD | Lab environment, MLC++ | TMM Chapter 1 |
| 1 | May 15 | Decision trees | ID3 in MLC++; C4.5 | TMM 3; Quinlan; RN 18 |
| 2 | May 16 | Decision trees, overfitting | MineSet Tree Visualizer | TMM 3; Quinlan; RN 18 |
| 3 | May 17 | Wrappers | Wrappers in MLC++ | MLC++ manual |
| 4 | May 19 | Bagging | Using wrapper inducers in MLC++ | MLC++ manual |
| 5 | May 22 | Boosting | Implementing wrapper inducers | MLC++ manual |
| 6 | May 23 | Simple Bayes (naïve Bayes) | Naïve Bayes inducer in MLC++ | TMM 6; MLC++ manual |
| 7 | May 24 | Improving simple Bayes (naïve Bayes) | Improvements to simple Bayes | TMM 6; paper |
| 8 | May 25 | Using simple Bayes for text mining | NCSA Data to Knowledge (D2K) | TMM 6; paper; D2K manual |
| 9 | May 26 | Introduction to Bayesian networks | Hugin; Take-Home Midterm (due 6/1/2000) | TMM 6 |
| 10 | May 30 | Learning / building Bayesian networks | Bayesian Network Interchange Format (BNIF) | TMM 6 |
| 11 | May 31 | Learning Bayesian network structure | MSBN, XML; ODBC and Bayesian networks | TMM 6; XBN docs |
| 12 | June 1 | Perceptrons and winnow | Perceptrons in MLC++; SNOW | TMM 4; RN 19; MLC++ manual |
| 13 | June 2 | Intro to artificial neural networks (ANNs) | SNNS | TMM 4; RN 19; SNNS manual |
| 14 | June 5 | Learning in ANNs | NeuroSolutions | TMM 4; RN 19; NS manual |
| 15 | June 6 | Intro to genetic algorithms (GAs) | Genesis | TMM 9; DEG 1; paper; D2K Jenesis manual |
| 16 | June 7 | Designing genetic algorithms (GAs) | NCSA Jenesis | TMM 9; DEG 6 |
| 17 | June 8 | Intro to genetic programming (GP) | Implementing GP | TMM 9 |
| 18 | June 9 | Conclusions and wrap-up; KDD developer resources | NO FINAL EXAM | TMM 1, 3, 4, 6, 9; RN 18, 19; DEG 1, 6 |
 
TMM: Machine Learning, T. M. Mitchell
RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig
DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg
Lightly-shaded entries denote the (tentative) due dates of paper reviews.
Heavily-shaded entries denote the (tentative) due dates of written or programming assignments.