CIS 690: Implementation of High-Performance Data Mining Systems
Summer 2001
Hours: 3
Prerequisites: CIS 300 (Algorithms and Data Structures) and instructor permission, or CIS 500 (Analysis of Algorithms and Data Structures); basic courses in probability and statistics and in databases are recommended
Textbook: none (course notes)
Venue: Monday-Friday 15:30-17:30 (3:30pm - 5:30pm), 236 Nichols Hall (lecture) and 128 Nichols Hall (lab)
Instructor: William H. Hsu, Department of Computing and Information Sciences
Office hours: after class; 1-3pm Monday; by appointment
Class web page: http://ringil.cis.ksu.edu/Courses/Summer-2000/CIS690/
Course Description
This is an implementation practicum and basic tutorial on knowledge discovery in databases (KDD) for students interested in applications of pattern recognition and machine learning, such as data mining, classification, expert systems, and planning and design automation. No prior background in artificial intelligence, machine learning, or knowledge-based systems is assumed or required, but preliminary coursework in probability and database systems is recommended. The course will introduce the following basic algorithms and models: decision trees, simple (naïve) Bayes, feedforward artificial neural networks (specifically, multilayer perceptrons), and the simple genetic algorithm. It will focus on implementing some of these basic algorithms and on configuring, modifying, and augmenting existing code for machine learning and KDD.
Half of the course will be spent in lecture and discussion (60 minutes per day, 5 days per week); the other half, in the laboratory.
Course Requirements
Selected reading (on reserve in the K-State CIS Library):
Additional bibliography (excerpted in course notes and handouts):
Syllabus
Lecture | Date | Topic | Source
0 | May 14 | Administrivia; overview of KDD Lab environment, MLC++ | TMM Chapter 1
1 | May 15 | Decision trees; ID3 in MLC++; C4.5 | TMM 3; Quinlan; RN 18
2 | May 16 | Decision trees, overfitting; MineSet Tree Visualizer | TMM 3; Quinlan; RN 18
3 | May 17 | Wrappers; wrappers in MLC++ | MLC++ manual
4 | May 19 | Bagging; using wrapper inducers in MLC++ | MLC++ manual
5 | May 22 | Boosting; implementing wrapper inducers | MLC++ manual
6 | May 23 | Simple Bayes (naïve Bayes); naïve Bayes inducer in MLC++ | TMM 6; MLC++ manual
7 | May 24 | Improving simple Bayes (naïve Bayes); improvements to simple Bayes | TMM 6; paper
8 | May 25 | Using simple Bayes for text mining; NCSA Data to Knowledge (D2K) | TMM 6; paper; D2K manual
9 | May 26 | Introduction to Bayesian networks; Hugin; Take-Home Midterm (due 6/1/2000) | TMM 6
10 | May 30 | Learning / building Bayesian networks; Bayesian Network Interchange Format (BNIF) | TMM 6
11 | May 31 | Learning Bayesian network structure; MSBN, XML; ODBC and Bayesian networks | TMM 6; XBN docs
12 | June 1 | Perceptrons and winnow; perceptrons in MLC++; SNOW | TMM 4; RN 19; MLC++ manual
13 | June 2 | Intro to artificial neural networks (ANNs); SNNS | TMM 4; RN 19; SNNS manual
14 | June 5 | Learning in ANNs; NeuroSolutions | TMM 4; RN 19; NS manual
15 | June 6 | Intro to genetic algorithms (GAs); Genesis | TMM 9; DEG 1; paper; D2K Jenesis manual
16 | June 7 | Designing genetic algorithms (GAs); NCSA Jenesis | TMM 9; DEG 6
17 | June 8 | Intro to genetic programming (GP); implementing GP | TMM 9
18 | June 9 | Conclusions and wrap-up; KDD developer resources; NO FINAL EXAM | TMM 1, 3, 4, 6, 9; RN 18, 19; DEG 1, 6
TMM: Machine Learning, T. M. Mitchell
RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig
DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg
Lightly shaded entries denote the (tentative) due dates of paper reviews.
Heavily shaded entries denote the (tentative) due dates of written or programming assignments.