CIS 690: Implementation of High-Performance Data Mining Systems
Summer 2001

Hours: 3

Prerequisite: CIS 300 (Algorithms and Data Structures) and instructor permission, or CIS 500 (Analysis of Algorithms and Data Structures); basic courses in probability and statistics and in databases are recommended

Textbook: none (course notes)

Venue: Monday-Friday 15:30-17:30 (3:30pm - 5:30pm), 236 Nichols Hall (lecture) and 128 Nichols Hall (lab)

Instructor: William H. Hsu, Department of Computing and Information Sciences

Office hours: after class; 1-3pm Monday; by appointment

Class web page: http://ringil.cis.ksu.edu/Courses/Summer-2000/CIS690/


Course Description

This is an implementation practicum and basic tutorial on knowledge discovery in databases (KDD) for students interested in applications of pattern recognition and machine learning such as data mining, classification, expert systems, and planning and design automation. No prior background in artificial intelligence, machine learning, or knowledge-based systems is assumed or required, but preliminary coursework in probability and database systems is recommended. The course will introduce the following basic algorithms and models: decision trees, simple (naïve) Bayes, feedforward artificial neural networks (specifically, multilayer perceptrons), and the simple genetic algorithm. It will focus on implementing some of these basic algorithms and on configuring, modifying, and augmenting existing code for machine learning and KDD.

Half of the course will be spent in lecture and discussion (60 minutes per day, 5 days per week); the other half, in the laboratory.


Course Requirements

Selected reading (on reserve in K-State CIS Library):

Additional bibliography (excerpted in course notes and handouts):


Syllabus

K-State CIS 690: Syllabus - Summer, 2000

| Lecture | Date | Topic | Lab | Source |
|---|---|---|---|---|
| 0 | May 14 | Administrivia; overview of KDD | Lab environment, MLC++ | TMM Chapter 1 |
| 1 | May 15 | Decision trees | ID3 in MLC++; C4.5 | TMM 3; Quinlan; RN 18 |
| 2 | May 16 | Decision trees, overfitting | MineSet Tree Visualizer | TMM 3; Quinlan; RN 18 |
| 3 | May 17 | Wrappers | Wrappers in MLC++ | MLC++ manual |
| 4 | May 19 | Bagging | Using wrapper inducers in MLC++ | MLC++ manual |
| 5 | May 22 | Boosting | Implementing wrapper inducers | MLC++ manual |
| 6 | May 23 | Simple Bayes (naïve Bayes) | Naïve Bayes inducer in MLC++ | TMM 6; MLC++ manual |
| 7 | May 24 | Improving simple Bayes (naïve Bayes) | Improvements to simple Bayes | TMM 6; paper |
| 8 | May 25 | Using simple Bayes for text mining | NCSA Data to Knowledge (D2K) | TMM 6; paper; D2K manual |
| 9 | May 26 | Introduction to Bayesian networks | Hugin; Take-Home Midterm (due 6/1/2000) | TMM 6 |
| 10 | May 30 | Learning / building Bayesian networks | Bayesian Network Interchange Format (BNIF) | TMM 6 |
| 11 | May 31 | Learning Bayesian network structure | MSBN, XML; ODBC and Bayesian networks | TMM 6; XBN docs |
| 12 | June 1 | Perceptrons and winnow | Perceptrons in MLC++; SNOW | TMM 4; RN 19; MLC++ manual |
| 13 | June 2 | Intro to artificial neural networks (ANNs) | SNNS | TMM 4; RN 19; SNNS manual |
| 14 | June 5 | Learning in ANNs | NeuroSolutions | TMM 4; RN 19; NS manual |
| 15 | June 6 | Intro to genetic algorithms (GAs) | Genesis | TMM 9; DEG 1; paper; D2K Jenesis manual |
| 16 | June 7 | Designing genetic algorithms (GAs) | NCSA Jenesis | TMM 9; DEG 6 |
| 17 | June 8 | Intro to genetic programming (GP) | Implementing GP | TMM 9 |
| 18 | June 9 | Conclusions and wrap-up | KDD developer resources; NO FINAL EXAM | TMM 1, 3, 4, 6, 9; RN 18, 19; DEG 1, 6 |


TMM: Machine Learning, T. M. Mitchell
RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig
DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg

Lightly-shaded entries denote the (tentative) due dates of paper reviews.
Heavily-shaded entries denote the (tentative) due dates of written or programming assignments.