Credit: 3 hours
Prerequisite: CIS 300 (Algorithms and Data Structures) and instructor permission, or CIS 500 (Analysis of Algorithms and Data Structures); basic courses in probability and statistics, databases recommended
Textbook: none (course notes)
Venue: Monday-Friday 2:30-4:40pm, 236 Nichols Hall (lecture) and 126/128 Nichols Hall (lab)
Instructor: William H. Hsu, Department of Computing and Information Sciences
Office: 213 Nichols Hall
URL: http://www.cis.ksu.edu/~bhsu
E-mail: bhsu@cis.ksu.edu
Office phone: 532-6350
Home phone: 539-7180
Office hours: after class; 1-3pm Monday; by appointment
Class web page: http://www.kddresearch.org/Courses/Summer-2003/CIS690/
This is an implementation practicum and basic tutorial on knowledge discovery in databases (KDD) for students interested in applications of pattern recognition and machine learning such as data mining, classification, expert systems, and planning and design automation. No prior background in artificial intelligence, machine learning, or knowledge-based systems is assumed or required, but preliminary coursework in probability and database systems is recommended. The course will introduce the following basic algorithms and models: decision trees, simple (naïve) Bayes, feedforward artificial neural networks (specifically, multilayer perceptrons), and the simple genetic algorithm. It will focus on implementing some of these basic algorithms and on configuring, modifying, and augmenting existing code for machine learning and KDD.
Half of the course will be spent in lecture and discussion (60 minutes per day, 5 days per week); the other half, in the laboratory.
Homework: 2 (out of 3) programming and written assignments (15%)
Paper reviews: 2 (out of 3) written reviews (1-2 pages) of research papers (10%)
Examinations: 1 in-class midterm (15%)
Computer language(s): C/C++ and Java (either permitted for term programming project)
Project: programming practicum using Linux (Beowulf) supercluster (60% total)
Lecture | Date | Topic | Source |
0 | May 19 | Administrivia; overview of KDD Lab environment, MLC++ | TMM Chapter 1 |
1 | May 20 | Decision trees / ID3 in MLC++; C4.5 | TMM 3; Quinlan; RN 18 |
2 | May 21 | Decision trees, overfitting / MineSet Tree Visualizer | TMM 3; Quinlan; RN 18 |
3 | May 22 | Wrappers / Wrappers in MLC++ | MLC++ manual |
4 | May 23 | Bagging / Using wrapper inducers in MLC++ | MLC++ manual |
5 | May 26 | Boosting / Implementing wrapper inducers | MLC++ manual |
6 | May 27 | Simple Bayes (naïve Bayes) / Naïve Bayes inducer in MLC++ | TMM 6; MLC++ manual |
7 | May 28 | Improving simple Bayes (naïve Bayes) / Improvements to simple Bayes | TMM 6; paper |
8 | May 29 | Using simple Bayes for text mining / NCSA Data to Knowledge (D2K) | TMM 6; paper; D2K manual |
9 | May 30 | Introduction to Bayesian networks / Hugin | TMM 6 |
10 | June 2 | Bioinformatics Topics / Lab work | TBA |
11 | June 3 | Bioinformatics Topics / Lab work | TBA |
12 | June 4 | In class Midterm | TBA |
13 | June 5 | Learning/building Bayesian networks / Bayesian Network Interchange Format (BNIF) | TMM 6 |
14 | June 6 | Learning Bayesian network structure / MSBN, XML; ODBC and Bayesian networks | TMM 6; XBN docs |
15 | June 9 | Perceptrons and winnow / Perceptrons in MLC++; SNOW | TMM 4; RN 19; MLC++ manual |
16 | June 10 | Intro to artificial neural networks (ANNs) / SNNS | TMM 4; RN 19; SNNS manual |
17 | June 11 | Bioinformatics Topics | TBA |
18 | June 12 | Bioinformatics Topics | TBA |
19 | June 13 | Bioinformatics Topics | TBA |
20 | June 16 | Bioinformatics Topics | TBA |
21 | June 17 | Conclusions and wrap-up / KDD developer resources / NO FINAL EXAM | TMM 1, 3, 4, 6, 9; RN 18, 19; DEG 1, 6 |
TMM: Machine Learning, T. M. Mitchell
RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig
DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg
Lightly-shaded entries denote the (tentative) due dates of paper reviews.
Heavily-shaded entries denote the (tentative) due dates of written or programming assignments.