K-State CIS 732: Machine Learning and Pattern Recognition

CIS 732: Machine Learning and Pattern Recognition

Fall 2003

Hours: 3 hours (extended CIS732 course project option available)

Prerequisite: CIS 501 or equivalent coursework in data structures and algorithms; CIS 301 (set theory/logic) and 575 (algorithm design), Math 510 (discrete math), Stat 410 (intro probability) strongly recommended

Textbook: Machine Learning, T. M. Mitchell. McGraw-Hill, 1997. ISBN: 0070428077

Time and Venue: Mon, Wed, Fri 13:30 – 14:20, Room 127 Nichols Hall (N127)

Instructor: William H. Hsu, Department of Computing and Information Sciences

Office: N213 Office phone: (785) 532-6350 x29 Home phone: (785) 539-7180

URL: http://www.cis.ksu.edu/~bhsu

E-mail: cis732ta@www.kddresearch.org

Office hours: 10:30-11:30 Mon, Wed, Fri;

by appointment, Friday after 16:00

Class web page: http://www.kddresearch.org /Courses/Fall-2003/CIS732/

Course Description

An introductory course in machine learning for development of intelligent knowledge based systems. The first half of the course will focus on basic taxonomies and theories of learning, algorithms for concept learning, statistical learning, knowledge representation, pattern recognition, and reasoning under uncertainty. The second half of the course will survey some basic topics in combining multiple models, learning from time series, learning to reason, and selected applications in knowledge discovery and data mining, especially in bioinformatics.

The course will include several written and programming assignments and a term project option for graduate students. Ancillary readings will be assigned; students will write a brief synopsis and review for one of these papers every other lecture.

Course Requirements

Homework: 3 out of 4 programming and written assignments (15%)

Examinations: 1 in-class closed-book midterm (20%), 1 in-class open-book final (25%)

Computer language(s): C/C++, Java, or student choice (upon instructor approval)

Project: research project and report for all students (25%)

Paper reviews: 10 out of 13 paper reviews (10%)

Class participation: class and online discussions, asking and answering questions (5%)

Selected reading (on reserve in K-State CIS Library)

· Artificial Intelligence: A Modern Approach, 2^nd edition, S. J. Russell and P. Norvig. Prentice-Hall, 2003. ISBN: 0137903952

· Readings in Machine Learning, J. W. Shavlik and T. G. Dietterich, eds. Morgan Kaufmann, 1990. ISBN: 1558601430

· Readings in Computer Inference and Knowledge Acquisition, B. G. Buchanan and D. C. Wilkins, eds. Morgan Kaufmann, 1993. ISBN: 1558601635

Additional bibliography (excerpted in course notes and handouts)

· Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg. Addison-Wesley, 1989. ISBN: 0201157675

· Genetic Programming: On The Programming of Computers by Means of Natural Selection, J. Koza. MIT Press, 1992. ISBN: 0262111705

Syllabus

Lecture	Date	Topic	(Primary) Source
0	2003 Aug 20	Administrivia; overview of learning	TMM Chapter 1
1	2003 Aug 22	Concept learning, version spaces	TMM 2
2	2003 Aug 25	Inductive bias	TMM 2
3	2003 Aug 27	Decision trees; using MLC++	TMM 3; RN 18
4	2003 Aug 29	Decision trees	TMM 3
5	2003 Sep 03	Overfitting and Occam’s Razor	TMM 3
6	2003 Sep 05	Perceptrons	TMM 4
7	2003 Sep 08	Winnow	TMM 4
8	2003 Sep 10	Multi-layer perceptrons	TMM 4; handouts
9	2003 Sep 12	Estimation and confidence intervals	TMM 5
10	2003 Sep 15	Bayesian learning: MAP, ML	TMM 6
11	2003 Sep 17	Bayesian learning: MDL, BOC, Gibbs	TMM 6
12	2003 Sep 19	Naïve Bayes; prob. learning over text	TMM 6; handouts
13	2003 Sep 22	Graphical models: inference basics	TMM 6; RN 14-15
14	2003 Sep 26	Graphical models: learning from data 1	TMM 6; paper
15	2003 Sep 29	Graphical models: learning from data 2	TMM 6
16	2003 Oct 01	Unsupervised learning: AutoClass, SOM	TMM 6
17	2003 Oct 03	Expectation-Maximization (EM)	TMM 6
18	2003 Oct 06	COLT 1: PAC learning	TMM 2, 7.1-3; handouts
19	2003 Oct 08	COLT 2: PAC, VC dimension,	TMM 7.4.1-3
20	2003 Oct 10	COLT 3: Error bounds	TMM 7.5.1-3
21	2003 Oct 15	Midterm Review	TMM 1-7; RN 14-15, 18
22	2003 Oct 17	Midterm Exam	–
23	2003 Oct 20	Reinforcement learning I	TMM 13; RN 20
24	2003 Oct 22	Reinforcement learning II	TMM 13; papers
25	2003 Oct 24	Dynamic models I: value/policy iteration	RN 16
26	2003 Oct 27	Dynamic models II: DBNs	RN 17
27	2003 Oct 29	Markov Decision Processes	RN 17
28	2003 Oct 31	Combining classifiers (WM, bagging)	TMM 7
29	2003 Nov 03	Boosting	TMM 9; papers
30	2003 Nov 05	Introduction to GEC	TMM 9
31	2003 Nov 07	Genetic algorithms	DEG
32	2003 Nov 10	Genetic programming 1	JK; paper
33	2003 Nov 12	Genetic programming 2	JK; paper
34	2003 Nov 14	IBL, k-nearest neighbor	TMM 8.1-4
35	2003 Nov 17	Rule learning and extraction	Paper
36	2003 Nov 19	Inductive logic programming	TMM 10; RN 21
37	2003 Nov 21	Support vector machines	New
38	2003 Nov 24	Applications 1: Decision Support	–
39	2003 Nov 26	Applications 2: Commercial KDD	–
40	2003 Dec 01	Applications 3: Bioinformatics	–
41	2003 Dec 03	Applications 4: KDD Cup	–
42	2003 Dec 05	Applications 5: Machine Translation	–
43	2003 Dec 08	Applications 6: Advanced Topics	–
44	2003 Dec 10	Final review	TMM 1-10, 13; RN 14-21
	TBD	FINAL EXAM	TMM 1-10, 13

TMM: Machine Learning, T. M. Mitchell

RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig

DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg

JK: Genetic Programming, J. Koza

Lightly-shaded entries denote the (Friday) due date of a written problem set.

Heavily-shaded entries denote the (Friday) due date of a machine problem (programming assignment).

Green-shaded entries denote the (Monday) due date of a paper review.

Course Description

Course Requirements

Selected reading (on reserve in K-State CIS Library)

Additional bibliography (excerpted in course notes and handouts)

Syllabus

Midterm Exam

FINAL EXAM