CIS 732: Machine Learning and Pattern Recognition

Fall 2003

 

Hours: 3 hours (extended CIS732 course project option available)

Prerequisite: CIS 501 or equivalent coursework in data structures and algorithms; CIS 301 (set theory/logic) and 575 (algorithm design), Math 510 (discrete math), Stat 410 (intro probability) strongly recommended

Textbook: Machine Learning, T. M. Mitchell.  McGraw-Hill, 1997. ISBN: 0070428077

Time and Venue: Mon, Wed, Fri 13:30 – 14:20, Room 127 Nichols Hall (N127)

Instructor: William H. Hsu, Department of Computing and Information Sciences

Office: N213        Office phone: (785) 532-6350 x29  Home phone: (785) 539-7180

URL: http://www.cis.ksu.edu/~bhsu                  

E-mail: cis732ta@www.kddresearch.org

Office hours:           10:30-11:30 Mon, Wed, Fri;

by appointment, Friday after 16:00

Class web page: http://www.kddresearch.org /Courses/Fall-2003/CIS732/

 

Course Description

 

An introductory course in machine learning for development of intelligent knowledge based systems.  The first half of the course will focus on basic taxonomies and theories of learning, algorithms for concept learning, statistical learning, knowledge representation, pattern recognition, and reasoning under uncertainty.  The second half of the course will survey some basic topics in combining multiple models, learning from time series, learning to reason, and selected applications in knowledge discovery and data mining, especially in bioinformatics.

 

The course will include several written and programming assignments and a term project option for graduate students.  Ancillary readings will be assigned; students will write a brief synopsis and review for one of these papers every other lecture.

 

Course Requirements

 

Homework: 3 out of 4 programming and written assignments (15%)

Examinations: 1 in-class closed-book midterm (20%), 1 in-class open-book final (25%)

Computer language(s): C/C++, Java, or student choice (upon instructor approval)

Project: research project and report for all students (25%)

Paper reviews: 10 out of 13 paper reviews (10%)

Class participation: class and online discussions, asking and answering questions (5%)

 

Selected reading (on reserve in K-State CIS Library)

 

·         Artificial Intelligence: A Modern Approach, 2nd edition, S. J. Russell and P. Norvig.  Prentice-Hall, 2003.  ISBN: 0137903952

·         Readings in Machine Learning, J. W. Shavlik and T. G. Dietterich, eds.  Morgan Kaufmann, 1990. ISBN: 1558601430

·         Readings in Computer Inference and Knowledge Acquisition, B. G. Buchanan and D. C. Wilkins, eds.  Morgan Kaufmann, 1993.  ISBN: 1558601635

 

Additional bibliography (excerpted in course notes and handouts)

 

·         Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg.  Addison-Wesley, 1989.  ISBN: 0201157675

·         Genetic Programming: On The Programming of Computers by Means of Natural Selection, J. Koza.  MIT Press, 1992.  ISBN: 0262111705


Syllabus

 

Lecture

Date

Topic

(Primary) Source

0

2003 Aug 20

Administrivia; overview of learning

TMM Chapter 1

1

2003 Aug 22

Concept learning, version spaces

TMM 2

2

2003 Aug 25

Inductive bias

TMM 2

3

2003 Aug 27

Decision trees; using MLC++

TMM 3; RN 18

4

2003 Aug 29

Decision trees

TMM 3

5

2003 Sep 03

Overfitting and Occam’s Razor

TMM 3

6

2003 Sep 05

Perceptrons

TMM 4

7

2003 Sep 08

Winnow

TMM 4

8

2003 Sep 10

Multi-layer perceptrons

TMM 4; handouts

9

2003 Sep 12

Estimation and confidence intervals

TMM 5

10

2003 Sep 15

Bayesian learning: MAP, ML

TMM 6

11

2003 Sep 17

Bayesian learning: MDL, BOC, Gibbs

TMM 6

12

2003 Sep 19

Naïve Bayes; prob. learning over text

TMM 6; handouts

13

2003 Sep 22

Graphical models: inference basics

TMM 6; RN 14-15

14

2003 Sep 26

Graphical models: learning from data 1

TMM 6; paper

15

2003 Sep 29

Graphical models: learning from data 2

TMM 6

16

2003 Oct 01

Unsupervised learning: AutoClass, SOM

TMM 6

17

2003 Oct 03

Expectation-Maximization (EM)

TMM 6

18

2003 Oct 06

COLT 1: PAC learning

TMM 2, 7.1-3; handouts

19

2003 Oct 08

COLT 2: PAC, VC dimension,

TMM 7.4.1-3

20

2003 Oct 10

COLT 3: Error bounds

TMM 7.5.1-3

21

2003 Oct 15

Midterm Review

TMM 1-7; RN 14-15, 18

22

2003 Oct 17

Midterm Exam

23

2003 Oct 20

Reinforcement learning I

TMM 13; RN 20

24

2003 Oct 22

Reinforcement learning II

TMM 13; papers

25

2003 Oct 24

Dynamic models I: value/policy iteration

RN 16

26

2003 Oct 27

Dynamic models II: DBNs

RN 17

27

2003 Oct 29

Markov Decision Processes

RN 17

28

2003 Oct 31

Combining classifiers (WM, bagging)

TMM 7

29

2003 Nov 03

Boosting

TMM 9; papers

30

2003 Nov 05

Introduction to GEC

TMM 9

31

2003 Nov 07

Genetic algorithms

DEG

32

2003 Nov 10

Genetic programming 1

JK; paper

33

2003 Nov 12

Genetic programming 2

JK; paper

34

2003 Nov 14

IBL, k-nearest neighbor

TMM 8.1-4

35

2003 Nov 17

Rule learning and extraction

Paper

36

2003 Nov 19

Inductive logic programming

TMM 10; RN 21

37

2003 Nov 21

Support vector machines

New

38

2003 Nov 24

Applications 1: Decision Support

39

2003 Nov 26

Applications 2: Commercial KDD

40

2003 Dec 01

Applications 3: Bioinformatics

41

2003 Dec 03

Applications 4: KDD Cup

42

2003 Dec 05

Applications 5: Machine Translation

43

2003 Dec 08

Applications 6: Advanced Topics

44

2003 Dec 10

Final review

TMM 1-10, 13; RN 14-21

 

TBD

FINAL EXAM

TMM 1-10, 13

 

TMM: Machine Learning, T. M. Mitchell

RN: Artificial Intelligence: A Modern Approach, S. J. Russell and P. Norvig

DEG: Genetic Algorithms in Search, Optimization, and Machine Learning, D. E. Goldberg

JK: Genetic Programming, J. Koza

 

Lightly-shaded entries denote the (Friday) due date of a written problem set.

Heavily-shaded entries denote the (Friday) due date of a machine problem (programming assignment).

Green-shaded entries denote the (Monday) due date of a paper review.