CIS 798 (Topics in Computer Science)

Topics in Intelligent Systems and Machine Learning

Fall, 1999

Homework Assignment 2

Wednesday, September 22, 1999

Due: Friday, October 22, 1999 (by 11pm)

 

Machine Problems

In this programming assignment, you will implement three simple learning algorithms (ID3, Perceptron/Winnow, and simple Bayes) that we have covered in class, and test them on data sets.

You may use Java, C++, or any other high-level programming language with which you are familiar, provided it has been approved by the instructor. Java and C++ are strongly preferred. You may also use MLC++ or other published code to check your program outputs (please document this usage). You may not, however, include any source code other than standard libraries (C standard library, standard template library, MFC, and basic Java classes) in the program you hand in.

Your programs will be evaluated based upon functionality, correctness (performance on general test cases and some boundary cases), readability, and documentation (which may be inline or separate as you choose).

You have one month for this assignment. It is important that you start early! Each part should be possible for you to complete within a week. (You should consider it a milestone goal to finish one of these parts by each Friday, starting on October 1.)

General Guidelines

Feel free to talk to other members of the class in doing the homework. I am more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, produce your source code yourself. For programming assignments, in addition to the results (see below) you should include brief documentation (one paragraph per part) describing what you did, what difficulties you encountered, and what conclusions you reached.

Feel free to send e-mail, to ask questions during office hours, and especially to post your questions on the class web board, http://ringil.cis.ksu.edu/Courses/Fall-1999/CIS798/Board/.

Obtaining the test data

Test data for each machine problem will be made available in parts on the class web page (under http://ringil.cis.ksu.edu/Courses/Fall-1999/CIS798/Homework/). The names of the files you should download (where available) will be: part-i-train.data, part-i-test.data, part-i.names (a common names, or schema, file containing type declarations for the data), part-i-info.txt (instructions on the test format), part-i-train.out, and part-i-test.out (examples of correct output).
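As a convenience, the naming scheme above can be written down as a small helper. This is only a sketch: the function name part_files is hypothetical, and it assumes that the "i" in "part-i" stands for the part number (1 through 4), which you should confirm against the class web page. Python is used here purely for brevity.

```python
def part_files(part):
    """Return the six file names listed above for one part.

    Assumption: the "i" in "part-i-..." is replaced by the part number.
    """
    prefix = f"part-{part}"
    return [
        f"{prefix}-train.data",  # training examples
        f"{prefix}-test.data",   # test examples
        f"{prefix}.names",       # schema: type declarations for the data
        f"{prefix}-info.txt",    # instructions on the test format
        f"{prefix}-train.out",   # example of correct output (training)
        f"{prefix}-test.out",    # example of correct output (test)
    ]
```

Not every file will necessarily exist for every part (the list applies "where available"), so treat a missing file as expected rather than as an error.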

Testing your results

You should download the sample training data and documentation after each part is complete (one per week). I will test your programs using the sample data and using a separate validation set (or other tests). Be sure that your program behaves correctly on files of the specified format (consult part-i-info.txt).

Part 1 (30 points): Decision Tree Learning – Implementing ID3

You should finish this part by Friday, October 1, 1999.
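As a reminder of the quantity ID3 uses to choose a splitting attribute, here is a minimal sketch of entropy and information gain. It is not a tree builder, and the data representation (a list of dicts mapping attribute names to values) is an assumption made for illustration, not the format described in part-i-info.txt. Python is used only for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, labels, attribute):
    """Expected reduction in entropy from splitting on `attribute`.

    `examples` is a list of dicts (attribute name -> value); this
    representation is an assumption, not the assignment's file format.
    """
    # Group the class labels by the value each example takes on `attribute`.
    partitions = {}
    for example, label in zip(examples, labels):
        partitions.setdefault(example[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        (len(subset) / len(labels)) * entropy(subset)
        for subset in partitions.values()
    )
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly has a gain equal to the entropy of the whole set; one that tells you nothing about the class has a gain of zero.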

Part 2 (30 points): Decision Tree Learning – The Badge Game

You should finish this part by Friday, October 8, 1999.

Part 3 (35 points): Perceptrons and Winnow

You should finish this part by Friday, October 15, 1999.
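The essential difference between the two algorithms is that the Perceptron corrects its weights additively while Winnow corrects them multiplicatively. The sketch below shows a single update step for each under simplifying assumptions: labels and inputs are 0 or 1, the threshold is fixed rather than learned, and the learning rate and the factor alpha are arbitrary illustrative defaults. The function names are hypothetical, and the Winnow variant shown is the common promote/demote form. Python is used only for brevity.

```python
def predict(weights, x, threshold):
    """Linear threshold unit: 1 if w . x >= threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0


def perceptron_update(weights, x, y, threshold=0.5, rate=0.1):
    """Additive update; weights change only when the prediction is wrong."""
    error = y - predict(weights, x, threshold)  # -1, 0, or +1
    return [w + rate * error * xi for w, xi in zip(weights, x)]


def winnow_update(weights, x, y, threshold, alpha=2.0):
    """Multiplicative update for inputs in {0, 1}.

    On a false negative the weights of active inputs are multiplied by
    alpha (promotion); on a false positive they are divided by alpha
    (demotion). Weights of inactive inputs are left unchanged.
    """
    if predict(weights, x, threshold) == y:
        return list(weights)
    factor = alpha if y == 1 else 1.0 / alpha
    return [w * factor if xi == 1 else w for w, xi in zip(weights, x)]
```

A full learner would repeat these updates over the training set. Stopping criteria and the handling of a bias term are left out here.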

Part 4 (30 points): Simple (Naïve) Bayes

You should finish this part by Friday, October 22, 1999.
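For orientation, a simple Bayes classifier estimates class priors and per-class attribute-value frequencies by counting, then picks the class with the highest posterior score. The sketch below assumes discrete attributes stored as tuples and uses add-one (Laplace) smoothing. Both are assumptions for illustration rather than requirements of this part, and the function names are hypothetical. Python is used only for brevity.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for example, label in zip(examples, labels):
        for i, value in enumerate(example):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def classify(model, example):
    """Return the label maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())

    def score(label):
        log_p = math.log(class_counts[label] / total)
        for i, value in enumerate(example):
            # Add-one (Laplace) smoothing: an assumption, not a requirement.
            numerator = value_counts[(i, label)][value] + 1
            denominator = class_counts[label] + len(values_seen[i])
            log_p += math.log(numerator / denominator)
        return log_p

    return max(class_counts, key=score)
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow when there are many attributes.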

Extra credit (5 points each)

  1. Mitchell, Problem 5.3
  2. Russell and Norvig, Problem 15.2