CIS 830/864 (Advanced Topics in AI / Data Engineering)
Spring, 2000

Homework Assignment 3

Friday, April 14, 2000
Due: Friday, April 28, 2000 (by 5pm)
 

This assignment is designed to give you some practice in implementing a basic machine learning algorithm using your preferred programming language.

Refer to the course intro handout for guidelines on working with other students. Remember to submit your solutions in electronic form to cis830ta@ringil.cis.ksu.edu, and to produce them only from your own data, source code, and notes (not from shared work, and not from sources other than those specified in this machine problem). Do not use any source code other than published pseudocode for this machine problem (and cite your source, as always).

  1. (60 points) Implementing the version space algorithm. For this machine problem, you will use your course accounts on the KSU CIS department KDD cluster machines (Topeka and Salina).
Using your favorite programming language, implement the candidate elimination algorithm described in Chapter 18 of Russell and Norvig and Chapter 2 of Mitchell. (You may use a functional or logic programming language if you know one.) Submit your source code, project files, and instructions for compiling your project.

Specification:

    1. Your program should take training examples in the format of MLC++ .data files and schema files in the format of MLC++ .names files. The program must be invoked as follows:
         <VS-program> XYZ
       where XYZ.data contains the training examples and XYZ.names contains the schema.
    2. Your program should print out the S and G sets after each training example.
  2. (40 points) Testing and documenting your project.
    1. Run your program on the EnjoySport training examples and print out the resulting members of S and G. Turn in this output.
    2. Run your program on one of the Irvine test data sets from Homework 2, and compare your training accuracy to that of ID3. Report the results in a table.
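
For orientation only, the candidate elimination loop for purely conjunctive hypotheses can be sketched as below. This is a minimal illustration in Python, not a model solution. It assumes nominal attributes and noise-free data, and it keeps the examples in memory instead of reading MLC++ .data and .names files. All identifiers (covers, min_generalization, candidate_elimination, DOMAINS, EXAMPLES, and so on) are hypothetical names chosen for this sketch. The four examples are the EnjoySport training set from Chapter 2 of Mitchell.

```python
from typing import List, Sequence, Tuple

ANY = "?"   # constraint satisfied by every attribute value
NONE = "0"  # constraint satisfied by no attribute value

Hypothesis = Tuple[str, ...]


def covers(h: Hypothesis, x: Sequence[str]) -> bool:
    """True if conjunctive hypothesis h classifies instance x as positive."""
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))


def more_general_or_equal(h1: Hypothesis, h2: Hypothesis) -> bool:
    """True if every instance covered by h2 is also covered by h1."""
    if NONE in h2:  # h2 covers nothing, so anything is at least as general
        return True
    return all(a == ANY or a == b for a, b in zip(h1, h2))


def min_generalization(s: Hypothesis, x: Sequence[str]) -> Hypothesis:
    """Least general hypothesis that covers both s and the instance x."""
    return tuple(xv if sv == NONE else (sv if sv == xv else ANY)
                 for sv, xv in zip(s, x))


def min_specializations(g: Hypothesis, x: Sequence[str],
                        domains: Sequence[Sequence[str]]) -> List[Hypothesis]:
    """Most general specializations of g that exclude the instance x."""
    out = []
    for i, gv in enumerate(g):
        if gv == ANY:
            out.extend(g[:i] + (v,) + g[i + 1:]
                       for v in domains[i] if v != x[i])
    return out


def candidate_elimination(examples, domains):
    """Return (S, G, trace); trace holds the (S, G) pair after each example."""
    n = len(domains)
    S: List[Hypothesis] = [tuple([NONE] * n)]
    G: List[Hypothesis] = [tuple([ANY] * n)]
    trace = []
    for x, positive in examples:
        x = tuple(x)
        if positive:
            # Drop members of G that reject the positive example.
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                # Keep h only if some member of G is at least as general.
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            # Remove duplicates and members more general than another member.
            S = [s for s in dict.fromkeys(new_S)
                 if not any(s != t and more_general_or_equal(s, t)
                            for t in new_S)]
        else:
            # Drop members of S that accept the negative example.
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # Keep specializations that are still above some member of S.
                new_G.extend(h for h in min_specializations(g, x, domains)
                             if any(more_general_or_equal(h, s) for s in S))
            # Remove duplicates and members less general than another member.
            G = [g for g in dict.fromkeys(new_G)
                 if not any(g != t and more_general_or_equal(t, g)
                            for t in new_G)]
        trace.append((list(S), list(G)))
    return S, G, trace


# EnjoySport schema and training examples (Mitchell, Chapter 2).
DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

if __name__ == "__main__":
    _, _, steps = candidate_elimination(EXAMPLES, DOMAINS)
    for i, (S_i, G_i) in enumerate(steps, start=1):
        print(f"After example {i}: S = {S_i}, G = {G_i}")
```

On these four examples the sketch ends with S containing the single hypothesis (Sunny, Warm, ?, Strong, ?, ?) and G containing (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?), which matches the boundary sets given in Mitchell. A submitted program would still need to read the MLC++ files named on the command line and to handle whatever attribute types the chosen Irvine data set contains.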