CIS 830/864 (Advanced Topics in AI / Data Engineering)
Spring, 2000
Homework Assignment 3
Friday, April 14, 2000
Due: Friday, April 28, 2000 (by 5pm)
This assignment is designed to give you practice implementing a basic
machine learning algorithm in the programming language of your choice.
Refer to the course introduction handout for guidelines on working with other
students. Remember to submit your solutions electronically to cis830ta@ringil.cis.ksu.edu,
and produce them only from your own data, source code, and
notes (not from shared work or from sources other than the code specified in
this machine problem). Do not use any source code other than
published pseudocode for this machine problem, and cite your source, as
always.
-
(60 points) Implementing the version space algorithm. For this machine
problem, you will use your course accounts on the KSU CIS department KDD
cluster machines (Topeka and Salina).
Using your favorite programming language, implement the candidate
elimination algorithm described in Chapter 18 of Russell and Norvig
and Chapter 2 of Mitchell. (You may use a functional or logic programming
language if you know one.) Submit your source code, project files, and
instructions for compiling your project.
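The algorithm depends on a representation for hypotheses and on the more-general-than-or-equal-to ordering between them. Below is a minimal Python sketch. It assumes conjunctive hypotheses over nominal attributes, as in Chapter 2 of Mitchell, with "?" meaning any value is acceptable and "0" meaning no value is acceptable. It also assumes no attribute value is literally "?" or "0". UCI files often use "?" for missing values, which would need separate handling. The names `covers` and `more_general_or_equal` are illustrative, not a required interface.

```python
# A conjunctive hypothesis is a tuple with one constraint per attribute:
#   "?"  -> any value is acceptable
#   "0"  -> no value is acceptable (the empty hypothesis)
#   else -> exactly that value is required
ANY, NONE = "?", "0"

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == ANY or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 >=_g h2, i.e. every instance covered by h2 is covered by h1."""
    if NONE in h2:                 # h2 covers nothing, so any h1 qualifies
        return True
    return all(a == ANY or a == b for a, b in zip(h1, h2))
```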
Specification:
-
Your program should read training examples from a file in MLC++ .data
format and the attribute schema from a file in MLC++ .names format. The
program must be invoked as follows:
<VS-program> XYZ
where XYZ.data contains the training examples and XYZ.names contains the schema.
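Here is a possible Python starting point for the file-reading step. It assumes a C4.5-style layout. The .names file's first line lists the class labels. Each later line has the form `attribute: value1, value2, ...`. A `|` starts a comment, and a trailing period is optional. Each line of the .data file is one comma-separated example with the class label last. Check this layout against the course's sample MLC++ files before relying on it. The functions `read_names` and `read_data` are hypothetical, and continuous attributes are not handled.

```python
import sys

def _clean(line):
    """Strip a '|' comment, surrounding whitespace, and one trailing period."""
    line = line.split("|", 1)[0].strip()
    return line[:-1].rstrip() if line.endswith(".") else line

def read_names(path):
    """Return (class_labels, [(attribute_name, [values]), ...])."""
    with open(path) as f:
        lines = [l for l in (_clean(raw) for raw in f) if l]
    classes = [c.strip() for c in lines[0].split(",")]
    attributes = []
    for line in lines[1:]:
        name, values = line.split(":", 1)
        attributes.append((name.strip(), [v.strip() for v in values.split(",")]))
    return classes, attributes

def read_data(path):
    """Return [(attribute_value_tuple, class_label), ...]."""
    examples = []
    with open(path) as f:
        for raw in f:
            line = _clean(raw)
            if line:
                fields = [v.strip() for v in line.split(",")]
                examples.append((tuple(fields[:-1]), fields[-1]))
    return examples

if __name__ == "__main__" and len(sys.argv) > 1:
    stem = sys.argv[1]                       # invoked as: <VS-program> XYZ
    classes, attributes = read_names(stem + ".names")
    examples = read_data(stem + ".data")
    print(len(attributes), "attributes,", len(examples), "examples")
```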
-
Your program should print out the S and G sets after each
training example.
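The per-example update and trace can be sketched in Python as follows. The sketch follows the candidate-elimination steps in Chapter 2 of Mitchell for conjunctive hypotheses, using "?" for any value and "0" for no value. Each example is assumed to be a pair of an attribute-value tuple and a bool. `domains` is assumed to list each attribute's legal values in .names order. Mapping class labels such as Yes/No to True/False is left to the caller. `trace` defaults to `print`, so S and G are printed after every example. All names here are illustrative.

```python
ANY, NONE = "?", "0"   # "?" accepts any value; "0" accepts none

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == ANY or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if every instance covered by h2 is also covered by h1."""
    if NONE in h2:
        return True
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """The unique minimal generalization of conjunctive s that covers x."""
    if NONE in s:
        return tuple(x)
    return tuple(c if c == v else ANY for c, v in zip(s, x))

def specializations(g, x, domains):
    """Minimal specializations of g that exclude instance x."""
    result = []
    for i, c in enumerate(g):
        if c == ANY:
            for v in domains[i]:
                if v != x[i]:
                    result.append(g[:i] + (v,) + g[i + 1:])
    return result

def _unique(hs):
    return list(dict.fromkeys(hs))   # drop duplicates, keep first-seen order

def candidate_elimination(examples, domains, trace=print):
    """Process (instance, is_positive) pairs; return the final (S, G)."""
    n = len(domains)
    S, G = [(NONE,) * n], [(ANY,) * n]
    for k, (x, positive) in enumerate(examples, 1):
        if positive:
            G = [g for g in G if covers(g, x)]          # drop g that miss x
            new_S = [s for s in S if covers(s, x)]
            for s in S:
                if not covers(s, x):
                    h = generalize(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            new_S = _unique(new_S)
            # keep only the maximally specific members
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:
            S = [s for s in S if not covers(s, x)]      # drop s that cover x
            new_G = [g for g in G if not covers(g, x)]
            for g in G:
                if covers(g, x):
                    for h in specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            new_G = _unique(new_G)
            # keep only the maximally general members
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
        trace(f"after example {k}: S = {S}  G = {G}")
    return S, G
```

Conjunctive hypotheses have a unique minimal generalization, so S stays a singleton unless the version space collapses. G, by contrast, can grow with each negative example.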
-
(40 points) Testing and documenting your project.
-
Run your program on the EnjoySport training examples (Chapter 2 of Mitchell)
and print the resulting members of S and G. Turn in this output.
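To make the EnjoySport run reproducible, the four training examples from Table 2.1 of Mitchell can be written out as .names and .data files. This Python sketch assumes a C4.5-style layout, with class labels on the first line of the .names file and the class label last on each .data line. Adjust it if the MLC++ files used in the course differ. The helper `write_enjoysport` is hypothetical.

```python
# Attribute schema: class labels first, then one line per attribute.
NAMES = """\
Yes, No.
Sky: Sunny, Cloudy, Rainy.
AirTemp: Warm, Cold.
Humidity: Normal, High.
Wind: Strong, Weak.
Water: Warm, Cool.
Forecast: Same, Change.
"""

# The four EnjoySport training examples (Mitchell, Table 2.1), class last.
DATA = """\
Sunny, Warm, Normal, Strong, Warm, Same, Yes.
Sunny, Warm, High, Strong, Warm, Same, Yes.
Rainy, Cold, High, Strong, Warm, Change, No.
Sunny, Warm, High, Strong, Cool, Change, Yes.
"""

def write_enjoysport(stem="EnjoySport"):
    """Write <stem>.names and <stem>.data in the assumed layout."""
    with open(stem + ".names", "w") as f:
        f.write(NAMES)
    with open(stem + ".data", "w") as f:
        f.write(DATA)
```

After calling `write_enjoysport()`, the program can be invoked as `<VS-program> EnjoySport`. As a sanity check, Chapter 2 of Mitchell gives the final boundaries for these four examples as S = {<Sunny, Warm, ?, Strong, ?, ?>} and G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}.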
-
Run your program on one of the UC Irvine (UCI) data sets from Homework 2 and
compare its training accuracy to that of ID3. Report the results
in a table.
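Measuring training accuracy for a version space requires a decision rule, because its members may disagree. This sketch assumes the unanimous-vote rule discussed in Chapter 2 of Mitchell. It predicts positive when every member of S covers the instance and negative when no member of G covers it. Otherwise it abstains. The Python sketch below counts abstentions as errors and reports them separately. For multi-class data sets, it assumes one class is treated as positive and all others as negative. Function names are illustrative.

```python
ANY = "?"   # "?" accepts any value; "0" never matches a real value

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == ANY or c == v for c, v in zip(h, x))

def classify(S, G, x):
    """Unanimous vote of the version space: True, False, or None (abstain)."""
    if not S or not G:
        return None                     # the version space has collapsed
    if all(covers(s, x) for s in S):
        return True                     # every consistent hypothesis says positive
    if not any(covers(g, x) for g in G):
        return False                    # every consistent hypothesis says negative
    return None

def training_accuracy(S, G, examples):
    """Return (accuracy, abstentions); abstentions are counted as errors."""
    correct = abstained = 0
    for x, positive in examples:
        label = classify(S, G, x)
        if label is None:
            abstained += 1
        elif label == positive:
            correct += 1
    return correct / len(examples), abstained
```

If the training data are noisy or the target concept is not conjunctive, S and G may become empty. The sketch then abstains on every example, and that outcome belongs in the results table next to ID3's accuracy.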