CIS 830/864 (Advanced Topics in AI / Data Engineering)
Spring, 2000
Homework Assignment 2
Wednesday, March 8, 2000
Due: Friday, March 31, 2000 (by 5pm)
This assignment is designed to give you some practice in using existing
machine learning (ML) packages for knowledge discovery
in databases (KDD).
Refer to the course intro handout for guidelines on working with other
students. Remember to submit your solutions in electronic form to cis830ta@ringil.cis.ksu.edu
and produce them only from your personal data, source code, and
notes (not common work or sources other than the codes specified in
this machine problem). If you intend to use other references (e.g., codes
downloaded from the CMU archive, NRL archive, or other software repositories
such as those referenced from KD Nuggets or the instructor's "related
links" page), get the instructor’s permission, and cite your reference
properly.
-
(40 points) Running ID3. For this machine problem, you will
use your course accounts on the KSU CIS department KDD cluster. Note:
Your accounts may not yet be operational by the time this assignment is
distributed – check the course web page for the latest information.
-
Log into your course accounts on Topeka and Salina (when they are ready)
and download the files:
http://ringil.cis.ksu.edu/Courses/Spring-2000/CIS830/Homework/Problems/HW2/MLC++-2.01.tar.gz
http://ringil.cis.ksu.edu/Courses/Spring-2000/CIS830/Homework/Problems/HW2/db.tar.gz
The MLC++ archive is the pre-compiled binary distribution for RedHat Linux 6.x.
-
Follow the instructions in the MLC++ manual (Utilities 2.0, in your
first notes packet) for installing it in your scratch directory
on Topeka and Salina:
/cis/{topeka | salina}/scratch/CIS830/yourlogin
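If you would rather script the download and unpacking than do it by hand,
here is a minimal sketch using only Python's standard library (wget and tar
from the shell work just as well). The scratch path is the placeholder given
above; substitute your own login (or the corresponding salina path).

  import tarfile
  import urllib.request
  from pathlib import Path

  BASE = "http://ringil.cis.ksu.edu/Courses/Spring-2000/CIS830/Homework/Problems/HW2/"
  SCRATCH = Path("/cis/topeka/scratch/CIS830/yourlogin")  # placeholder: use your own login

  for name in ("MLC++-2.01.tar.gz", "db.tar.gz"):
      local = SCRATCH / name
      urllib.request.urlretrieve(BASE + name, str(local))  # fetch the archive
      with tarfile.open(local, "r:gz") as archive:
          archive.extractall(path=SCRATCH)                 # unpack into scratch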
-
Follow the instructions in the MLC++ tutorial (also in your first
notes packet) to run the ID3 inducer on the following data sets
from the UCI Machine Learning Database Repository: Breast, Tic-Tac-Toe,
Ionosphere. Use the .test files for testing. Turn in a PostScript file
containing the decision tree and another file containing a table of training
and test set accuracy values for each data set.
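For reference, the accuracy you tabulate is simply the fraction of instances
classified correctly (MLC++ prints it for you). The sketch below, in plain
Python and not part of MLC++, shows that definition and one possible layout
for the table; the None entries are placeholders to be replaced with your own
results.

  def accuracy(predicted_labels, true_labels):
      # Fraction of instances whose predicted class matches the true class.
      assert len(predicted_labels) == len(true_labels)
      correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
      return correct / len(true_labels)

  # Placeholder table: (training accuracy, test accuracy) per data set.
  results = {
      "Breast":      (None, None),
      "Tic-Tac-Toe": (None, None),
      "Ionosphere":  (None, None),
  }
  print(f"{'Data set':<14}{'Train acc.':>12}{'Test acc.':>12}")
  for name, (train_acc, test_acc) in results.items():
      print(f"{name:<14}{str(train_acc):>12}{str(test_acc):>12}")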
-
(30 points) Using NeuroSolutions.
-
Download the NeuroSolutions 3.02 demo from http://www.nd.com
and install it on a Windows 95, 98, NT 4.0, or NT 5 (Windows 2000) machine.
-
Use the NeuralWizard (which is fully documented in the online help for
NeuroSolutions 3) to build a multilayer perceptron for learning
the second sleep stage data set. Your training data file should
be sleep2.asc and your desired response file should be sleep2t.asc.
Use a 20% holdout data set for cross validation. Report both training
and cross validation performance (mean-squared error) by stamping a
MatrixViewer probe on top of the (octagonal) costTransmitter module and
recording the final value after training (for the default of 1000 epochs).
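As a reminder of what the two numbers mean (NeuroSolutions computes them for
you), the sketch below shows the mean-squared error and one way an 80/20
training/holdout split could be carved out of the exemplars; it is purely
conceptual and is not necessarily how NeuroSolutions selects its
cross-validation set.

  import numpy as np

  def mse(desired, predicted):
      # Mean-squared error: the quantity the MatrixViewer probe displays.
      desired, predicted = np.asarray(desired), np.asarray(predicted)
      return float(np.mean((desired - predicted) ** 2))

  def holdout_split(inputs, targets, holdout_fraction=0.2):
      # Reserve the last 20% of exemplars for cross validation (illustrative only).
      n_train = int(len(inputs) * (1.0 - holdout_fraction))
      return (inputs[:n_train], targets[:n_train]), (inputs[n_train:], targets[n_train:])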
-
(30 points) Using Hugin.
Download the Hugin Lite demo from http://www.hugin.com
and use it to build the full Bayesian network for the Forest Fire example
from Lecture 19, using your own subjective estimates of the conditional
probability tables (CPTs). Make sure that all your probability values are
legitimate (specifically, that they lie in the proper range [0, 1] and that
each conditional distribution sums to 1). Turn in, by e-mail, a screen shot
of your BBN, and attach your network as a Hugin file titled ForestFire.hkb.
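A quick way to convince yourself that a CPT is legitimate in the sense above
is sketched below; the node, its parent, and the numbers are hypothetical and
are not taken from the Forest Fire network.

  import numpy as np

  def cpt_is_valid(cpt, tol=1e-9):
      # cpt: rows index parent configurations, columns index states of the node.
      cpt = np.asarray(cpt, dtype=float)
      in_range = np.all((cpt >= 0.0) & (cpt <= 1.0))            # every entry in [0, 1]
      normalized = np.allclose(cpt.sum(axis=1), 1.0, atol=tol)  # each row sums to 1
      return bool(in_range and normalized)

  # Hypothetical CPT for a binary node given one binary parent:
  example_cpt = [[0.99, 0.01],
                 [0.60, 0.40]]
  assert cpt_is_valid(example_cpt)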
Extra credit (15 points): Learning time series data with NeuroSolutions.
For all parts, turn in training (80%) and cross validation (20%)
error values.
-
Train a Jordan-Elman network for the same task and report the results.
Use the default settings and the input recurrent network (the upper left
entry among the 4 choices). Take a screen shot of your artificial neural
network after training (in Windows, hit Print-Screen and paste the Clipboard
into your word processor).
-
Train a time-delay neural network for the same task and report the results.
-
Train a Gamma memory network for the same task and report the results.
Extra credit (5 points): Commentary.
Post substantive comments relating to your review of any of
the following papers:
-
The Lumiere Project: Bayesian User Modeling for Inferring the Goals and
Needs of Software Users (Horvitz, Breese, Heckerman, Hovel, and Rommelse)
-
Symbolic Causal Networks for Reasoning about Actions and Plans (Darwiche
and Pearl)
-
KDD for Science Data Analysis: Issues and Examples (Fayyad, Haussler, and
Stolorz)
in the class web board (http://ringil.cis.ksu.edu/Courses/Spring-2000/CIS830/Board),
or reply to one of the discussion threads on these papers. Title your article
appropriately (e.g., "Comments on Paper 12").