CIS 732
Tuesday, 27 November 2001
Due: Thursday, 06 December 2001 (before midnight Friday, 07 December 2001)
Refer to the course intro handout for guidelines on working with other students.
Note: Remember to submit your solutions in electronic form using hwsubmit, and produce them only from your own source code, scripts, and documents from the machine learning applications used in this MP (not from shared work or from sources other than the textbook or properly cited references).
Problems
First, log into your course accounts on the KDD Core (Ringil, Fingolfin, Yavanna, Nienna, Frodo, Samwise, Merry, Pippin) and make sure your home directory is in order. Notify admin@www.kddresearch.org (and cc: cis732ta@www.kddresearch.org) if you have any problems at this stage. You should have MLC++-2.01.tar.gz installed from MP2. On KDD Core systems it is already installed under /usr, so you can simply add it to your path environment variable in your .tcshrc or .cshrc, set MLCDIR in your .login, and then run Inducer.
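For example, the setup above might look like the following in your shell startup files (the install prefix /usr/MLC is an assumption; substitute the actual MLC++ location on the KDD Core machines):

```shell
# ~/.cshrc or ~/.tcshrc -- add the MLC++ binaries to your search path.
# NOTE: /usr/MLC is a guessed install prefix; check where MLC++-2.01
# actually lives under /usr on the KDD Core systems and adjust.
set path = ($path /usr/MLC/bin)

# ~/.login -- point MLCDIR at the MLC++ installation directory.
setenv MLCDIR /usr/MLC
```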
1. (60 points total) Learning time-series data with NeuroSolutions.
Your solution to this problem must be in MS Excel, PostScript, or PDF format, and you must use a spreadsheet (I recommend Gnumeric or Excel 2000/XP) to record your solution.
For all parts, turn in training (80%) and cross-validation (20%) error values.
a) (30 points) Train a Jordan-Elman network for the same task and report the results. Use the default settings and the input-recurrent network (the upper-left entry among the 4 choices). Take a screen shot of your artificial neural network after training (in Windows, press Print Screen and paste the Clipboard contents into your word processor).
b) (15 points) Train a time-delay neural network for the same task and report the results.
c) (15 points) Train a Gamma memory for the same task and report the results.
2. (40 points) Comparing inducer performance. Run Discrete-Naïve-Bayes on the Pima and Monk3-Full data sets and compare its performance (training and test set accuracy) to C4.5. Write a program (shell script, Perl script, or C or Java program) to do 5-way cross-validation on these two data sets. Turn in this program along with a file containing a table of training and test set accuracy values.
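A minimal sketch of the fold-splitting step in shell (pima.data is a placeholder file name, and the actual invocation of the inducer on each train/test pair is omitted since it depends on your MLC++ setup):

```shell
#!/bin/sh
# 5-way cross-validation fold construction (a sketch -- pima.data is a
# placeholder; run your chosen inducer on each fold$i.train / fold$i.test
# pair and record the training and test set accuracies in your table).
DATA=${DATA:-pima.data}
[ -f "$DATA" ] || seq 1 10 > "$DATA"   # toy stand-in data so the sketch runs
K=5
i=0
while [ "$i" -lt "$K" ]; do
  # Every K-th line (offset by the fold index) becomes fold i's test set;
  # the remaining lines form its training set.
  awk -v k="$K" -v i="$i" 'NR % k == (i + 1) % k' "$DATA" > "fold$i.test"
  awk -v k="$K" -v i="$i" 'NR % k != (i + 1) % k' "$DATA" > "fold$i.train"
  i=$((i + 1))
done
```

With N instances this gives each fold roughly N/5 test instances and the other 4/5 for training, so every instance is tested exactly once across the 5 runs.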
Extra credit (20 points) Evaluating significance. Run a paired t-test between Discrete-Naïve-Bayes and C4.5 on the Mushroom data set, divided into 5 segments (train on 4 segments and test on the 5th). Report the test set accuracy and the significance level.
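The t statistic itself is straightforward to compute from the 5 paired accuracies. A sketch in shell/awk (the accuracy pairs below are placeholders, not real results; substitute the values your cross-validation run produced):

```shell
#!/bin/sh
# Paired t-test sketch over k = 5 folds. Each line of pairs.txt holds
# <inducer-1 accuracy> <inducer-2 accuracy> for one fold.
# PLACEHOLDER values -- replace with your measured accuracies.
cat > pairs.txt <<'EOF'
0.95 0.97
0.94 0.96
0.96 0.95
0.93 0.97
0.95 0.98
EOF
awk '{ d = $1 - $2; s += d; ss += d*d; n++ }
     END {
       mean = s / n
       var  = (ss - n*mean*mean) / (n - 1)   # sample variance of differences
       t    = mean / sqrt(var / n)           # paired t statistic
       printf "t = %.4f with %d degrees of freedom\n", t, n - 1
     }' pairs.txt
```

With 5 folds there are 4 degrees of freedom, so you would compare |t| against a t-table (for example, the two-tailed 5% critical value at 4 degrees of freedom is about 2.776) to report the significance level.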