CIS 830/864
(Advanced Topics in Artificial Intelligence /
Data Engineering)
Due: Friday, February 2,
2001 (by midnight)
This written assignment is designed to apply your theoretical understanding of the machine learning framework and fundamentals of supervised, inductive learning.
Refer to the course intro
handout for guidelines on working with other students. Your solutions should be produced them
only from your personal notes (not common work or sources other than the
textbook or properly cited references).
Note: Remember to submit your solutions in electronic
form using handin (instructions shall be posted on the web board)
and in hard copy in class (unless you are taking this course via distance
education).
Warm-up / refresher exercises (basic artificial
intelligence)
1.
(10 points total) Informed
(Heuristic) Search.
a)
(5 points) Sometimes there is no good evaluation
function for a heuristic search problem, but there is a good comparison method:
a way to tell if one node is better than another, without assigning numerical
values to either. Show that this
is enough to do a best-first search.
What properties of best-first search do we give up if we have only a
comparison method?
b)
(5 points) Define,
in your own words, the term generalization as search and explain
how it relates to machine learning.
Machine Learning and Knowledge Discovery in Databases
(KDD)
2.
(7 points) Inductive
Bias. Explain, in your own
words, the difference between representation (language) bias and preference
(search) bias, giving an original example of each. In your examples, you may use algorithms we
have covered in class so far or those discussed in papers and outside
references.
3.
(8 points) Problem
Solving (Supervised Inductive Learning). Consider
the version space example (EnjoySport) covered in class. Give the sequence of S and G boundary
sets computed by the CANDIDATE-ELIMINATION algorithm if it is given the
sequence of training examples in this example in reverse order: (x4, x3, x2, x1).
Although the final version space will be the same regardless of the
sequence of examples (explain why), the sets S and G computed at
intermediate stages will, of course, depend on this sequence. Give an idea for ordering the
training examples to minimize the sum of the sizes of these intermediate S and
G sets for the H used in the EnjoySport example.
4.
(5 points) Data
Visualization. How
can the ability to visualize models (e.g., hypotheses found using supervised
inductive learning) be of use in decision making?
Class participation:
a) (Required) Post your “turn-to-a-partner” exercise from Wednesday, January 24, 2001, or a follow-up containing your thoughts on this exercise, to the CIS830/864 web board.
b) (Optional) Post any unclear points regarding machine learning or examples that you would like to see covered in lecture or explained again on the class web board.