K-State CIS 830/864: Homework 1 (Written Problem Set)

CIS 830/864

(Advanced Topics in Artificial Intelligence / Data Engineering)

Spring, 2001

Homework Assignment 1

Problem Set

Sunday, January 21, 2001

Due: Friday, February 2, 2001 (by midnight)

This written assignment is designed to apply your theoretical understanding of the machine learning framework and fundamentals of supervised, inductive learning.

Refer to the course intro handout for guidelines on working with other students. Your solutions should be produced them only from your personal notes (not common work or sources other than the textbook or properly cited references).

Note: Remember to submit your solutions in electronic form using handin (instructions shall be posted on the web board) and in hard copy in class (unless you are taking this course via distance education).

Warm-up / refresher exercises (basic artificial intelligence)

1. (10 points total) Informed (Heuristic) Search.

a) (5 points) Sometimes there is no good evaluation function for a heuristic search problem, but there is a good comparison method: a way to tell if one node is better than another, without assigning numerical values to either. Show that this is enough to do a best-first search. What properties of best-first search do we give up if we have only a comparison method?

b) (5 points) Define, in your own words, the term generalization as search and explain how it relates to machine learning.

Machine Learning and Knowledge Discovery in Databases (KDD)

2. (7 points) Inductive Bias. Explain, in your own words, the difference between representation (language) bias and preference (search) bias, giving an original example of each. In your examples, you may use algorithms we have covered in class so far or those discussed in papers and outside references.

3. (8 points) Problem Solving (Supervised Inductive Learning). Consider the version space example (EnjoySport) covered in class. Give the sequence of S and G boundary sets computed by the CANDIDATE-ELIMINATION algorithm if it is given the sequence of training examples in this example in reverse order: (x₄, x₃, x₂, x₁). Although the final version space will be the same regardless of the sequence of examples (explain why), the sets S and G computed at intermediate stages will, of course, depend on this sequence. Give an idea for ordering the training examples to minimize the sum of the sizes of these intermediate S and G sets for the H used in the EnjoySport example.

4. (5 points) Data Visualization. How can the ability to visualize models (e.g., hypotheses found using supervised inductive learning) be of use in decision making?

Class participation:

a) (Required) Post your “turn-to-a-partner” exercise from Wednesday, January 24, 2001, or a follow-up containing your thoughts on this exercise, to the CIS830/864 web board.

b) (Optional) Post any unclear points regarding machine learning or examples that you would like to see covered in lecture or explained again on the class web board.