CIS 798 (Topics in Computer Science)

Topics in Intelligent Systems and Machine Learning

Fall, 1999

Homework Assignment 2

Part 3 (35 points): Perceptrons and Winnow

You should finish this part by Friday, October 15, 1999.

In this machine problem, you will implement two very simple and important learning algorithms and experiment with them by comparing them on a synthetic data set. The algorithms are online, one-pass versions of Perceptron and Winnow, as described in Chapter 4 of Mitchell and Lecture 6.

You will run 3 experiments. In each, you will learn a different target function (generated using the same program) using each algorithm and present the corresponding learning curves.

Target function and data

The instance space is {0, 1}^500 - that is, there are 500 binary (boolean-valued) attributes, or features, in the domain. The 3 target concepts will all be k-of-m functions, each with a different k.

Definition: Let f: {0, 1}^n → {0, 1} be a k-of-m function. f is defined as follows: Let x ∈ {0, 1}^n. Let S be a fixed subset of attributes (out of the n possible) and let |S| = m. f(x) = 1 if and only if at least k of the m fixed attributes have the value 1.

Note that if m = n, then we have the well-known k-of-n concept (called m-of-n in class and in most of the literature). This is the case where all the attributes are relevant. We are going to experiment with the case where only the first half (m = 250) of the attributes are relevant. This function can be represented using the hypothesis language provided by Perceptron and Winnow.
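To make the definition concrete, here is a short Python sketch of a k-of-m target function. Taking S to be the first m attributes matches the setup above (only the first half of the attributes are relevant); the function name itself is ours, not part of the assignment:

```python
def make_k_of_m(n=500, m=250, k=10):
    """Return f: {0,1}^n -> {0,1} with f(x) = 1 iff at least k of the
    first m attributes of x are 1 (i.e., S = the first m attributes)."""
    def f(x):
        # x is a list of n values in {0, 1}; only x[:m] is relevant
        return 1 if sum(x[:m]) >= k else 0
    return f
```

For example, with k = 10 an instance whose first 10 attributes are 1 (and the rest 0) is labeled positive, while an instance with only 9 of the relevant attributes set is labeled negative.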

You will run 3 experiments, with different values of k: k = 10, k = 50, and k = 100.

Data generation

First, you will generate the data for the experiments yourself. For each function f10, f50, f100, generate 1100 data points: 1000 training examples and 100 test examples (the test examples are used in the Experiments section below).

Implementation

This should be the easy part! The data generation may actually be more complicated.

The variants of Perceptron and Winnow that you are implementing are especially simple: use online updates (after each example) and make only a single pass through the training data. Note that Perceptron and Winnow are essentially the same algorithm, with a slightly different update rule and constants - e.g., start with a weight vector of (1, 1, …, 1) for Winnow, as documented in Lecture 6.

Experiments

For each of the 3 target functions, train the 2 algorithms on the first 50 to 1000 examples in increments of 50 (i.e., 50, 100, 150, …, 950, 1000), and, in each case, test on the 100 test examples you generated. The easiest way to do this is to load the data set into memory and write an outer loop that calls each algorithm on the first N training examples.
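The outer loop might look like the following sketch. `majority_learner` is a hypothetical stand-in for Perceptron or Winnow, included only so the sketch is self-contained; in your experiments `train_fn` would wrap each of the two algorithms:

```python
def learning_curve(train_fn, train_data, test_data, step=50):
    """Train on the first N examples for N = step, 2*step, ..., and
    count mistakes on the held-out test set at each point."""
    curve = []
    for N in range(step, len(train_data) + 1, step):
        predict = train_fn(train_data[:N])     # learner returns a predictor
        mistakes = sum(1 for x, y in test_data if predict(x) != y)
        curve.append((N, mistakes))
    return curve

def majority_learner(examples):
    """Illustrative dummy learner: always predicts the majority label."""
    ones = sum(y for _, y in examples)
    label = 1 if 2 * ones >= len(examples) else 0
    return lambda x: label
```

The `(N, mistakes)` pairs returned for each algorithm and each k are exactly the points you would plot in the learning-curve graphs described below.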

You are encouraged to experiment with parameters to find the most effective ones, but be consistent (within and across data sets) when collecting the final results.

What to submit

  1. A 1-2 paragraph description of what you did: design decisions in the implementations of the algorithms, parameters used, etc.
  2. 3 graphs of the performance (total number of mistakes on the test set) as a function of the number of training examples for each of the algorithms. For each k, present a separate graph with 2 curves, one for each algorithm (Perceptron and Winnow). You may produce these graphs using Microsoft Excel, GNUplot, or any other plotting software. No hand-drawn plots, please.
  3. A brief (2-4 sentence) description and analysis of the final hypothesis of each algorithm.
  4. A brief (1 paragraph) discussion of the differences you see in the performance of the algorithms across target functions. Try to explain these differences.
  5. Your source code.