
Glue

The glue package is a genetic-algorithm farm/client combination that evaluates many permutations of variable orderings to find a near-optimal ordering for the K2 algorithm. It is an application built on several parts of the BNJ library rather than part of the library itself; we include it for research purposes and as an example of how the toolkit can be used. (Please see our references.)
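The glue sources are not reproduced here, so the following is only a minimal sketch of the general technique: a generational genetic algorithm whose individuals are orderings (permutations of variable indices). The class name OrderingGA, the choice of order crossover (OX) and swap mutation, and the elitism scheme are assumptions, not glue's actual design. In glue, fitness would come from running K2 with the candidate ordering and scoring the learned network; here a toy fitness stands in for that expensive step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: a generational GA over permutations of variable indices. */
class OrderingGA {
    private final Random rng;

    OrderingGA(long seed) { this.rng = new Random(seed); }

    /** Returns a random permutation of 0..n-1. */
    int[] randomOrdering(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        Collections.shuffle(list, rng);
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Order crossover (OX): copy a slice from parent a, fill the remaining slots in parent b's order. */
    int[] crossover(int[] a, int[] b) {
        int n = a.length;
        int lo = rng.nextInt(n), hi = rng.nextInt(n);
        if (lo > hi) { int t = lo; lo = hi; hi = t; }
        int[] child = new int[n];
        Arrays.fill(child, -1);
        boolean[] used = new boolean[n];
        for (int i = lo; i <= hi; i++) { child[i] = a[i]; used[a[i]] = true; }
        int pos = 0;
        for (int v : b) {
            if (used[v]) continue;
            while (child[pos] != -1) pos++; // skip slots already filled from parent a
            child[pos] = v;
        }
        return child;
    }

    /** Swap mutation: exchange two randomly chosen positions (the result is still a permutation). */
    void mutate(int[] p) {
        int i = rng.nextInt(p.length), j = rng.nextInt(p.length);
        int t = p[i]; p[i] = p[j]; p[j] = t;
    }

    /** Runs the GA and returns the best ordering found; higher fitness is better. */
    int[] run(int n, int populationSize, int generations, ToDoubleFunction<int[]> fitness) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) pop.add(randomOrdering(n));
        // Note: the comparator re-evaluates fitness; a real K2-based fitness should be cached.
        Comparator<int[]> byFitness = Comparator.comparingDouble(fitness).reversed();
        for (int g = 0; g < generations; g++) {
            pop.sort(byFitness);
            List<int[]> next = new ArrayList<>();
            next.add(pop.get(0).clone()); // elitism: the best ordering always survives
            int parents = Math.max(1, populationSize / 2); // select parents from the top half
            while (next.size() < populationSize) {
                int[] child = crossover(pop.get(rng.nextInt(parents)), pop.get(rng.nextInt(parents)));
                if (rng.nextDouble() < 0.2) mutate(child);
                next.add(child);
            }
            pop = next;
        }
        pop.sort(byFitness);
        return pop.get(0);
    }

    public static void main(String[] args) {
        // Toy fitness: number of positions i that hold variable i (a stand-in for a K2 score).
        ToDoubleFunction<int[]> toy = p -> { int c = 0; for (int i = 0; i < p.length; i++) if (p[i] == i) c++; return c; };
        // populationSize = 10 and generations = 100 mirror the glue.var settings.
        int[] best = new OrderingGA(42).run(8, 10, 100, toy);
        System.out.println(Arrays.toString(best) + " fitness=" + toy.applyAsDouble(best));
    }
}
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which is why a per-generation "best value" series such as the one in generations.csv can be expected to be non-decreasing under this kind of scheme.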

Table of contents:

1. Set up data files
1.1 Create Var file
1.2 Create samples
1.2.1 Copy bbn files
1.2.2 Generate samples
2. Copy files to remote machines
3. Launch server
4. Launch clients
5. View results

--------------------------------------------------------------------------------

1. Set up data files

1.1 Create Var file

Either copy tools/glue.var into the working directory, or create a file
named glue.var in the working directory containing the following lines:

### BEGIN GLUE.VAR ###
#-- file names --#
trainingFile = asia_train.xml
valFile = asia_val.xml
testFile = asia_test.xml
generationFile = generations.csv
outputFile = output
evidenceFile = evidence.txt
outputNetworkFile = output.xml
printingsFile = printings.txt

#-- glueGA params --#
port=3200
populationSize=10
generations = 100

#-- sampling params --#
sampling_method = ais
### END GLUE.VAR ###
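The file uses simple "key = value" lines with "#" comments. As a hedged illustration only (GlueGA's own parser is not shown in this document), the same format can be read with the standard java.util.Properties class, which skips "#" comment lines and whitespace around "=". The class name GlueVarReader and its helper intValue are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical glue.var reader; GlueGA's own parser may differ. */
class GlueVarReader {
    /** Parses "key = value" lines; '#' lines are comments and spaces around '=' are skipped. */
    static Properties parse(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads an integer setting, falling back to a default when the key is absent. */
    static int intValue(Properties props, String key, int defaultValue) {
        String v = props.getProperty(key);
        // Properties keeps trailing whitespace in values, so trim before parsing.
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "glue.var";
        try (Reader r = Files.newBufferedReader(Paths.get(path))) {
            Properties props = parse(r);
            System.out.println("training file: " + props.getProperty("trainingFile"));
            System.out.println("port: " + intValue(props, "port", 3200));
            System.out.println("generations: " + intValue(props, "generations", 100));
        }
    }
}
```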

1.2 Create samples

1.2.1 Copy bbn files
Copy tools/data/asia.xml to the working directory.
Copy tools/data/evidence.txt to the working directory.

1.2.2 Generate samples
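Run the following to generate the training, validation, and test sets
from asia.xml (the output file names match those in glue.var):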
java bbn.DataGenerator asia.xml asia_train.xml 20000
java bbn.DataGenerator asia.xml asia_val.xml 1200
java bbn.DataGenerator asia.xml asia_test.xml 12000
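This document does not say how bbn.DataGenerator draws its samples. One standard technique for generating data from a Bayesian network is forward (logic) sampling: visit the nodes in topological order and sample each one from its CPT given the values already drawn for its parents. The sketch below shows that technique on a hypothetical two-node network Smoking -> LungCancer; the class name and the CPT numbers are illustrative assumptions, not values read from asia.xml. With these numbers the marginal is P(LungCancer) = 0.5 * 0.1 + 0.5 * 0.01 = 0.055, so the sample frequency should approach 0.055 as the sample size grows.

```java
import java.util.Random;

/** Hypothetical sketch of forward (logic) sampling for a two-node network Smoking -> LungCancer. */
class ForwardSampler {
    // Illustrative CPT values (assumptions, not read from asia.xml).
    static final double P_SMOKE = 0.5;
    static final double P_LUNG_GIVEN_SMOKE = 0.1;
    static final double P_LUNG_GIVEN_NO_SMOKE = 0.01;

    /** Samples the parent before the child (topological order); returns {smoke, lung}. */
    static boolean[] sample(Random rng) {
        boolean smoke = rng.nextDouble() < P_SMOKE;
        double pLung = smoke ? P_LUNG_GIVEN_SMOKE : P_LUNG_GIVEN_NO_SMOKE;
        boolean lung = rng.nextDouble() < pLung;
        return new boolean[] {smoke, lung};
    }

    /** Draws n samples and returns the fraction in which LungCancer is true. */
    static double estimateLung(int n, long seed) {
        Random rng = new Random(seed);
        int count = 0;
        for (int i = 0; i < n; i++) if (sample(rng)[1]) count++;
        return (double) count / n;
    }

    public static void main(String[] args) {
        // 20000 mirrors the size of the training set generated above.
        System.out.println("estimated P(LungCancer) = " + estimateLung(20000, 1L));
    }
}
```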

2. Copy files to remote machines
Tar the working directory and copy the archive to each machine you'd
like to farm the job out to. To make sure tar archives the files
themselves and not just the symlinks, first make a dereferenced copy:
cp -bLR working farm
Then create the archive:
tar -cvzf farm.tgz farm
(If anyone knows a better way, please say so. The dereference option in
tar didn't seem to work for me.)

3. Launch server

Log on to the server. The server need not be a powerful machine; it only
needs Java and a network connection that the clients can reach.

Run:
java glue.GlueGA glue.var

4. Launch clients

On each client, run:
java glue.GlueClient servername 3200
replacing servername with the host name or IP address of the server. The
port must match the port setting in glue.var. You can also launch a
client on the same machine as the server.
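The wire protocol between glue.GlueGA and glue.GlueClient is not documented here, so the following is only a hedged sketch of the farm/client pattern itself: the server sends a client one ordering, the client scores it locally, and the score comes back over the same TCP connection. The class FarmSketch, its methods evaluateRemotely and serveOnce, and the line format (space-separated indices, one numeric score in reply) are all hypothetical. A real farm would keep many client connections open and hand out a whole population each generation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

/** Hypothetical one-line-per-message farm/client exchange; not GlueGA's actual protocol. */
class FarmSketch {
    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so each println is sent immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Server side: waits for one client, sends it an ordering such as "3 0 2 1", reads back its score. */
    static double evaluateRemotely(ServerSocket server, int[] ordering) throws IOException {
        try (Socket s = server.accept(); BufferedReader in = reader(s); PrintWriter out = writer(s)) {
            out.println(Arrays.stream(ordering).mapToObj(Integer::toString).collect(Collectors.joining(" ")));
            String reply = in.readLine();
            if (reply == null) throw new IOException("client closed the connection without a score");
            return Double.parseDouble(reply.trim());
        }
    }

    /** Client side: connects, reads one ordering, scores it locally, and sends the score back. */
    static void serveOnce(String host, int port, ToDoubleFunction<int[]> fitness) throws IOException {
        try (Socket s = new Socket(host, port); BufferedReader in = reader(s); PrintWriter out = writer(s)) {
            String line = in.readLine();
            if (line == null) return; // server closed the connection
            int[] ordering = Arrays.stream(line.trim().split("\\s+")).mapToInt(Integer::parseInt).toArray();
            out.println(fitness.applyAsDouble(ordering));
        }
    }
}
```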

5. View results

The results for each population are written to the file named "output",
and the best final network is written to output.xml. The server also
prints each generation's results to standard output as that generation
completes.

To plot the best value of each generation, open generations.csv in a
spreadsheet application; each line holds the best value produced by the
corresponding generation.

The final "blue line" can be plotted by viewing the printings.txt file.

KDD-Tools
Thurs June 27 2002