Re: questions about PhaseIIIa



Posted by Haipeng Guo on February 20, 2001 at 03:03:49:

In Reply to: questions about PhaseIIIa posted by Julie on February 20, 2001 at 01:00:31:

Julie,

Generally, your understanding is correct.

The core of PhaseIIIa is an approximate BBN inference algorithm.

-Input: <1> an .xml XMLBIF file representing a BBN
        <2> a .dat file containing k cases (data points) for that BBN. This file can be generated by the data generator.

Below is an example .dat file with one case:

*******Asia.dat*******
0 1 0 1 1 0 0 0 ---------> 1 = evidence node, 0 = query node
A B C D E F G H ---------> node names
a1 b1 c2 d1 e2 f1 g1 h1 ---------> node values
**********************
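
For illustration, here is a small Java sketch of how one such case could be parsed into evidence and query sets; the class and method names (DatCaseParser, ParsedCase) are hypothetical and not part of the existing DataGen code:

*******DatCaseParser.java (sketch)*******
import java.util.HashMap;
import java.util.Map;

public class DatCaseParser {

    // One parsed case: the first .dat line flags each node as evidence (1) or query (0),
    // the second line gives the node names, and the third line gives the node values.
    public static class ParsedCase {
        public final Map<String, String> evidence = new HashMap<>(); // name -> observed value (flag 1)
        public final Map<String, String> query    = new HashMap<>(); // name -> actual value (flag 0)
    }

    public static ParsedCase parseCase(String flagLine, String nameLine, String valueLine) {
        String[] flags  = flagLine.trim().split("\\s+");
        String[] names  = nameLine.trim().split("\\s+");
        String[] values = valueLine.trim().split("\\s+");

        ParsedCase c = new ParsedCase();
        for (int i = 0; i < names.length; i++) {
            if ("1".equals(flags[i])) {
                c.evidence.put(names[i], values[i]);   // e.g. B, D, E in the Asia.dat example
            } else {
                c.query.put(names[i], values[i]);      // e.g. A, C, F, G, H in the Asia.dat example
            }
        }
        return c;
    }
}
**********************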

-Procedure and Output:
1. The input module loads the .xml file into an AllNodes object in memory, and loads the .dat file into a suitable object in memory.
2. The core of your computing module computes P(X|E), the probabilities of the query nodes given the evidence, for each case in the .dat file. In the example above, X = {A, C, F, G, H} and E = {B, D, E}.
3. Your computing module loops through all possible values that each query node can take, computes P(X|E) = P(X,E)/P(E) according to the Bounded-variance algorithm, and then chooses the value with the maximal probability as the prediction.
4. It then counts each error (where an error is a difference between the prediction and the actual value on that case's line in the .dat file), and calculates the inferential loss.
5. Repeat steps 2-4 for all cases in the .dat file, sum up the inferential losses, and output the final value or compute the MSE. (A rough sketch of this loop is given below.)
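
To make steps 2-5 concrete, here is a rough Java sketch of the outer evaluation loop. It reuses the hypothetical ParsedCase class from the parsing sketch above and assumes a hypothetical Bbn interface whose estimateProbability method runs the Bounded-variance algorithm; the real AllNodes API will of course look different, so treat this only as a sketch:

*******PhaseIIIaEvaluator.java (sketch)*******
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real AllNodes/BBN classes.
interface Bbn {
    List<String> possibleValues(String nodeName);
    // Estimate P(assignment) with the Bounded-variance algorithm
    // (relative error epsilon, failure probability delta).
    double estimateProbability(Map<String, String> assignment, double epsilon, double delta);
}

public class PhaseIIIaEvaluator {

    // Returns the total inferential loss: the number of query-node predictions
    // that differ from the actual values recorded in the .dat file.
    public static int evaluate(Bbn bbn, List<DatCaseParser.ParsedCase> cases,
                               double epsilon, double delta) {
        int totalLoss = 0;
        for (DatCaseParser.ParsedCase c : cases) {
            double pE = bbn.estimateProbability(c.evidence, epsilon, delta);      // P(E)
            for (Map.Entry<String, String> q : c.query.entrySet()) {
                String node = q.getKey();
                String bestValue = null;
                double bestProb = -1.0;
                for (String value : bbn.possibleValues(node)) {
                    Map<String, String> joint = new HashMap<>(c.evidence);
                    joint.put(node, value);
                    double pXE = bbn.estimateProbability(joint, epsilon, delta);  // P(X=value, E)
                    double p = pXE / pE;                                          // P(X=value | E)
                    if (p > bestProb) { bestProb = p; bestValue = value; }
                }
                if (!bestValue.equals(q.getValue())) {
                    totalLoss++;   // prediction differs from the actual value in the .dat line
                }
            }
        }
        return totalLoss;          // sum over all cases; convert to MSE or another score as needed
    }
}
**********************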

procedure Bounded-variance (see p. 15 in Dagum & Luby's paper)
Input: a relative error epsilon (0 < epsilon < 1) and a failure probability delta (0 < delta < 1)
Output: an estimate of P(W)
Begin
    Compute S* and A (the product of u1, u2, ..., uk) from epsilon and delta;
    S = 0;
    t = 0; // t - the number of sampling experiments
    While S < S* {
        Generate_Sample z;
        compute new S according to z;
        t++;
    }
    T = t;
    Output P(W) = A * (S*/T)
End
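
A minimal Java rendering of this loop, assuming that S* and A (the product u1*u2*...*uk) have already been computed from epsilon and delta as in Dagum & Luby's paper, and that sampleScore runs Generate_Sample once and returns that sample's contribution to S (all names here are hypothetical):

*******BoundedVarianceLoop.java (sketch)*******
import java.util.function.DoubleSupplier;

public class BoundedVarianceLoop {

    public static double estimate(double sStar, double a, DoubleSupplier sampleScore) {
        double s = 0.0;   // accumulated score S
        long t = 0;       // t - the number of sampling experiments
        while (s < sStar) {
            s += sampleScore.getAsDouble();   // Generate_Sample z; compute new S according to z
            t++;
        }
        long bigT = t;                        // T = t
        return a * (sStar / bigT);            // P(W) = A * (S*/T)
    }
}
**********************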

Procedure Generate_Sample
Input: a BBN, a set of evidence nodes E
Output: a sample z for all nodes
Begin
    Order the BBN nodes so that parent nodes occur before child nodes;
    Generate values for all nodes, starting from the root nodes: if a root node belongs to E, instantiate it to its observed value; if not, generate a value for it according to its prior probability;
    Once all root nodes are instantiated, proceed to the set of nodes whose parents are all instantiated. Again, if a node belongs to E, instantiate it to its observed value; if not, generate a value for it according to its parents' values and its PDT;
    Proceed until all nodes in the BBN are instantiated;
    Output the sample instantiation z;
End
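
Below is a Java sketch of Generate_Sample as forward sampling with the evidence nodes clamped to their observed values. The SamplableNode accessors used here (prob() and sample() look up the node's PDT using the parents' values already in the assignment) are hypothetical stand-ins for the real DataGen classes, and the returned weight (the product of P(e | parents(e)) over the clamped evidence nodes) is what would feed the "compute new S according to z" step above, under the usual likelihood-weighting reading of that step:

*******GenerateSample.java (sketch)*******
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// The sample instantiation z plus its likelihood weight.
class SampleResult {
    Map<String, String> assignment = new HashMap<>();
    double weight = 1.0;
}

// Hypothetical node interface: prob() and sample() read the node's PDT
// given the parents' values already recorded in the assignment.
interface SamplableNode {
    String name();
    double prob(String value, Map<String, String> assignment);
    String sample(Map<String, String> assignment, Random rng);
}

public class GenerateSample {

    // nodesInTopologicalOrder must list parent nodes before child nodes.
    public static SampleResult generate(List<SamplableNode> nodesInTopologicalOrder,
                                        Map<String, String> evidence, Random rng) {
        SampleResult z = new SampleResult();
        for (SamplableNode node : nodesInTopologicalOrder) {
            String observed = evidence.get(node.name());
            if (observed != null) {
                // Evidence node: instantiate it to its observed value and record its likelihood.
                z.assignment.put(node.name(), observed);
                z.weight *= node.prob(observed, z.assignment);
            } else {
                // Non-evidence node: sample it from its PDT given its already-instantiated parents.
                z.assignment.put(node.name(), node.sample(z.assignment, rng));
            }
        }
        return z;
    }
}
**********************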


Note: The input module and the Generate_Sample procedure have already been implemented in the data generator, but they may need some minor modifications.


-hpguo


: -In a given phase, I take in a case from a line in the .dat file. Whichever
: values it has for my evidence nodes I hold constant, and then try to predict
: the outcome of all other (query) nodes.

: -To predict the values of these query nodes, I loop through all possible
: values that each query node can take, evaluate the probability of each
: possible value based on its parents, and then choose the value corresponding
: to the maximal probability as my prediction. I continue this until all
: query nodes have been predicted.
: -I then count each error (where an error is a difference between my
: prediction and the actual value in the line in the .dat file), and calculate
: the inferential loss.

: If my above ideas are correct, I have the following questions on how to
: implement them:

: -Where does the dataGenerate function come in from DataGen? It seems to me
: that if I randomly generate all query root nodes, then I would then be able
: to infer all other query nodes by simply taking the max probability
: associated with the given parents. Or does the dataGenerate generate
: appropriate values while taking the parents' values into account?

: -How do I access the probability table? I know that each node has a Vector
: called probabilitytable associated with it, but how do I tell from this
: which values correspond to a certain instantiation of the node's parents?
: -I believe I'm supposed to randomly generate (using dataGenerate) values for
: the query nodes over and over until a certain value is "honed in on". Does
: this correspond to the place in Dagum and Luby's algorithm where they loop
: till the sum over all instantiations of the product of probabilities that
: that instantiation will have given its parents until that sum is greater
: than the specified S*?

: If at all possible, I'd really like the method and class specifications that
: you mentioned as well as a somewhat layman's explanation as to how my module
: basically operates, and how exactly it makes use of all the classes and
: methods from the data generator.

: I'm sorry I keep having various confusions pop up, but I want to make sure
: that I fully understand what I am trying to do and how best to go about it
: before attacking things any more. I feel like I have a feel for the basic
: ideas behind the module, but am still unclear on the more detailed
: specifications of implementation. I'd really like to get this module
: finished this week, so your comments are greatly appreciated.



