id3
Class NodeCategorizer

java.lang.Object
  |
  +--shared.Globals
        |
        +--shared.Categorizer
              |
              +--id3.NodeCategorizer
Direct Known Subclasses:
AttrCategorizer, LeafCategorizer, ThresholdCategorizer

public abstract class NodeCategorizer
extends Categorizer

An abstract base class for categorizers that may sit in the nodes of decision trees, graphs, and similar structures. Categorizers of this kind generally categorize by making a decision about the instance and then asking one or more other categorizers in the graph to categorize it. The recursion ends when a NodeCategorizer can decide on the category (or the distribution, in the case of scoring) without consulting other categorizers.
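The recursion described above can be sketched with a few hypothetical classes. SketchNode, SketchLeaf, and SketchAttrNode are illustrative stand-ins, not part of the id3 or shared packages; instances are modeled as plain int[] attribute vectors and categories as ints. An internal node chooses an edge with branch (mirroring branch()) and delegates to the child on that edge (mirroring get_child_categorizer()); a leaf answers directly, which ends the recursion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for NodeCategorizer and its subclasses; not the real id3 API.
abstract class SketchNode {
    // Returns the predicted category for the instance, possibly by asking a child.
    abstract int categorize(int[] instance);
}

// Like a LeafCategorizer: decides without consulting other nodes, ending the recursion.
class SketchLeaf extends SketchNode {
    private final int category;

    SketchLeaf(int category) { this.category = category; }

    @Override
    int categorize(int[] instance) { return category; }
}

// Like an AttrCategorizer: branches on the value of one nominal attribute.
class SketchAttrNode extends SketchNode {
    private final int attrIndex;
    private final Map<Integer, SketchNode> children = new HashMap<>();

    SketchAttrNode(int attrIndex) { this.attrIndex = attrIndex; }

    void addChild(int edgeLabel, SketchNode child) { children.put(edgeLabel, child); }

    // The decision made at this node: which edge the instance follows.
    int branch(int[] instance) { return instance[attrIndex]; }

    SketchNode getChild(int edgeLabel) {
        SketchNode child = children.get(edgeLabel);
        if (child == null) throw new IllegalArgumentException("no edge labeled " + edgeLabel);
        return child;
    }

    @Override
    int categorize(int[] instance) {
        // Delegate to the child found by following the chosen edge.
        return getChild(branch(instance)).categorize(instance);
    }
}

class RecursionDemo {
    public static void main(String[] args) {
        SketchAttrNode root = new SketchAttrNode(0);
        root.addChild(0, new SketchLeaf(1));
        SketchAttrNode inner = new SketchAttrNode(1);
        inner.addChild(0, new SketchLeaf(2));
        inner.addChild(1, new SketchLeaf(3));
        root.addChild(1, inner);
        System.out.println(root.categorize(new int[] {1, 0})); // prints 2
    }
}
```

In the demo, the instance {1, 0} follows edge 1 at the root and edge 0 at the inner node, so the leaf for category 2 makes the final decision.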


Fields inherited from class shared.Categorizer
CATEGORIZER_ID_BASE, CLASS_ATTR_CATEGORIZER, CLASS_ATTR_EQ_CATEGORIZER, CLASS_ATTR_SUBSET_CATEGORIZER, CLASS_BAD_CATEGORIZER, CLASS_BAGGING_CATEGORIZER, CLASS_CASCADE_CATEGORIZER, CLASS_CLUSTER_CATEGORIZER, CLASS_CONST_CATEGORIZER, CLASS_CONSTRUCT_CATEGORIZER, CLASS_DISC_CATEGORIZER, CLASS_DISC_NODE_CATEGORIZER, CLASS_DTREE_CATEGORIZER, CLASS_IB_CATEGORIZER, CLASS_LAZYDT_CATEGORIZER, CLASS_LEAF_CATEGORIZER, CLASS_LINDISCR_CATEGORIZER, CLASS_MAJORITY_CATEGORIZER, CLASS_MULTI_SPLIT_CATEGORIZER, CLASS_MULTITHRESH_CATEGORIZER, CLASS_NB_CATEGORIZER, CLASS_ODT_CATEGORIZER, CLASS_ONE_R_CATEGORIZER, CLASS_OPTION_CATEGORIZER, CLASS_PROJECT_CATEGORIZER, CLASS_RDG_CATEGORIZER, CLASS_STACKING_CATEGORIZER, CLASS_TABLE_CATEGORIZER, CLASS_THRESHOLD_CATEGORIZER, logOptions
 
Fields inherited from class shared.Globals
badCategorizer, CONFIDENCE_INTERVAL_Z, DBG, DEFAULT_DATA_EXT, DEFAULT_EPSILON, DEFAULT_EVAL_LIMIT, DEFAULT_LAMBDA, DEFAULT_MAX_EVALS, DEFAULT_MAX_STALE, DEFAULT_MIN_EXP_EVALS, DEFAULT_NAMES_EXT, DEFAULT_SAS_SEED, DEFAULT_SEARCH_METHOD, DEFAULT_SHOW_TEST_SET_PERF, DEFAULT_TEST_EXT, DISPLAY_NAMES, EMPTY_STRING, FIRST_CATEGORY_VAL, FIRST_NOMINAL_VAL, LEFT_NODE, MAX_NUM_CATEGORIES, Mcerr, Mcout, optionServer, optionsFileName, REAL_MAX, RIGHT_NODE, SHOW_TEST_SET_PERF_HELP, SINGLE_QUOTE, STORED_REAL_MAX, TS, UNDEFINED_INT, UNDEFINED_REAL, UNDEFINED_VARIANCE, UNKNOWN_AUG_CATEGORY, UNKNOWN_CATEGORY_VAL, UNKNOWN_NODE, UNKNOWN_NOMINAL_VAL, UNKNOWN_STORED_REAL_VAL, UNKNOWN_VAL_STR
 
Constructor Summary
NodeCategorizer(int noCat, java.lang.String dscr, Schema schema)
          Constructor.
 
Method Summary
 void add_instance_loss(Instance instance, CatDist pred)
          Updates the loss information for this node to reflect the node's performance on the given instance under the given prediction.
abstract  AugCategory branch(Instance inst)
          Traverses the graph of nodes from this NodeCategorizer to determine the category the given instance should be predicted as.
 AugCategory categorize(Instance instance)
          Categorize an instance.
 void distribute_instances(InstanceList il, double pruningFactor, DoubleRef pessimisticErrors, int ldType, double leafDistParameter, double[] parentWeightDist, boolean saveOriginalDistr)
          Recomputes the distribution of the categorizer according to the given instance list, splits it, and redistributes the split lists among the child categorizers.
 NodeCategorizer get_child_categorizer(AugCategory branch)
          Returns the child categorizer of this node that is found by following the edge with the given label.
 NodeCategorizer get_child_categorizer(Instance inst)
          Retrieves the appropriate categorizer one level down in the graph, obtained by following the edge appropriate for the instance provided.
protected  CGraph get_graph()
          Returns the graph for this NodeCategorizer.
 NodeLoss get_loss()
          Returns the loss information.
protected  Node get_node()
          Returns the node for this NodeCategorizer.
 boolean in_graph()
          Returns TRUE if a graph has been set for this NodeCategorizer, FALSE otherwise.
 void reset_node_loss()
          Clears the loss information.
 CatDist score(Instance inst)
          Score an instance.
 CatDist score(Instance inst, boolean addLoss)
          Score an instance.
 void set_graph_and_node(CGraph aGraph, Node aNode)
          Install the graph and node into the object.
 InstanceList[] split_instance_list(InstanceList il)
          Splits the instance list according to the value returned by branch() for each instance.
 void stop()
          Prints an empty string to System.out.
 boolean supports_scoring()
          Returns TRUE if scoring is supported by this node categorizer.
 java.lang.String toString()
          Creates a String representation of this NodeCategorizer.
protected  void update_loss(double weight, double loss)
          Updates the loss information with the given values.
 
Methods inherited from class shared.Categorizer
build_distr, clone, description, display_struct, get_distr, get_log_level, get_log_options, get_log_stream, get_schema, has_distr, num_categories, set_description, set_distr, set_log_level, set_log_options, set_log_prefixes, set_log_stream, set_original_distr, set_used_attr, total_weight
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NodeCategorizer

public NodeCategorizer(int noCat,
                       java.lang.String dscr,
                       Schema schema)
Constructor.
Parameters:
noCat - The number of categories for this NodeCategorizer.
dscr - Description of this NodeCategorizer.
schema - Schema for the data this categorizer classifies.
Method Detail

stop

public void stop()
Prints an empty string to System.out.

toString

public java.lang.String toString()
Creates a String representation of this NodeCategorizer.
Overrides:
toString in class java.lang.Object
Returns:
A String representation of this NodeCategorizer.

reset_node_loss

public void reset_node_loss()
Clears the loss information.

in_graph

public boolean in_graph()
Returns TRUE if a graph has been set for this NodeCategorizer, FALSE otherwise.
Returns:
TRUE if a graph has been set for this NodeCategorizer, FALSE otherwise.

split_instance_list

public InstanceList[] split_instance_list(InstanceList il)
Splits the instance list according to the value returned by branch() for each instance.
Parameters:
il - The InstanceList to be split.
Returns:
An array of partitions of the given InstanceList.
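The partitioning performed by split_instance_list can be sketched as follows, assuming branches are numbered 0 to numBranches - 1 and instances are arbitrary Java objects. SplitSketch.splitByBranch is a hypothetical helper; the real method takes an InstanceList and uses the node's own branch() rather than a function argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper mirroring split_instance_list: one output list per branch,
// with each instance placed in the list for the branch its node chooses.
final class SplitSketch {
    private SplitSketch() {}

    static <T> List<List<T>> splitByBranch(List<T> instances, int numBranches,
                                           ToIntFunction<T> branch) {
        List<List<T>> parts = new ArrayList<>(numBranches);
        for (int i = 0; i < numBranches; i++) {
            parts.add(new ArrayList<>());
        }
        for (T inst : instances) {
            int b = branch.applyAsInt(inst); // the node's decision for this instance
            if (b < 0 || b >= numBranches) {
                throw new IllegalArgumentException("branch out of range: " + b);
            }
            parts.get(b).add(inst); // input order is preserved within each part
        }
        return parts;
    }
}

class SplitDemo {
    public static void main(String[] args) {
        // A threshold-style decision: values below 10 go to branch 0, the rest to branch 1.
        List<List<Integer>> parts =
            SplitSketch.splitByBranch(Arrays.asList(5, 12, 7, 30), 2, x -> x < 10 ? 0 : 1);
        System.out.println(parts); // prints [[5, 7], [12, 30]]
    }
}
```

Passing the decision in as a function keeps the sketch independent of any particular node type; in the class documented here, that role is played by the abstract branch() method.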

branch

public abstract AugCategory branch(Instance inst)
Traverses the graph of nodes from this NodeCategorizer to determine the category the given instance should be predicted as.
Parameters:
inst - The instance for which a prediction is requested.
Returns:
The category for the given instance.

categorize

public AugCategory categorize(Instance instance)
Categorize an instance.
Overrides:
categorize in class Categorizer
Parameters:
instance - The instance to be categorized.
Returns:
The category of the given instance.

supports_scoring

public boolean supports_scoring()
Returns TRUE if scoring is supported by this node categorizer. NodeCategorizer always returns TRUE.
Overrides:
supports_scoring in class Categorizer
Returns:
TRUE.

score

public CatDist score(Instance inst)
Score an instance. The scoring function offers the option of carrying the loss information through the graph.
Overrides:
score in class Categorizer
Parameters:
inst - The instance to be scored.
Returns:
The score of the given instance.

score

public CatDist score(Instance inst,
                     boolean addLoss)
Score an instance, optionally carrying the loss information through the graph.
Parameters:
inst - The instance to be scored.
addLoss - TRUE if the loss information is to be carried through the graph, FALSE otherwise.
Returns:
The score of the given instance.

add_instance_loss

public void add_instance_loss(Instance instance,
                              CatDist pred)
Updates the loss information for this node to reflect the node's performance on the given instance under the given prediction.
Parameters:
instance - The instance to which given prediction applies.
pred - The prediction of category distributions.
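One way score(inst, addLoss) and add_instance_loss might fit together is sketched below. This is a hypothetical model, not the library's code: distributions are double[] arrays indexed by category, each node keeps a running (weight, loss) pair standing in for NodeLoss, and the loss is assumed to be weighted 0/1 loss on the most probable category. The sketch also assumes that every node on the path records the loss of the final prediction when addLoss is TRUE.

```java
// Hypothetical model of score(inst, addLoss) and add_instance_loss; not the id3 API.
// An instance carries attribute values, a true label, and a weight.
final class LabeledInstance {
    final int[] values;
    final int label;
    final double weight;

    LabeledInstance(int[] values, int label, double weight) {
        this.values = values.clone();
        this.label = label;
        this.weight = weight;
    }
}

abstract class ScoringNode {
    // Stand-in for NodeLoss: accumulated instance weight and loss at this node.
    double totalWeight;
    double totalLoss;

    abstract double[] score(LabeledInstance inst, boolean addLoss);

    void updateLoss(double weight, double loss) { // cf. update_loss
        totalWeight += weight;
        totalLoss += loss;
    }

    void resetNodeLoss() { // cf. reset_node_loss
        totalWeight = 0.0;
        totalLoss = 0.0;
    }

    // Assumed loss: weighted 0/1 loss on the most probable category.
    void addInstanceLoss(LabeledInstance inst, double[] pred) {
        int best = 0;
        for (int c = 1; c < pred.length; c++) {
            if (pred[c] > pred[best]) best = c;
        }
        updateLoss(inst.weight, best == inst.label ? 0.0 : inst.weight);
    }
}

// A leaf returns a stored distribution over categories.
class ScoringLeaf extends ScoringNode {
    private final double[] dist;

    ScoringLeaf(double... dist) { this.dist = dist.clone(); }

    @Override
    double[] score(LabeledInstance inst, boolean addLoss) {
        double[] d = dist.clone();
        if (addLoss) addInstanceLoss(inst, d);
        return d;
    }
}

// An internal node delegates to the child selected by one attribute's value.
class ScoringSplit extends ScoringNode {
    private final int attrIndex;
    private final ScoringNode[] children;

    ScoringSplit(int attrIndex, ScoringNode... children) {
        this.attrIndex = attrIndex;
        this.children = children.clone();
    }

    @Override
    double[] score(LabeledInstance inst, boolean addLoss) {
        double[] d = children[inst.values[attrIndex]].score(inst, addLoss);
        // Every node on the path records the loss of the final prediction.
        if (addLoss) addInstanceLoss(inst, d);
        return d;
    }
}

class ScoringDemo {
    public static void main(String[] args) {
        ScoringSplit root = new ScoringSplit(0, new ScoringLeaf(0.9, 0.1), new ScoringLeaf(0.2, 0.8));
        LabeledInstance inst = new LabeledInstance(new int[] {1}, 0, 2.0);
        root.score(inst, true); // the right leaf favors category 1, but the label is 0
        System.out.println(root.totalWeight + " " + root.totalLoss); // prints 2.0 2.0
    }
}
```

With addLoss set to FALSE the same call returns the same distribution but leaves every node's accumulated weight and loss unchanged.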

get_child_categorizer

public NodeCategorizer get_child_categorizer(AugCategory branch)
Returns the child categorizer of this node that is found by following the edge with the given label.
Parameters:
branch - The category of the edge for which the child categorizer is requested.
Returns:
The child categorizer.

get_child_categorizer

public NodeCategorizer get_child_categorizer(Instance inst)
Retrieves the appropriate categorizer one level down in the graph, obtained by following the edge appropriate for the instance provided.
Parameters:
inst - The instance provided for determining which edge to traverse.
Returns:
The child categorizer of the appropriate edge.

update_loss

protected void update_loss(double weight,
                           double loss)
Updates the loss information with the given values.
Parameters:
weight - The new weight value.
loss - The new loss value.

get_graph

protected CGraph get_graph()
Returns the graph for this NodeCategorizer.
Returns:
The graph for this NodeCategorizer.

get_node

protected Node get_node()
Returns the node for this NodeCategorizer.
Returns:
The node for this NodeCategorizer.

distribute_instances

public void distribute_instances(InstanceList il,
                                 double pruningFactor,
                                 DoubleRef pessimisticErrors,
                                 int ldType,
                                 double leafDistParameter,
                                 double[] parentWeightDist,
                                 boolean saveOriginalDistr)
Recomputes the distribution of the categorizer according to the given instance list, splits it, and redistributes the split lists among the child categorizers. This process is used to backfit an instance list to a graph structure.
Parameters:
il - The instance list used for recomputation.
pruningFactor - The amount of pruning being done.
pessimisticErrors - The pessimistic error value.
ldType - Leaf distribution type.
leafDistParameter - The parameter used in computing the leaf distribution.
parentWeightDist - The weight distribution of the parent categorizer.
saveOriginalDistr - TRUE if the original distribution should be preserved, FALSE otherwise.
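The backfitting idea (recompute a node's distribution from the instances that reach it, split them, and pass each part to the matching child) can be sketched recursively. This is a simplified, hypothetical model: BackfitNode is not part of the id3 package, plain class counts replace the categorizer's weighted distribution, leaves are assumed to predict their recomputed majority class, and errors[0] merely accumulates the number of instances the leaves misclassify as a stand-in for the pessimistic error estimate. The pruningFactor, ldType, leafDistParameter, parentWeightDist, and saveOriginalDistr options are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node used to illustrate backfitting; not the id3 API.
// Instances are int arrays whose last element is the class label.
class BackfitNode {
    final int numClasses;
    final int splitAttr;           // -1 marks a leaf
    final BackfitNode[] children;  // one child per value of splitAttr
    int[] classCounts;             // recomputed by distribute

    BackfitNode(int numClasses, int splitAttr, BackfitNode... children) {
        this.numClasses = numClasses;
        this.splitAttr = splitAttr;
        this.children = children;
        this.classCounts = new int[numClasses];
    }

    // Recompute counts from the instances reaching this node, then push each
    // partition down to the child on the matching edge.
    void distribute(List<int[]> instances, double[] errors) {
        classCounts = new int[numClasses];
        for (int[] inst : instances) classCounts[inst[inst.length - 1]]++;

        if (splitAttr < 0) {
            // Leaf: assume it predicts its majority class and count the misses.
            int majority = 0;
            for (int c = 1; c < numClasses; c++) {
                if (classCounts[c] > classCounts[majority]) majority = c;
            }
            errors[0] += instances.size() - classCounts[majority];
            return;
        }
        List<List<int[]>> parts = new ArrayList<>();
        for (int i = 0; i < children.length; i++) parts.add(new ArrayList<>());
        // Assumes every attribute value has a matching child.
        for (int[] inst : instances) parts.get(inst[splitAttr]).add(inst);
        for (int i = 0; i < children.length; i++) children[i].distribute(parts.get(i), errors);
    }
}

class BackfitDemo {
    public static void main(String[] args) {
        BackfitNode left = new BackfitNode(2, -1);
        BackfitNode right = new BackfitNode(2, -1);
        BackfitNode root = new BackfitNode(2, 0, left, right);
        List<int[]> data = Arrays.asList(new int[] {0, 0}, new int[] {0, 0}, new int[] {0, 1},
                                         new int[] {1, 1}, new int[] {1, 1});
        double[] errors = {0.0};
        root.distribute(data, errors);
        System.out.println(Arrays.toString(root.classCounts) + " errors=" + errors[0]);
        // prints [2, 3] errors=1.0
    }
}
```

The output array errors plays the role that the DoubleRef parameter pessimisticErrors plays in the documented signature: a value the recursion accumulates into across all nodes.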

set_graph_and_node

public void set_graph_and_node(CGraph aGraph,
                               Node aNode)
Install the graph and node into the object.
Parameters:
aGraph - The graph of NodeCategorizers.
aNode - The node for this NodeCategorizer.
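The bookkeeping behind set_graph_and_node, in_graph, and get_child_categorizer can be sketched as a categorizer that holds references to its graph and to its own node in that graph, and finds a child by following the out-edge with a given label. SketchGraph and GraphAwareCategorizer are hypothetical stand-ins for CGraph, Node, and NodeCategorizer; integer labels stand in for AugCategory, and the null and missing-edge checks are assumptions about sensible use rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for CGraph, Node, and NodeCategorizer: nodes are integer
// ids, each holding a categorizer, and edges carry integer labels.
class SketchGraph {
    private final List<GraphAwareCategorizer> categorizers = new ArrayList<>();
    private final Map<Integer, Map<Integer, Integer>> outEdges = new HashMap<>();

    // Adds a node holding cat and installs the graph and node into cat.
    int addNode(GraphAwareCategorizer cat) {
        int id = categorizers.size();
        categorizers.add(cat);
        cat.setGraphAndNode(this, id);
        return id;
    }

    void connect(int from, int label, int to) {
        outEdges.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
    }

    GraphAwareCategorizer categorizerAt(int node) { return categorizers.get(node); }

    // Target of the edge leaving `from` with the given label, or null if none.
    Integer target(int from, int label) {
        Map<Integer, Integer> edges = outEdges.get(from);
        return edges == null ? null : edges.get(label);
    }
}

class GraphAwareCategorizer {
    final String name;
    private SketchGraph graph; // null until installed
    private int node = -1;

    GraphAwareCategorizer(String name) { this.name = name; }

    void setGraphAndNode(SketchGraph g, int n) { // cf. set_graph_and_node
        if (g == null) throw new IllegalArgumentException("graph must not be null");
        graph = g;
        node = n;
    }

    boolean inGraph() { return graph != null; } // cf. in_graph

    // cf. get_child_categorizer(AugCategory): follow the edge with this label.
    GraphAwareCategorizer getChildCategorizer(int label) {
        if (!inGraph()) throw new IllegalStateException("no graph installed");
        Integer child = graph.target(node, label);
        if (child == null) throw new IllegalArgumentException("no edge labeled " + label);
        return graph.categorizerAt(child);
    }
}

class GraphDemo {
    public static void main(String[] args) {
        SketchGraph g = new SketchGraph();
        GraphAwareCategorizer root = new GraphAwareCategorizer("root");
        System.out.println(root.inGraph()); // prints false
        int r = g.addNode(root);
        int a = g.addNode(new GraphAwareCategorizer("a"));
        int b = g.addNode(new GraphAwareCategorizer("b"));
        g.connect(r, 1, a);
        g.connect(r, 2, b);
        System.out.println(root.getChildCategorizer(2).name); // prints b
    }
}
```

Keeping the graph and node on the categorizer lets child lookup stay a local operation: the categorizer only needs its own node id to enumerate its out-edges.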

get_loss

public NodeLoss get_loss()
Returns the loss information.
Returns:
The loss information.