MLJ: Machine Learning in Java API Manual: Class ID3Inducer

Overview

Package

Class

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: INNER | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

id3
Class ID3Inducer

java.lang.Object
  |
  +--shared.BaseInducer
        |
        +--shared.Inducer
              |
              +--id3.TDDTInducer
                    |
                    +--id3.ID3Inducer

public class ID3Inducer
extends TDDTInducer

The ID3Class is the Java implementation of the ID3 algorithm. The ID3 algorithm is a top-down decision-tree induction algorithm. This algorithm uses the mutual information (original gain criteria),and not the more recent information gain ratio.

Complexity:

Our split() method uses entropy and takes time O(vy) where v is the total number of attribute values (over all attributes) and y is the number of label values. This can be derived by noting that mutual_info is computed for each attribute.

Node categorizers (for predict) are AttrCategorizer and take constant time, thus the overall prediction time is O(path-length).

See TDDTInducer for more complexity information.

Enhancements:

The ID3Compute entropy once for the node, and pass it along to avoid multiple computations like we do now.

Constructor Detail

ID3Inducer

public ID3Inducer(java.lang.String dscr,
                  CGraph aCgraph)

Constructor.

Parameters:: dscr - The description of this inducer.; aCgraph - A previously developed Cgraph.

ID3Inducer

public ID3Inducer(java.lang.String dscr)

Constructor.

Parameters:: dscr - The description of this inducer.

ID3Inducer

public ID3Inducer(ID3Inducer source)

Copy Constructor.

Parameters:: source - The original ID3Inducer that is being copied.

Method Detail

best_split

public NodeCategorizer best_split(java.util.LinkedList catNames)

Returns the AttrCategorizer that splits on the best attribute found using mutual information(information gain). Returns null if there is nothing good to split on. Ties between this attribute and earlier attributes are broken.

Overrides:: best_split in class TDDTInducer

Parameters:: catNames - The names of the categories that each instance may be catagorized under.
Returns:: The NodeCategorizer that splits on the best attribute found. May be null if no good attribute split is found.

find_splits

public boolean find_splits(SplitAttr[] bestSplit,
                           SplitAttr[] splits)

Fills in the array of splits for current subtree. It does very little, but rarely overriden whereas best_split_info is overridden by subclasses.

Parameters:: bestSplit - This is an array of the best splits found during the splitting process.; splits - This is an array of all splits found during the splitting process.
Returns:: False if there is only one label value, the maximum number of splits is reached, or if there is no reasonable split available.

best_split_info

public void best_split_info(SplitAttr[] bestSplit,
                            SplitAttr[] splits)

Fills in the array of SplitAttr for current subtree. This function is a good candidate to override in subclasses.

Parameters:: bestSplit - This is an array of the best splits found during the splitting process.; splits - This is an array of all splits found during the splitting process.

multi_val_attribute

public boolean multi_val_attribute(int attrNum)

Return true if the attribute has many values according to the C4.5 definition.

Parameters:: attrNum - The number of the attribute being checked.
Returns:: True if this attribute has many values, False otherwise.

pick_best_split

public void pick_best_split(SplitAttr[] bestSplit,
                            SplitAttr[] splits,
                            StatData allMutualInfo,
                            StatData allNonMultiValMutualInfo)

Choose the best attribute to split the on from all possible splits.

Parameters:: bestSplit - The array of the best splits found during splitting process.; splits - The array of all splits found during the splitting process.; allMutualInfo - Statistical information about all instances.; allNonMultiValMutualInfo - Statistical information about instances where an attribute can only have one value at a time.

all_attributes_multi_val

public boolean all_attributes_multi_val()

Checks if all attributes are multi-valued. An attribute is multivalued if an instance can have more than one value at one time. If the attribute contains values that are neither real or nominal, an abort message is issued.

Returns:: False if the values are real numbers or if the nominal values are not multivalued. Otherwise True is returned.

split_info

public void split_info(int attrNum,
                       SplitAttr split)

Compute the split information for a given attribute.

Parameters:: attrNum - The index number of this attribute column.; split - The attribute to be split on.

split_info

public void split_info(int attrNum,
                       SplitAttr split,
                       RealAndLabelColumn[] realColumns)

Compute the split information for a given attribute.

Parameters:: attrNum - The index number of this attribute column.; split - The attribute to be split on.; realColumns - The columns of values specified for each attribute over all instances in a data set.

create_subinducer

public TDDTInducer create_subinducer(java.lang.String descr,
                                     CGraph aCgraph)

Create an Inducer for recursive calls. Since TDDTInducer is an abstract class, it can't do the recursive call.

Overrides:: create_subinducer in class TDDTInducer

Parameters:: descr - The description of the new subinducer.; aCgraph - A previously defined Cgraph for the inducer.
Returns:: The new sub-ID3Inducer created.

split_to_cat

public NodeCategorizer split_to_cat(SplitAttr split,
                                    java.util.LinkedList catNames)

Build categorizer for the given attribute. The specified SplitAttr is assumed to be valid.

Parameters:: split - The attribute to be split on.; catNames - The category names that an instance may be categorized under.
Returns:: The NodeCategorizer containing a categorizer that splits on this attribute.

copy

public Inducer copy()

Returns the reference to the copy of ID3Inducer with the same settings.

Returns:: A reference to an ID3Inducer.

class_id

public int class_id()

Deprecated. This method should be replaced with Java's instanceof operator.

Returns the class id of this of this inducer.

Overrides:: class_id in class BaseInducer

Returns:: Integer assigned to this inducer.

Overview

Package

Class

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: INNER | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

Constructor Summary
`ID3Inducer(ID3Inducer source)` Copy Constructor.
`ID3Inducer(java.lang.String dscr)` Constructor.
`ID3Inducer(java.lang.String dscr, CGraph aCgraph)` Constructor.

Method Summary
`boolean`	`all_attributes_multi_val()` Checks if all attributes are multi-valued.
`void`	`best_split_info(SplitAttr[] bestSplit, SplitAttr[] splits)` Fills in the array of SplitAttr for current subtree.
`NodeCategorizer`	`best_split(java.util.LinkedList catNames)` Returns the AttrCategorizer that splits on the best attribute found using mutual information(information gain).
`int`	`class_id()` Deprecated. This method should be replaced with Java's instanceof operator.
`Inducer`	`copy()` Returns the reference to the copy of ID3Inducer with the same settings.
`TDDTInducer`	`create_subinducer(java.lang.String descr, CGraph aCgraph)` Create an Inducer for recursive calls.
`boolean`	`find_splits(SplitAttr[] bestSplit, SplitAttr[] splits)` Fills in the array of splits for current subtree.
`boolean`	`multi_val_attribute(int attrNum)` Return true if the attribute has many values according to the C4.5 definition.
`void`	`pick_best_split(SplitAttr[] bestSplit, SplitAttr[] splits, StatData allMutualInfo, StatData allNonMultiValMutualInfo)` Choose the best attribute to split the on from all possible splits.
`void`	`split_info(int attrNum, SplitAttr split)` Compute the split information for a given attribute.
`void`	`split_info(int attrNum, SplitAttr split, RealAndLabelColumn[] realColumns)` Compute the split information for a given attribute.
`NodeCategorizer`	`split_to_cat(SplitAttr split, java.util.LinkedList catNames)` Build categorizer for the given attribute.

id3 Class ID3Inducer

ID3Inducer

ID3Inducer

ID3Inducer

best_split

find_splits

best_split_info

multi_val_attribute

pick_best_split

all_attributes_multi_val

split_info

split_info

create_subinducer

split_to_cat

copy

class_id

id3
Class ID3Inducer