java.lang.Object
  |
  +--shared.BaseInducer
        |
        +--shared.Inducer
              |
              +--id3.TDDTInducer
                    |
                    +--id3.ID3Inducer
The ID3Inducer class is the Java implementation of the ID3 algorithm, a top-down decision-tree induction algorithm. It uses mutual information (the original gain criterion), not the more recent information gain ratio.
Complexity:
The split() method uses entropy and takes O(vy) time, where v is the total number of attribute values (summed over all attributes) and y is the number of label values. This follows because mutual_info is computed once for each attribute, at a cost proportional to that attribute's number of values times y.
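The O(vy) bound can be illustrated with a minimal sketch of the mutual-information computation for a single attribute. It assumes the data has already been reduced to a table of counts; the class name MutualInfoSketch and the counts layout are hypothetical and are not part of the id3 package.

```java
/** Hypothetical illustration only; not the library's mutual_info routine. */
final class MutualInfoSketch {

    /** Shannon entropy, in bits, of a count vector whose entries sum to total. */
    static double entropy(int[] counts, int total) {
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /**
     * Mutual information I(A;Y) = H(Y) - H(Y|A) for one attribute A.
     * counts[a][l] is the number of instances with attribute value a and label l.
     * Every cell is visited a constant number of times, so the cost is
     * (number of values of A) * (number of labels).
     */
    static double mutualInfo(int[][] counts) {
        int numLabels = counts[0].length;
        int[] labelTotals = new int[numLabels];
        int total = 0;
        for (int[] row : counts) {
            for (int l = 0; l < numLabels; l++) {
                labelTotals[l] += row[l];
                total += row[l];
            }
        }
        double conditional = 0.0;
        for (int[] row : counts) {
            int rowTotal = 0;
            for (int c : row) {
                rowTotal += c;
            }
            if (rowTotal > 0) {
                conditional += ((double) rowTotal / total) * entropy(row, rowTotal);
            }
        }
        return entropy(labelTotals, total) - conditional;
    }

    public static void main(String[] args) {
        // Two attribute values, two labels: the attribute separates the labels perfectly.
        System.out.println(mutualInfo(new int[][] {{4, 0}, {0, 4}}));
        // The attribute carries no information about the label.
        System.out.println(mutualInfo(new int[][] {{2, 2}, {2, 2}}));
    }
}
```

Summing this per-attribute cost over all attributes gives the v * y total, since v counts the values of every attribute.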
Node categorizers (for predict) are AttrCategorizer instances and take constant time, so the overall prediction time is O(path-length).
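A minimal sketch of why prediction is O(path-length): each internal node does one array lookup on a single attribute value, and the walk ends at a leaf. The Node class here is a hypothetical stand-in and is not the library's NodeCategorizer or AttrCategorizer.

```java
/** Hypothetical illustration only; not the library's prediction code. */
final class PredictSketch {

    /** Stand-in tree node: either a leaf with a label or a split on one attribute. */
    static final class Node {
        final int attrIndex;   // attribute tested here; -1 for a leaf
        final Node[] children; // one child per attribute value; null for a leaf
        final int label;       // predicted label; meaningful only at a leaf

        private Node(int attrIndex, Node[] children, int label) {
            this.attrIndex = attrIndex;
            this.children = children;
            this.label = label;
        }

        static Node leaf(int label) {
            return new Node(-1, null, label);
        }

        static Node split(int attrIndex, Node... children) {
            return new Node(attrIndex, children, -1);
        }
    }

    /** Follows one root-to-leaf path; each step is a constant-time lookup. */
    static int predict(Node root, int[] instance) {
        Node node = root;
        while (node.children != null) {
            node = node.children[instance[node.attrIndex]];
        }
        return node.label;
    }

    public static void main(String[] args) {
        // Split on attribute 0; value 1 leads to a second split on attribute 1.
        Node tree = Node.split(0,
                Node.leaf(1),
                Node.split(1, Node.leaf(0), Node.leaf(1)));
        System.out.println(predict(tree, new int[] {1, 0}));
    }
}
```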
See TDDTInducer for more complexity information.
Enhancements:
Compute the entropy once for the node and pass it along, to avoid the repeated computations done now.
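One way this enhancement could look is sketched here: the label entropy of the node is computed a single time and handed to the per-attribute gain routine. All names (SharedEntropySketch, gain, bestAttribute) and the counts layout are hypothetical; this is not how ID3Inducer is currently written.

```java
/** Hypothetical illustration of the enhancement; not the library's code. */
final class SharedEntropySketch {

    /** Shannon entropy, in bits, of a count vector. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Gain of one attribute, reusing the node entropy supplied by the caller. */
    static double gain(double nodeEntropy, int[][] counts, int total) {
        double conditional = 0.0;
        for (int[] row : counts) {
            int rowTotal = 0;
            for (int c : row) {
                rowTotal += c;
            }
            if (rowTotal > 0) {
                conditional += ((double) rowTotal / total) * entropy(row);
            }
        }
        return nodeEntropy - conditional;
    }

    /** Index of the attribute with the largest gain, or -1 if there are none. */
    static int bestAttribute(int[] labelCounts, int[][][] countsPerAttribute) {
        int total = 0;
        for (int c : labelCounts) {
            total += c;
        }
        double nodeEntropy = entropy(labelCounts); // computed once for the node
        int best = -1;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < countsPerAttribute.length; a++) {
            double g = gain(nodeEntropy, countsPerAttribute[a], total);
            if (g > bestGain) {
                bestGain = g;
                best = a;
            }
        }
        return best;
    }
}
```

The node entropy is the same for every candidate attribute, so passing it in removes one entropy computation per attribute without changing which attribute is selected.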
Fields inherited from class id3.TDDTInducer |
allOrNothing,
confidence,
error,
evidenceProjection,
frequencyCounts,
KLdistance,
laplaceCorrection,
linear,
logLoss,
lossConfidence,
lossLaplace,
MSE,
none,
penalty |
Constructor Summary | |
ID3Inducer(ID3Inducer source)
Copy Constructor. |
|
ID3Inducer(java.lang.String dscr)
Constructor. |
|
ID3Inducer(java.lang.String dscr,
CGraph aCgraph)
Constructor. |
Method Summary | |
boolean |
all_attributes_multi_val()
Checks if all attributes are multi-valued. |
void |
best_split_info(SplitAttr[] bestSplit,
SplitAttr[] splits)
Fills in the array of SplitAttr for current subtree. |
NodeCategorizer |
best_split(java.util.LinkedList catNames)
Returns the AttrCategorizer that splits on the best attribute found using mutual information (information gain). |
int |
class_id()
Deprecated. This method should be replaced with Java's instanceof operator. |
Inducer |
copy()
Returns a reference to a copy of this ID3Inducer with the same settings. |
TDDTInducer |
create_subinducer(java.lang.String descr,
CGraph aCgraph)
Create an Inducer for recursive calls. |
boolean |
find_splits(SplitAttr[] bestSplit,
SplitAttr[] splits)
Fills in the array of splits for current subtree. |
boolean |
multi_val_attribute(int attrNum)
Return true if the attribute has many values according to the C4.5 definition. |
void |
pick_best_split(SplitAttr[] bestSplit,
SplitAttr[] splits,
StatData allMutualInfo,
StatData allNonMultiValMutualInfo)
Chooses the best attribute to split on from all possible splits. |
void |
split_info(int attrNum,
SplitAttr split)
Compute the split information for a given attribute. |
void |
split_info(int attrNum,
SplitAttr split,
RealAndLabelColumn[] realColumns)
Compute the split information for a given attribute. |
NodeCategorizer |
split_to_cat(SplitAttr split,
java.util.LinkedList catNames)
Build categorizer for the given attribute. |
Methods inherited from class java.lang.Object |
clone,
equals,
finalize,
getClass,
hashCode,
notify,
notifyAll,
toString,
wait,
wait,
wait |
Constructor Detail |
public ID3Inducer(java.lang.String dscr, CGraph aCgraph)
dscr - The description of this inducer.
aCgraph - A previously developed CGraph.
public ID3Inducer(java.lang.String dscr)
dscr - The description of this inducer.
public ID3Inducer(ID3Inducer source)
source - The original ID3Inducer that is being copied.
Method Detail |
public NodeCategorizer best_split(java.util.LinkedList catNames)
catNames - The names of the categories that each instance may be categorized under.
public boolean find_splits(SplitAttr[] bestSplit, SplitAttr[] splits)
bestSplit - This is an array of the best splits found during the splitting process.
splits - This is an array of all splits found during the splitting process.
public void best_split_info(SplitAttr[] bestSplit, SplitAttr[] splits)
bestSplit - This is an array of the best splits found during the splitting process.
splits - This is an array of all splits found during the splitting process.
public boolean multi_val_attribute(int attrNum)
attrNum - The number of the attribute being checked.
public void pick_best_split(SplitAttr[] bestSplit, SplitAttr[] splits, StatData allMutualInfo, StatData allNonMultiValMutualInfo)
bestSplit - The array of the best splits found during splitting process.
splits - The array of all splits found during the splitting process.
allMutualInfo - Statistical information about all instances.
allNonMultiValMutualInfo - Statistical information about instances where an attribute can only have one value at a time.
public boolean all_attributes_multi_val()
public void split_info(int attrNum, SplitAttr split)
attrNum - The index number of this attribute column.
split - The attribute to be split on.
public void split_info(int attrNum, SplitAttr split, RealAndLabelColumn[] realColumns)
attrNum - The index number of this attribute column.
split - The attribute to be split on.
realColumns - The columns of values specified for each attribute over all instances in a data set.
public TDDTInducer create_subinducer(java.lang.String descr, CGraph aCgraph)
descr - The description of the new subinducer.
aCgraph - A previously defined CGraph for the inducer.
public NodeCategorizer split_to_cat(SplitAttr split, java.util.LinkedList catNames)
split - The attribute to be split on.
catNames - The category names that an instance may be categorized under.
public Inducer copy()
public int class_id()