
ID3, C4.5 and CHAID

CHAID - Chi-Squared Automatic Interaction Detection

C4.5 and CHAID Algorithm

Outline

  • Disadvantages of ID3 algorithm
  • C4.5 algorithm
    • Gain ratio
    • Noisy data and overfitting
    • Tree pruning
    • Handling of missing values
    • Error estimation
    • Continuous data
  • CHAID

ID3 Algorithm

  • Top-down construction of a decision tree by recursively selecting the "best attribute" to use at the current node, based on the training data (see the information-gain sketch after this list)
  • It can only deal with nominal data
  • It is not robust in dealing with noisy data sets
  • It overfits the tree to the training data
  • It creates unnecessarily complex trees without pruning
  • It does not handle missing data values well
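
The "best attribute" in ID3 is the one with the highest information gain, i.e., the largest reduction in the entropy of the class label after splitting on that attribute. A minimal sketch of that selection step (the toy records and attribute names are illustrative, not part of the original material):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def information_gain(records, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

# Toy data: choose the attribute to test at the root node.
records = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "true",  "play": "no"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(records, a, "play"))
print(best)  # the attribute with the highest information gain
```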

C4.5 Algorithm

  • An improvement over the ID3 algorithm (its gain-ratio splitting criterion is sketched after this list)
  • Designed to support
    • Better handling of noisy data
    • Handling of missing values
    • Pre- and post-pruning of decision trees
    • Attributes with continuous values
    • Rule derivation
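
C4.5 replaces ID3's information-gain criterion with the gain ratio: the information gain divided by the split information (the entropy of the attribute's own value distribution), which removes the bias toward attributes with many distinct values. A self-contained sketch mirroring the ID3 helpers above (names and data layout are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * log2(c / len(labels)) for c in counts.values())

def gain_ratio(records, attribute, target):
    """C4.5 splitting criterion: information gain normalised by split information."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    gain = base - remainder
    split_info = entropy([r[attribute] for r in records])  # intrinsic information of the split itself
    return gain / split_info if split_info > 0 else 0.0
```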

Fix overfitting / overlearning problem

  1. Pre-prune: Stop growing a branch when the information becomes unreliable

  2. Post-prune: Take a fully grown decision tree and discard unreliable parts (see the error-estimate sketch below)
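
C4.5's post-pruning works on the training data itself: the observed error rate at a node or subtree is replaced by the upper limit of a binomial confidence interval, the "pessimistic" error estimate, and a subtree is discarded when the estimated error of the replacement leaf is no worse than that of its children combined. A minimal sketch of the estimate (the 25% confidence factor is C4.5's customary default; the helper name is illustrative):

```python
from math import sqrt
from scipy.stats import norm

def pessimistic_error(errors, n, confidence=0.25):
    """Upper confidence limit on the error rate, given `errors` misclassified out of `n` cases."""
    f = errors / n                   # observed error rate on the training cases
    z = norm.ppf(1.0 - confidence)   # one-sided z value, ~0.67 for the default 25% level
    return (f + z * z / (2 * n)
            + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)

# A leaf covering 6 cases with 2 errors: the estimate (~0.47) is deliberately more
# pessimistic than the observed 2/6 ≈ 0.33.
print(round(pessimistic_error(2, 6), 3))
```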

Chi-Squared Automatic Interaction Detection (CHAID)

  • It is one of the oldest tree classification methods, originally proposed by Kass in 1980
  • The first step is to create categorical predictors out of any continuous predictors by dividing the respective continuous distributions into a number of categories with an approximately equal number of observations (see the binning sketch after this list)
  • The next step is to cycle through the predictors to determine for each predictor the pair of (predictor) categories that is least significantly different with respect to the dependent variable
  • The next step is to choose, as the split variable, the predictor with the smallest adjusted p-value, i.e., the predictor that will yield the most significant split
  • Continue this process until no further splits can be performed
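
The first step, turning each continuous predictor into a categorical one with roughly equal-frequency categories, amounts to quantile-based binning. A sketch assuming pandas, with an illustrative column name and ten initial categories:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
income = pd.Series(rng.normal(50_000, 15_000, size=1_000), name="income")

# Quantile-based bins give categories with approximately equal numbers of observations;
# duplicates="drop" merges bins whose quantile edges coincide.
income_cat = pd.qcut(income, q=10, duplicates="drop")
print(income_cat.value_counts().sort_index())
```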

Algorithm

Dividing the cases that reach a certain node in the tree

  1. Cross tabulate the response variable (target) with each of the explanatory variables

  2. When there are more than two columns, find the best subtable formed by combining column categories

    • This is applied to each table with more than 2 columns

    • Compute Pearson X^2^ tests for independence for each allowable subtable

    • Look for the smallest X^2^ value. If it is not significant, combine the column categories

    • Repeat step 2 if the new table has more than two columns

  3. Allow categories combined in step 2 to be broken apart

    • For each compound category consisting of at least 3 of the original categories, find the "most significant" binary split

    • If X^2^ is significant, implement the split and return to step 2

    • Otherwise retain the compound categories for this variable, and move on to the next variable

  4. You have now completed the "optimal" combining of categories for each explanatory variable

    • Find the most significant of these "optimally" merged explanatory variables

    • Compute a "Bonferroni"-adjusted chi-squared test of independence for the reduced table for each explanatory variable (see the merging sketch after these steps)

  5. Use the "most significant" variable in step 4 to split the node with respect to the merged categories for that variable

    • Repeat steps 1-5 for each of the offspring nodes

    • Stop if

      • no variable is significant in step 4

      • the number of cases reaching a node is below a specified limit
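
Steps 2-4 hinge on Pearson X^2^ tests of independence computed on two-column subtables of the node's crosstab: the pair of categories whose test is least significant is merged, and the final p-value for the reduced table is multiplied by the number of ways the original categories could have been grouped. A simplified sketch of that loop for a single predictor (the significance threshold is an assumption, and the C(c-1, r-1) multiplier is the one for ordinal predictors; nominal predictors use a different count):

```python
from itertools import combinations
from math import comb

from scipy.stats import chi2_contingency

def merge_categories(crosstab, alpha=0.05):
    """Step 2: repeatedly merge the pair of predictor categories (columns) that is
    least significantly different with respect to the target (rows)."""
    tab = crosstab.copy()
    while tab.shape[1] > 2:
        # Pearson chi-squared test for every allowable pair of columns
        # (assumes no empty rows in the resulting 2-column subtables).
        pvals = {
            (a, b): chi2_contingency(tab[[a, b]].values, correction=False)[1]
            for a, b in combinations(tab.columns, 2)
        }
        (a, b), p = max(pvals.items(), key=lambda kv: kv[1])
        if p <= alpha:                        # every pair differs significantly: stop merging
            break
        tab[f"{a}+{b}"] = tab[a] + tab[b]     # combine the two most similar categories
        tab = tab.drop(columns=[a, b])
    return tab

def adjusted_p_value(original_columns, merged_tab):
    """Step 4: Bonferroni-style adjustment of the reduced table's p-value."""
    p = chi2_contingency(merged_tab.values, correction=False)[1]
    multiplier = comb(len(original_columns) - 1, merged_tab.shape[1] - 1)
    return min(1.0, p * multiplier)
```

At a node, this would be applied to the crosstab of the target against each (optimally merged) predictor in turn, and the predictor with the smallest adjusted p-value would be chosen for the split in step 5.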
