Date: May 19, 2024
Topic: Other Considerations
Recall
When using continuous attributes, the same attribute can be reused (with a different threshold) for a split further down the tree
Notes
Using Continuous Attributes
- Attributes can be continuous, e.g., age, weight, distance
- To split on a continuous attribute, find the threshold value that best separates the outcomes
- e.g., $\geq 4$ wheels = large vehicles, $<4$ wheels = small vehicles
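- It is possible to split on the same attribute again later in the tree, depending on the information gain (reduction in entropy)
    - Only applicable for continuous attributes: once a discrete attribute has been split on (one branch per value), every example in a branch shares the same value, so splitting on it again gains nothing
    - For example, we can reuse Age further down the tree, with a different threshold, if it provides the best information gain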
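The threshold search and the reuse of Age described above can be sketched as follows. This is a minimal illustration, not the method from the lecture: the helper names `entropy` and `best_threshold`, the candidate thresholds (midpoints between consecutive distinct values) and the toy `ages`/`buys` data are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value >= threshold`
    with the highest information gain, or (None, 0.0) if no split helps."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # assumed candidate: midpoint of neighbours
        left = [lab for v, lab in pairs if v < threshold]
        right = [lab for v, lab in pairs if v >= threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical toy data: the middle age range is labelled "yes".
ages = [22, 25, 31, 38, 45, 52, 60, 67]
buys = ["no", "no", "yes", "yes", "yes", "yes", "no", "no"]

# First split on Age at the root.
root_threshold, root_gain = best_threshold(ages, buys)

# Reuse Age inside the `age >= root_threshold` branch with a new threshold.
branch = [(a, b) for a, b in zip(ages, buys) if a >= root_threshold]
branch_threshold, branch_gain = best_threshold(
    [a for a, _ in branch], [b for _, b in branch]
)
print(root_threshold, branch_threshold)
```

On this toy data a single threshold cannot isolate the middle age range, so the branch produced by the first split is split on Age again with a different threshold.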

Before expanding the tree further, we should check how much error it currently has.
If the error is already sufficiently low, we may consider stopping instead of splitting again.
When Do We Stop?
When Everything is Classified Correctly
- Cannot completely trust the training data as noise might exist (incorrectly labelled classes)
- Hence, stopping when everything is classified correctly according to the training set might not be a good idea
- Might cause overfitting (the tree doesn’t generalize well to unseen data), which happens when a decision tree grows too big
Using Cross Validation and Pruning
- Keep a hold-out validation set (data not used to build the tree) and use it to calculate the error every time we expand the tree
- If the validation error is low enough, we can stop expanding the tree
- Can also apply pruning (cut sub-trees to arrive at a decision sooner) and check how it affects the tree’s error on the validation set
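One common way to combine a hold-out set with pruning is reduced-error pruning: collapse a sub-tree into a leaf and keep the change only if the validation error does not get worse. The sketch below is an assumption about how this could look, not necessarily the variant used in the lecture. The nested-dict tree layout, the `majority` field (majority training class at a node), and the hand-built `tree` and `validation` data are all hypothetical.

```python
def predict(node, x):
    """Follow threshold tests until a leaf (a node without an 'attr' key)."""
    while "attr" in node:
        node = node["ge"] if x[node["attr"]] >= node["threshold"] else node["lt"]
    return node["label"]


def error(tree, data):
    """Fraction of (example, label) pairs the tree misclassifies."""
    return sum(predict(tree, x) != y for x, y in data) / len(data)


def prune(node, tree, validation):
    """Bottom-up reduced-error pruning, modifying the tree in place."""
    if "attr" not in node:
        return  # leaves cannot be pruned further
    prune(node["lt"], tree, validation)
    prune(node["ge"], tree, validation)
    before = error(tree, validation)
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]  # tentatively replace sub-tree by a leaf
    if error(tree, validation) > before:
        node.clear()
        node.update(saved)  # pruning hurt validation error: undo it


# Hypothetical tree whose income split was fitted to a noisy training example.
tree = {
    "attr": "age", "threshold": 28, "majority": "yes",
    "lt": {"label": "no"},
    "ge": {
        "attr": "age", "threshold": 56, "majority": "yes",
        "lt": {
            "attr": "income", "threshold": 90, "majority": "yes",
            "lt": {"label": "yes"},
            "ge": {"label": "no"},
        },
        "ge": {"label": "no"},
    },
}

# Hypothetical hold-out validation set.
validation = [
    ({"age": 24, "income": 30}, "no"),
    ({"age": 35, "income": 95}, "yes"),
    ({"age": 40, "income": 50}, "yes"),
    ({"age": 50, "income": 99}, "yes"),
    ({"age": 62, "income": 40}, "no"),
    ({"age": 70, "income": 80}, "no"),
]

error_before = error(tree, validation)
prune(tree, tree, validation)
error_after = error(tree, validation)
print(error_before, error_after)
```

Ties are resolved in favour of the smaller tree here, which is a design choice: a sub-tree that does not improve validation error is treated as unnecessary complexity.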
For regression tasks, use the average of the output values as the prediction
Regression (Continuous Outputs)
- When the output is continuous, each leaf can predict the average of the output values of the training examples that reach it, instead of a class label
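A minimal sketch of a leaf predicting an average, assuming a single fixed split. The `train` data (distance in km, delivery time in minutes), the `THRESHOLD` value and the `predict` helper are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical training data: (distance_km, delivery_minutes).
train = [(2.0, 12.0), (3.5, 15.0), (4.0, 18.0), (9.0, 40.0), (11.0, 44.0), (12.5, 48.0)]

THRESHOLD = 6.0  # assumed split point on distance

# Each leaf stores the mean output of the training examples that reach it.
leaf_lt = mean([y for x, y in train if x < THRESHOLD])
leaf_ge = mean([y for x, y in train if x >= THRESHOLD])


def predict(distance):
    """Return the stored average of the leaf that `distance` falls into."""
    return leaf_ge if distance >= THRESHOLD else leaf_lt


print(predict(3.0), predict(10.0))
```

Every input that lands in the same leaf gets the same prediction, so the tree approximates a continuous output with a piecewise-constant function.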
<aside>
📌 SUMMARY: Other decision tree considerations include splitting on continuous attributes, deciding when to stop building the tree, and handling continuous outputs (regression).
</aside>