Date: May 19, 2024
Topic: ID3 Algorithm
Recall
Notes
ID3 Algorithm
LOOP:
select_best_attribute_A()
assign_A_as_decision_attribute_for_node()
for each_value_of_A:
create_descendent_node()
sort_training_examples_to_leaves()
if examples_perfectly_created:
break
else:
iterate_over_leaves()
Entropy measures randomness in a dataset
The best attribute for each node maximizes the Entropy Gain
Selecting the best attribute (Information Gain)
- The amount of information gained by picking a particular attribute
- We seek to reduce randomness in the data by knowing the value of a particular attribute
Formula
Information Gain:
$\text{Gain}(S,A) = \text{Entropy}(S)-\sum_V\frac{|S_v|}{|S|}\cdot\text{Entropy}(S_V)$
- $S$ is the collection of training examples
- $A$ is the particular attribute
- For a specific value $S_v$, we get an entropy value as well
- Info Gain = Entropy(Parent) - Entropy(After Split)
Entropy
- Measure of randomness.
- Maximum entropy occurs when the outcome is equally distributed (e.g., flipping a coin that can be heads or tails))
- Minimum entropy occurs when we know the outcome (e.g., flipping a loaded coin that is always heads)
- If we have objects that are more common in one label than another, then entropy decreases
- A bag with 50 red and 50 green marbles has higher entropy
- A bag with 90 red and 10 green marbles has lower entropy
<aside>
💡 In order to choose the best attribute, we want one to maximize the Information Gain
</aside>
For a decision tree to perform well, it has to consider inductive biases in its algorithm
Inductive Bias
- Assumptions or prior knowledge that an ML algorithm uses to learn and generalize from training data
- Consists of restriction bias and preference bias
Restriction Bias ($H$)
- Hypotheses set that we care about (e.g., for lin reg, we only consider all linear functions)
- For decision trees, this includes all possible decision trees
Preference Bias ($n\in H$)
- What source of hypothesis from the hypotheses set we prefer
Applying inductive bias to decision trees
- Tend to produce trees with good splits at the top
- Correct over incorrect splits
- Prefer shorter trees to longer trees (due to good splits at top)
<aside>
📌 SUMMARY: When building a decision tree, we want to obtain the best splits for the attributes and this can be done by maximizing entropy gain
</aside>