Date: May 18, 2024
Topic: Learning in Decision Trees
Recall
When creating splits for decision trees, we want to find the best attribute at each node to split the data evenly based on the eventual outcomes
Notes
20 Questions
When playing 20 questions (e.g., if the answer is Michael Jackson):
- Usually start by asking a broad question (Is it an animal?)
- Then after a few guesses, we can start asking specific questions (Is the person’s name Michael?)
Applying to Decision Trees
We can follow the same principles when trying to build a decision tree
- Pick the best attribute (e.g., best may split the data into half)
- Ask a question
- Follow the answer path
- Repeat the above 3 steps until we get an answer
Making the best split

- Based on the available data, the tree with the best split would be the last tree
- The second tree doesn’t split anything so it is not good
- The first tree, while it splits the data, results in subsets that are still equal in distribution, making it bad as well
Decision trees can also be used to express logic functions like AND, OR, XOR
Expressiveness in Decision Trees
Boolean AND (A AND B)
Decision Trees can be used to represent AND function (e.g., A AND B)

- For the AND function, we se that it is possible to swap attributes A and B while still achieving the same result
Boolean OR (A OR B)

- The Boolean OR function can also be represented, and like above, the attributes can be swapped
Boolean XOR (A XOR B)

- The XOR function actually represents the entire truth table (can just replace the outputs)
- For example, if we want to model OR, the left sub-tree’s output will always be +
- The above OR tree is a simplified representation since the left sub-tree’s output is always the same
- The attributes can also be swapped for XOR
ANY is an example of a simple problem, where we have a linear relationship between the attributes and nodes
<aside>
📌 SUMMARY: The best splits in decision trees should separate the data cleanly based on their eventual outcomes. Decision trees can also grow very deep, so we need a good method to search such trees
</aside>