Date: September 10, 2025
Topic: Overview of Optimization
Recall
Depth is important because it helps the network pick out discriminative features, and it also acts as a form of dimensionality reduction as we go from high-dimensional to low-dimensional representations.
This allows the network to recognize increasingly abstract features for classification
Many other design decisions also exist for deep learning
Notes
Overview
Neural Network Depth
- Structure the model to represent an inherently compositional world
- Theoretical evidence shows adding depth leads to parameter efficiency
- Representing some structures with a 2-layer network requires many more parameters than with a 3-layer network
- Allows for gentle dimensionality reduction
- We often start with very high dimensional data (images/text)
- Want to gently reduce dimensionality to pick out increasingly abstract features that are discriminative for the target classes
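To make gentle dimensionality reduction concrete, here is a minimal NumPy sketch of a fully connected stack that shrinks a hypothetical flattened 32x32x3 image (3072 values) to 10 class scores over several layers. The layer widths, the ReLU activations, the weight scaling, and the helper names `init_layers` and `forward` are illustrative assumptions, not taken from the lecture. It only tracks representation sizes and parameter counts; it does not by itself demonstrate the theoretical parameter-efficiency result for depth.

```python
import numpy as np

# Hypothetical layer widths: 3072 input values reduced step by step to 10 scores.
WIDTHS = [3072, 512, 128, 32, 10]


def init_layers(widths, rng):
    """Create a (weight, bias) pair for each fully connected layer."""
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        # Scale random weights by sqrt(2 / fan_in) (He-style; an assumption here).
        W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        b = np.zeros(fan_out)
        layers.append((W, b))
    return layers


def forward(x, layers):
    """Run the stack and record the representation width after every layer."""
    widths_seen = [x.shape[1]]
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:  # ReLU on hidden layers only, raw scores at the end
            x = np.maximum(x, 0.0)
        widths_seen.append(x.shape[1])
    return x, widths_seen


rng = np.random.default_rng(0)
layers = init_layers(WIDTHS, rng)
n_params = sum(W.size + b.size for W, b in layers)  # weights + biases

batch = rng.standard_normal((4, WIDTHS[0]))  # 4 fake flattened inputs
scores, widths_seen = forward(batch, layers)
print(widths_seen)  # [3072, 512, 128, 32, 10]
print(scores.shape)  # (4, 10)
print(n_params)  # 1643498
```

Almost all of the parameters sit in the first layer (3072 x 512 weights), which is one reason the widths chosen early in the stack dominate model capacity.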
Other Design Decisions
- Architecture
- Data considerations
- Training and optimization
- Machine learning considerations
Different tasks may be more suited for different architectures
Architecture

- What modules/layers should we use? (Architectures for CV and NLP may differ.)
- How should these layers be connected?
- Connecting layers properly supports healthy gradient flow during backpropagation
- Can we use domain knowledge to add architectural biases?
- The better these biases mirror the reality/structure of data, the easier it is to learn
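The point about connections and gradient flow can be illustrated with a deliberately tiny scalar model. The sketch below assumes each layer just multiplies its input by a weight `w`, and compares stacking layers directly against adding a skip (residual) connection around each layer. Residual connections are one well-known connection pattern chosen here for illustration; the notes do not prescribe them, and this toy ignores nonlinearities entirely. The function names are hypothetical.

```python
def plain_chain_grad(w, depth):
    """Layers stacked directly: y = w * x at every layer.

    By the chain rule, d(output)/d(input) is the product of the per-layer
    derivatives, i.e. w ** depth.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= w  # d(w * x)/dx = w
    return grad


def residual_chain_grad(w, depth):
    """Each layer adds a skip connection: y = x + w * x.

    The per-layer derivative becomes 1 + w, so the identity path keeps the
    product from collapsing toward zero when w is small.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w  # d(x + w * x)/dx = 1 + w
    return grad


w, depth = 0.1, 20
print(f"plain:    {plain_chain_grad(w, depth):.3e}")
print(f"residual: {residual_chain_grad(w, depth):.3e}")
```

With `w = 0.1` and 20 layers, the plain chain's end-to-end derivative is 0.1**20 = 1e-20, so almost no gradient reaches the early layers, while the residual chain's is 1.1**20, roughly 6.7.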
Data is an important part of deep learning
Data Considerations
- Like in traditional machine learning, data is key
- Should we pre-process data?
- Should we normalize data?
- Can we augment our data by adding noise or perturbations?
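The normalization and augmentation questions can be sketched in NumPy as follows. This assumes tabular features, per-feature standardization (z-scoring) fitted on the training set only, and additive Gaussian noise as the perturbation; the data, noise level, and helper names are hypothetical. Whether these choices suit a given dataset is exactly the kind of decision listed above.

```python
import numpy as np


def fit_standardizer(X_train, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps  # eps guards against division by zero
    return mean, std


def standardize(X, mean, std):
    """Shift and scale features so training data has ~zero mean and unit variance."""
    return (X - mean) / std


def augment_with_noise(X, y, noise_std, copies, rng):
    """Return the original data plus `copies` noisy versions of it.

    Labels are repeated unchanged: this assumes small Gaussian perturbations
    do not change the class.
    """
    noisy = [X + rng.normal(0.0, noise_std, size=X.shape) for _ in range(copies)]
    X_aug = np.concatenate([X] + noisy, axis=0)
    y_aug = np.concatenate([y] * (copies + 1), axis=0)
    return X_aug, y_aug


rng = np.random.default_rng(0)
# Hypothetical raw data: 100 samples, 3 features on very different scales.
X_train = rng.normal(loc=[5.0, -200.0, 0.01], scale=[2.0, 50.0, 0.001], size=(100, 3))
y_train = rng.integers(0, 2, size=100)

mean, std = fit_standardizer(X_train)
X_std = standardize(X_train, mean, std)
X_aug, y_aug = augment_with_noise(X_std, y_train, noise_std=0.1, copies=2, rng=rng)

print(X_std.mean(axis=0))  # close to 0 for every feature
print(X_std.std(axis=0))  # close to 1 for every feature
print(X_aug.shape, y_aug.shape)  # (300, 3) (300,)
```

Fitting the mean and standard deviation on the training data alone, then reusing them for validation and test data, avoids leaking information from the held-out sets.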
Optimization, initialization, regularization, and the choice of loss function all play a part in ensuring effective training
Optimization Considerations
- Even with a good architecture, we still need a good optimization algorithm to find good weights
- What optimizer should we use?
- Different optimizers make different weight updates depending on the gradients
- How should we initialize the weights?
- If we initialize well, we may need fewer gradient descent steps
- If we initialize poorly, learning becomes much more difficult
- What regularizers should we use?
- What loss function is appropriate?
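To see how different optimizers can turn the same gradients into different weight updates, here is a minimal sketch comparing plain gradient descent with gradient descent plus momentum on a toy one-dimensional loss L(w) = (w - 3)^2. The loss, learning rate, momentum coefficient `beta`, starting point, and function names are assumptions for illustration; this is not any library's optimizer API.

```python
TARGET = 3.0  # minimum of the toy loss (assumed for illustration)


def grad(w):
    """Gradient of the toy loss L(w) = (w - TARGET)**2."""
    return 2.0 * (w - TARGET)


def gradient_descent(w0, lr, steps):
    """Plain update: step directly against the current gradient."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum_descent(w0, lr, beta, steps):
    """Momentum update: keep a decaying sum of past gradients (the velocity v)
    and step against that instead of the current gradient alone."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w


w0, lr, steps = 0.0, 0.01, 50  # w0 plays the role of the initialization
w_gd = gradient_descent(w0, lr, steps)
w_mom = momentum_descent(w0, lr, beta=0.9, steps=steps)
print(f"gradient descent: w = {w_gd:.4f}")
print(f"with momentum:    w = {w_mom:.4f}")
```

Both runs start from `w0 = 0` and use the same gradient function. Plain gradient descent shrinks its error by a factor of 0.98 per step, so after 50 steps it is still about 1.09 short of the minimum, while momentum, by accumulating past gradients, ends up much closer to w = 3. Moving `w0` closer to the minimum also illustrates the initialization bullet: fewer steps are needed to get near the optimum.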
<aside>
📌 SUMMARY:
For a particular deep learning application, we need to weigh all of these considerations together.
We have to trade off model capacity (e.g., number of parameters) against the amount of data.
Add appropriate biases based on domain knowledge.
</aside>
Date: September 10, 2025
Topic: Architectural Considerations