Date: September 10, 2025

Topic: Overview of Optimization

Recall

Depth matters because it helps the network pick out discriminative features and acts as a form of dimensionality reduction as we move from high-dimensional inputs to low-dimensional representations.

This allows the network to recognize increasingly abstract features for classification.

Many other design decisions also exist in deep learning.

Notes

Overview

Neural Network Depth

Structure the model to represent an inherently compositional world
Theoretical evidence shows that adding depth leads to parameter efficiency
- Representing some structures in a 2-layer network requires many more parameters than in a 3-layer one
Allows for gentle dimensionality reduction
- We often start with very high-dimensional data (images/text)
- We want to reduce dimensionality gradually, picking out increasingly abstract features that are discriminative for the target classes
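
As a rough illustration of gentle dimensionality reduction, the sketch below counts the weights and biases of a fully connected network whose layer widths shrink step by step. The widths (784, 256, 64, 10) and the helper name `mlp_param_count` are assumptions chosen for the example, not values from the lecture.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix between the two layers
        total += fan_out           # one bias per output unit
    return total


# Hypothetical width schedule: a flattened 28x28 image reduced gradually
# to 10 class scores.
gentle = [784, 256, 64, 10]

# Ratio by which each layer shrinks the representation.
reduction_factors = [a / b for a, b in zip(gentle, gentle[1:])]

print(gentle, mlp_param_count(gentle))
print(reduction_factors)
```

Each layer here shrinks the representation by a modest factor instead of collapsing 784 inputs to 10 outputs in one step.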

Other Design Decisions

Architecture
Data considerations
Training and optimization
Machine learning considerations

Different tasks may be more suited for different architectures

Architecture

What modules/layers should we use? (Architectures for CV and NLP may differ.)
How should these layers be connected?
- Connecting layers properly leads to proper gradient flow during backpropagation
Can we use domain knowledge to add architectural biases?
- The better these biases mirror the reality/structure of data, the easier it is to learn
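
One way to see how connectivity affects gradient flow is a toy chain of scalar linear layers. The sketch below compares a plain chain (`y = w * x`) with a chain that adds a skip connection (`y = x + w * x`). Skip connections are an assumed example of a connectivity choice, not one named in these notes, and both function names are hypothetical.

```python
def plain_chain_gradient(local_derivative, depth):
    """Gradient of the output w.r.t. the input for `depth` stacked layers
    y = w * x, where every layer has derivative `local_derivative`."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_derivative  # chain rule: multiply local derivatives
    return grad


def residual_chain_gradient(local_derivative, depth):
    """Same chain, but each layer is y = x + w * x, so its derivative
    is 1 + local_derivative because of the identity path."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_derivative
    return grad


# Assumed toy values: 20 layers, each with a small local derivative of 0.1.
print(plain_chain_gradient(0.1, 20))     # shrinks toward zero
print(residual_chain_gradient(0.1, 20))  # stays well away from zero
```

In the plain chain the gradient reaching the first layer is a product of small numbers and vanishes, while the identity path keeps each factor above 1 in this toy setting.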

Data is an important part of deep learning

Data Considerations

Like in traditional machine learning, data is key
Should we pre-process data?
Should we normalize data?
Can we augment our data by adding noise or perturbations?
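
A minimal sketch of two of these choices, assuming a single numeric feature stored in a plain list: standardization (zero mean, unit variance) as the normalization, and additive Gaussian noise as the augmentation. The helper names, the noise level `sigma`, and the fixed seed are illustrative assumptions.

```python
import random
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def add_gaussian_noise(values, sigma, seed=0):
    """Return a perturbed copy of values; the seed makes the result repeatable."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


raw = [2.0, 4.0, 6.0, 8.0]
normalized = standardize(raw)
augmented = add_gaussian_noise(normalized, sigma=0.05)
print(normalized)
print(augmented)
```

The augmented copy keeps the same labels as the original sample, so it adds training examples without new annotation.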

Optimization, initialization, regularization, and the choice of loss function all play a part in ensuring effective training

Optimization Considerations

Even with a good architecture, we still need a good optimization algorithm to find good weights
What optimizer should we use?
- Different optimizers make different weight updates from the same gradients
How should we initialize the weights?
- If we initialize well, we may not need many descent steps
- If we initialize poorly, learning becomes much more difficult
What regularizers should we use?
What loss function is appropriate?
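
The optimizer and initialization points above can be sketched on a toy one-dimensional loss, f(w) = (w - 3)^2. Plain gradient descent and gradient descent with momentum are assumed here as two example optimizers. The learning rate, momentum coefficient, tolerance, and starting points are arbitrary illustrative values.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


def sgd_step(w, lr):
    """Plain gradient descent: step against the current gradient."""
    return w - lr * grad(w)


def momentum_step(w, v, lr, beta):
    """Gradient descent with momentum: step along a running sum of gradients."""
    v = beta * v + grad(w)
    return w - lr * v, v


def steps_to_converge(w0, lr=0.1, tol=1e-3, max_steps=10_000):
    """Number of plain gradient descent steps until w is within tol of 3."""
    w = w0
    for step in range(max_steps):
        if abs(w - 3.0) < tol:
            return step
        w = sgd_step(w, lr)
    return max_steps


# Same gradients, different updates: after two steps from w = 0 the two
# optimizers have moved to different weights.
w_sgd = sgd_step(sgd_step(0.0, 0.1), 0.1)
w_mom, v = momentum_step(0.0, 0.0, 0.1, 0.9)
w_mom, v = momentum_step(w_mom, v, 0.1, 0.9)
print(w_sgd, w_mom)

# A starting point near the minimum needs fewer steps than a distant one.
print(steps_to_converge(2.9), steps_to_converge(-50.0))
```

On this convex toy problem a poor initialization only costs extra steps. In a real network it can also stall training, which this sketch does not model.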

<aside> 📌 SUMMARY: For a particular application of deep learning, we need to weigh all of these considerations together. We have to trade off model capacity (e.g., number of parameters) against the amount of data, and add appropriate biases based on domain knowledge.

</aside>

Date: September 10, 2025

Topic: Architectural Considerations