These notes cover basic machine learning concepts, common algorithms, and mainstream models.
- Activation Functions
- Gradient Descent
- Computation Graph
- Backpropagation
- Gradients for L2 Regularization (weight decay)
- Vanishing/Exploding Gradients
- Mini-Batch Gradient Descent
- Stochastic Gradient Descent
- Choosing Mini-Batch Size
- Gradient Descent with Momentum (usually, but not always, faster than plain SGD)
- Gradient Descent with RMSprop
- Adam (combines Momentum and RMSprop; see the optimizer sketch after this list)
- Learning Rate Decay Methods
- Batch Normalization
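
To make the optimizer items above concrete, here is a minimal NumPy sketch of the Momentum, RMSprop, and Adam update rules. The hyperparameter names and defaults (`lr`, `beta`, `beta1`, `beta2`, `eps`) follow common conventions and are assumptions, not values taken from the notes:

```python
import numpy as np

def momentum_update(w, grad, v, lr=0.01, beta=0.9):
    """Gradient descent with momentum: v is an exponentially
    weighted moving average of past gradients."""
    v = beta * v + (1 - beta) * grad
    return w - lr * v, v

def rmsprop_update(w, grad, s, lr=0.01, beta=0.999, eps=1e-8):
    """RMSprop: divide the step by a running average of squared gradients."""
    s = beta * s + (1 - beta) * grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_update(w, grad, v, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam = Momentum + RMSprop, with bias correction for early steps (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    v_hat = v / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s

# Toy usage: minimise f(w) = w**2, whose gradient is 2w.
w, v, s = 5.0, 0.0, 0.0
for t in range(1, 101):
    w, v, s = adam_update(w, 2 * w, v, s, t, lr=0.1)
print(w)  # close to 0
```
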
- Parameters
- Learnable Parameters and Hyperparameters
- Parameter Initialization (see the sketch below)
- Hyperparameter Tuning
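
Parameter initialization usually means drawing weights from a zero-mean distribution whose variance depends on the layer's fan-in. A minimal sketch of the two most common schemes; the layer sizes (784, 128) are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(n_in, n_out):
    """He initialization (variance 2/n_in), a common choice for ReLU layers."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))

def xavier_init(n_in, n_out):
    """Xavier/Glorot initialization (variance 1/n_in), often used with tanh."""
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))

W1 = he_init(784, 128)   # random weights break symmetry between units
b1 = np.zeros((128, 1))  # biases can safely start at zero
```
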
- Regularization
- L2 Regularization (weight decay)
- L1 Regularization
- Dropout (inverted dropout; see the sketch below)
- Early Stopping
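
A minimal sketch of the two regularizers above that show up directly in code: inverted dropout and the L2 gradient term. The argument names (`keep_prob`, `lam`, `m` for the batch size) are assumed conventions:

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(a, keep_prob=0.8):
    """Inverted dropout: zero out units at train time, then divide by
    keep_prob so the expected activation is unchanged at test time."""
    mask = rng.random(a.shape) < keep_prob
    return (a * mask) / keep_prob

def l2_backward(dW, W, lam, m):
    """L2 regularization adds (lam/(2m))*||W||^2 to the cost, so its
    gradient adds (lam/m)*W to dW -- hence the name weight decay."""
    return dW + (lam / m) * W
```
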
- Models
- Logistic Regression
- Multi-Class Classification (Softmax Regression; sketch below)
- Transfer Learning
- Multi-Task Learning
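
As a companion to the Softmax Regression item above, a minimal forward-pass sketch; the weight shapes and input are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), np.zeros((3, 1))  # 3 classes, 4 input features

def softmax_regression(x):
    """Linear scores followed by softmax, giving class probabilities."""
    z = W @ x + b
    e = np.exp(z - z.max(axis=0, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=0, keepdims=True)

x = rng.normal(size=(4, 1))
p = softmax_regression(x)
print(p.ravel(), p.sum())  # three probabilities that sum to 1
```
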
- Convolutional Neural Network (CNN)
- Filter/Kernel
- Stride
- Padding (valid and same convolutions)
- A Convolutional Layer (see the convolution sketch below)
- 1×1 Convolution
- Pooling Layer (Max and Average Pooling)
- LeNet-5
- AlexNet
- VGG-16
- ResNet (residual/skip connections make much deeper networks trainable)
- Inception Network
- Object Detection
- Classification with Localisation
- Landmark Detection
- Sliding Windows Detection Algorithm
- Region Proposal (R-CNN)
- YOLO Algorithm
- Bounding Box Predictions (Basics of YOLO)
- Intersection Over Union
- Non-max Suppression (see the IoU/NMS sketch below)
- Anchor Boxes
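
The IoU and non-max suppression items above amount to only a few lines of code. A plain-Python sketch, assuming boxes in (x1, y1, x2, y2) corner format:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Repeatedly keep the highest-scoring box and drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(non_max_suppression(boxes, scores=[0.9, 0.8, 0.7]))  # [0, 2]
```
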
- Face Verification
- One-Shot Learning (Learning a “similarity” function)
- Siamese Network
- Triplet Loss
- Face Recognition/Verification and Binary Classification
- Neural Style Transfer
- 1D and 3D Convolution Generalisations
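
A naive single-channel sketch of the convolution arithmetic covered above (filter, stride, padding); real layers add input/output channels, a bias, and a nonlinearity:

```python
import numpy as np

def conv2d(x, k, stride=1, padding=0):
    """Slide filter k over image x; padding=0 gives a 'valid' convolution."""
    if padding:
        x = np.pad(x, padding)
    h = (x.shape[0] - k.shape[0]) // stride + 1
    w = (x.shape[1] - k.shape[1]) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = x[i * stride:i * stride + k.shape[0],
                      j * stride:j * stride + k.shape[1]]
            out[i, j] = np.sum(patch * k)
    return out

x, k = np.arange(36.0).reshape(6, 6), np.ones((3, 3))
print(conv2d(x, k).shape)             # (4, 4): valid convolution shrinks the output
print(conv2d(x, k, padding=1).shape)  # (6, 6): 'same' output size at stride 1
```
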
- Sequence Models
- Recurrent Neural Network Model
- Gated Recurrent Unit (GRU; see the sketch below)
- GRU (Simplified)
- GRU (Full)
- Long Short Term Memory (LSTM)
- Bidirectional RNN
- Deep RNN Example
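
A minimal sketch of one GRU time step matching the gate structure listed above, with bias terms omitted for brevity; the weight shapes and toy sequence are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: z is the update gate, r the reset gate,
    h_tilde the candidate hidden state."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ concat)
    r = sigmoid(Wr @ concat)
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
Wz, Wr, Wh = (rng.normal(size=(4, 7)) for _ in range(3))  # hidden size 4, input size 3
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # run a length-5 sequence
    h = gru_step(x_t, h, Wz, Wr, Wh)
print(h.shape)  # (4,)
```
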
- Word Embedding
- One-Hot
- Embedding Matrix (see the sketch below)
- Learning Word Embedding
- Word2Vec & Skip-gram
- Negative Sampling
- GloVe Vector
- Deep Contextualized Word Representations (ELMo, Embeddings from Language Models)
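
A small sketch of the embedding-matrix idea above: multiplying the matrix by a one-hot vector just selects one column, which is why real implementations use a direct lookup instead of the multiply. The vocabulary size, dimension, and word index are made-up examples:

```python
import numpy as np

vocab_size, dim = 10000, 300
rng = np.random.default_rng(0)
E = rng.normal(size=(dim, vocab_size))  # embedding matrix, one column per word

word_index = 4257                       # hypothetical index of some word
one_hot = np.zeros((vocab_size, 1))
one_hot[word_index] = 1.0

# E @ one_hot picks out column `word_index` -- identical to a direct lookup.
assert np.allclose(E @ one_hot, E[:, [word_index]])
```
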
- Sequence to Sequence Model Example: Translation
- Pick the most likely sentence (Beam Search)
- Beam Search (see the sketch below)
- Length Normalisation
- Error Analysis in Beam Search (heuristic search algorithm)
- BLEU Score
- Combined BLEU
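
A toy sketch of beam search with length normalisation; `next_log_probs` is a hypothetical stand-in for a real decoder's next-token distribution:

```python
import math

def beam_search(next_log_probs, beam_width=2, length=3):
    """Keep only the beam_width best-scoring prefixes at each step."""
    beams = [([], 0.0)]
    for _ in range(length):
        candidates = [(prefix + [tok], score + lp)
                      for prefix, score in beams
                      for tok, lp in next_log_probs(prefix).items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    # Length normalisation: compare summed log-probabilities divided by length.
    return max(beams, key=lambda c: c[1] / len(c[0]))

# Hypothetical fixed next-token distribution, just to make the sketch runnable.
dist = {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)}
print(beam_search(lambda prefix: dist))  # (['a', 'a', 'a'], 3 * log(0.6))
```
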
- Attention Model
- Transformer (Attention Is All You Need)
- Bidirectional Encoder Representations from Transformers (BERT)
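
A minimal NumPy sketch of the scaled dot-product attention at the heart of the Transformer, following the formula from "Attention Is All You Need"; the matrix shapes are arbitrary examples:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 64))                        # 5 queries of dimension 64
K, V = rng.normal(size=(7, 64)), rng.normal(size=(7, 64))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 64)
```
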
- Practical Tips
- Train/Dev/Test Dataset
- Over/Underfitting, Bias/Variance, Comparing to Human-Level Performance, Solutions
- Mismatched Data Distribution
- Input Normalization (see the sketch below)
- Use a Single-Number Model Evaluation Metric
- Error Analysis (Prioritize Next Steps)
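
Two of the tips above in sketch form: input normalization using statistics from the training split only, and F1 as an example of a single-number evaluation metric (all data here is synthetic):

```python
import numpy as np

def normalize_inputs(X_train, X_dev, eps=1e-8):
    """Compute mu/sigma on the training set ONLY, then apply the same
    transform to dev/test data so every split sees one distribution."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / (sigma + eps), (X_dev - mu) / (sigma + eps)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall: one number to compare models by."""
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
Xtr, Xdev = rng.normal(5, 3, size=(100, 4)), rng.normal(5, 3, size=(20, 4))
Xtr_n, Xdev_n = normalize_inputs(Xtr, Xdev)
print(Xtr_n.mean(axis=0).round(2))  # approximately zero per feature
print(f1_score(0.9, 0.6))           # 0.72
```
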
Link:
https://createmomo.github.io/2018/01/23/Super-Machine-Learning-Revision-Notes