Lecture 5: Learning 1

Machine Learning

  • How to build computers that:

    • (automatically) improve their performance \((P)\)
    • at some task \((T)\)
    • with experience \((E)\)
  • Statistical Learning

  • Neural Learning

Framework

An Example: Spam Filtering

Task: binary classification

Feature extraction: by hand or learn features automatically.

Components of Learning

  • Input: \(x \in \mathbb{R}^d\)
  • Output: \(y \in \{0, 1\}\)
  • Target function: \(f: \mathcal{X} \to \mathcal{Y}\)
  • Data: \((x_1,y_1) \cdots (x_N,y_N)\)
  • Hypothesis: "Possible function to be used for prediction"

Hypothesis Space

  • Includes only those functions that have desired regularity.
    • Continuity
    • Smoothness
    • Simplicity

Example: the space of linear functions.

Loss Function

L2 Loss: Regression

\[ l(y,h(x)) = (y - h(x))^2 \]

Classification Loss:

\[ l(y,h(x)) = 1[y \neq h(x)] \]
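The two losses above translate directly into code. A minimal sketch, where the function names `l2_loss` and `zero_one_loss` are made up for illustration and `prediction` stands for \(h(x)\):

```python
def l2_loss(y, prediction):
    """Squared error between the target y and the prediction h(x)."""
    return (y - prediction) ** 2


def zero_one_loss(y, prediction):
    """Indicator loss 1[y != h(x)]: 1 for a mistake, 0 otherwise."""
    return 1 if y != prediction else 0


print(l2_loss(3.0, 2.5))    # 0.25
print(zero_one_loss(1, 0))  # 1
print(zero_one_loss(1, 1))  # 0
```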

The canonical training procedure of machine learning:

\[ \min_{\theta} \hat{\epsilon}(h_\theta) = \min_{\theta} \sum_{i=1}^N l(y_i, h_\theta(x_i)) \]
  • The main questions
    • What is the hypothesis function?
    • What is the loss function?
    • How do we solve the training problem?
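To make the three questions concrete, here is a toy sketch of the training procedure. Everything in it is an assumption chosen for illustration: the hypothesis is a one-parameter threshold rule, the loss is the classification loss, and the "solver" is a brute-force search over a handful of candidate values of \(\theta\). The data are made up.

```python
def h(theta, x):
    """Hypothetical hypothesis: predict 1 when x is at least theta."""
    return 1 if x >= theta else 0


def empirical_risk(theta, data):
    """Sum of 0-1 losses of h_theta over the training data."""
    return sum(1 for x, y in data if h(theta, x) != y)


# Made-up training set of (x_i, y_i) pairs.
data = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1), (0.9, 1)]

# "Solve" the training problem by trying a few candidate parameters.
candidates = [0.0, 0.3, 0.5, 0.7, 1.0]
best_theta = min(candidates, key=lambda t: empirical_risk(t, data))

print(best_theta, empirical_risk(best_theta, data))  # 0.3 1
```

Real training replaces the brute-force search with an optimizer, but the structure (hypothesis, loss, minimization) stays the same.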

Machine Learning Scenarios

  • Underfitting
  • Approximate fitting
  • Overfitting

Reinforcement Learning

Evaluation Metrics

Classification

  • Precision \(\frac{TP}{TP + FP}\): what fraction of the selected items are relevant
    • In Chinese: "查准率"
    • Example: search engines
  • Recall \(\frac{TP}{TP + FN}\): what fraction of the relevant items are selected
    • In Chinese: "查全率"
  • Accuracy \(\frac{TP + TN}{TP + TN + FP + FN}\): what fraction of all items are classified correctly

\(F_1\)-score:

\[ \frac{1}{F_1} = \frac{1}{2} \left(\frac{1}{P} + \frac{1}{R}\right) \]

where \(P\) is the precision and \(R\) is the recall.
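As a worked example of the formulas above, the sketch below computes all four metrics from confusion-matrix counts. The function name `classification_metrics` and the counts are made up for illustration, and the code assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy, f1) from the four counts.

    Assumes tp + fp > 0, tp + fn > 0 and precision + recall > 0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Harmonic mean of precision and recall, i.e. 1/F = (1/P + 1/R) / 2.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1


precision, recall, accuracy, f1 = classification_metrics(tp=8, fp=2, fn=8, tn=82)
print(precision, recall, accuracy)  # 0.8 0.5 0.9
print(round(f1, 3))                 # 0.615
```

Note how accuracy looks high (0.9) while recall is only 0.5: with imbalanced classes, accuracy alone can hide missed positives.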

Confusion Matrix

from sklearn.metrics import confusion_matrix
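The import above comes from scikit-learn. To show what the matrix contains without that dependency, here is a hand-rolled sketch for binary labels. The helper name `binary_confusion_matrix` and the labels are made up; the layout (rows are true labels, columns are predicted labels, so the result is `[[TN, FP], [FN, TP]]`) follows the convention that `sklearn.metrics.confusion_matrix` uses.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Count outcomes for labels in {0, 1}.

    matrix[t][p] is the number of samples with true label t and
    predicted label p, giving [[TN, FP], [FN, TP]].
    """
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = binary_confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)          # [[3, 1], [1, 3]]
print(tn, fp, fn, tp)  # 3 1 1 3
```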

ROC (Receiver Operating Characteristic)

AUC (Area Under the Curve)
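The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the AUC is the area under it. The AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise view; the function name `auc` and the scores are made up for illustration, and the quadratic loop is only suitable for small inputs.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes at least one positive and one negative label.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(round(auc(scores, labels), 3))  # 0.833
```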

Regression Error

  • Squared Error: MSE
  • Absolute Error: MAE
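A minimal sketch of both regression errors, averaged over the samples. The function names and the sample values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """MSE: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """MAE: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```

Because errors are squared, MSE penalizes large deviations more heavily than MAE does.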

Model Selection

"All models are wrong, but some are useful." - George E. P. Box

Training and Test Data

The learning algorithm should never, ever have access to the test data.

Cross Validation

  • 5-fold cross validation
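In k-fold cross validation the training data are split into k folds; each fold serves once as the validation set while the remaining folds are used for training. The sketch below only produces the index splits. The helper name `k_fold_indices` is made up, and the code assumes the samples were already shuffled.

```python
def k_fold_indices(n_samples, k=5):
    """Return a list of k (train_indices, validation_indices) pairs.

    Folds are contiguous; shuffle the data beforehand if order matters.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


for train, validation in k_fold_indices(10, k=5):
    print(validation, train)
```

The validation score of a model is then the average over the k folds; the test set stays untouched throughout.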

Model Selection: Whole Procedure

  • Combined Algorithm Selection & Hyperparameter optimization problem

AutoML!

Is the search complexity too high?

Use a tuner!

K-Nearest Neighbors (KNN)

Geometric View

  • Assumption: Closer points in feature space have similar semantics

The Effect of K

  • Increasing \(k\) simplifies the decision boundary
    • Smooth

KNN has no parameters to learn; it is a non-parametric model.
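A minimal KNN classifier sketch, assuming Euclidean distance and majority voting. The function name `knn_predict` and the toy points are made up for illustration. There is no training step: the "model" is the stored data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    # Sort all training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.5, 0.5)))  # 0
print(knn_predict(train_points, train_labels, (5.5, 5.5)))  # 1
```

With `k` equal to the number of training points, every query gets the overall majority label, which is the extreme case of a simplified decision boundary.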

Feature Normalization

  • Z-score normalization
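Z-score normalization rescales a feature to zero mean and unit standard deviation, so that no feature dominates the distance computation in KNN just because of its units. A minimal sketch for a single feature column, assuming the population standard deviation and a non-constant feature; the function name is made up.

```python
import statistics


def z_score_normalize(values):
    """Map each value v to (v - mean) / std for one feature column."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


feature = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
print(z_score_normalize(feature))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The mean and standard deviation should be computed on the training data only and then reused to transform validation and test data.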

Linear Regression

What Model to Choose

  • Exploratory Data Analysis to ease model selection

Hypothesis Space

Loss Function

  • L2 Loss

Optimization

  • For a general differentiable loss function, use Gradient Descent (GD)
  • Stochastic Gradient Descent (SGD): estimate the gradient on a minibatch

The analytic (closed-form) solution is too expensive to compute when the feature dimension is large.
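A minimal sketch of gradient descent for one-dimensional linear regression \(h(x) = wx + b\) with the L2 loss averaged over the data. The function name, learning rate, step count and data are all assumptions chosen for illustration; the data lie exactly on \(y = 2x + 1\).

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Fit h(x) = w * x + b by full-batch gradient descent on the mean L2 loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1

w, b = fit_linear_gd(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

A minibatch SGD variant would compute the two sums over a random subset of the data at each step instead of the full dataset.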

Normalization

Basis Function

  • Polynomial basis functions
  • Radial basis functions (RBF)
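Basis functions map the raw input to a feature vector, and the model stays linear in those features. The sketch below shows both families for a scalar input. The function names are made up, and the Gaussian form with a shared `width` is one common choice of RBF, assumed here for illustration.

```python
import math


def polynomial_basis(x, degree):
    """Features [1, x, x^2, ..., x^degree]."""
    return [x ** j for j in range(degree + 1)]


def rbf_basis(x, centers, width=1.0):
    """Gaussian bumps exp(-(x - c)^2 / (2 * width^2)), one per center c."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]


print(polynomial_basis(2.0, 3))     # [1.0, 2.0, 4.0, 8.0]
print(rbf_basis(0.0, [0.0, 1.0]))   # first entry is 1.0, second is exp(-0.5)
```

Linear regression on `polynomial_basis(x, 3)` fits a cubic in `x` while remaining linear in the weights.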

Regularization

Question

Overfitting

Solution: control the norm of \(w\)

L2-Regularization

\[ \text{penalty} = \lambda \|w\|_2^2 \]
  • Smooth Solution

L1-Regularization

\[ \text{penalty} = \lambda \|w\|_1 \]
  • Sparse Solution
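The "smooth" versus "sparse" contrast can be seen in a one-dimensional toy problem, which is an assumption made here for illustration rather than part of the lecture: minimize \(\frac{1}{2}(w - a)^2\) plus the penalty. With the L2 penalty the minimizer is \(a / (1 + 2\lambda)\), which shrinks \(w\) but never makes it exactly zero for \(a \neq 0\). With the L1 penalty the minimizer is the soft-threshold of \(a\), which is exactly zero whenever \(|a| \le \lambda\). All function names below are made up.

```python
def l2_penalty(w, lam):
    """lam * ||w||_2^2 for a weight vector w."""
    return lam * sum(wi ** 2 for wi in w)


def l1_penalty(w, lam):
    """lam * ||w||_1 for a weight vector w."""
    return lam * sum(abs(wi) for wi in w)


def ridge_1d(a, lam):
    """Minimizer of 0.5 * (w - a)^2 + lam * w^2."""
    return a / (1 + 2 * lam)


def lasso_1d(a, lam):
    """Minimizer of 0.5 * (w - a)^2 + lam * |w| (soft-thresholding)."""
    shrunk = max(abs(a) - lam, 0.0)
    return shrunk if a >= 0 else -shrunk


print(l2_penalty([3.0, -4.0], 0.5), l1_penalty([3.0, -4.0], 0.5))  # 12.5 3.5
print(ridge_1d(2.0, 0.5), lasso_1d(2.0, 0.5))  # 1.0 1.5
print(ridge_1d(0.3, 0.5), lasso_1d(0.3, 0.5))  # 0.15 0.0
```

In the last line the small coefficient survives under L2 (shrunk to 0.15) but is set exactly to zero under L1, which is the sparsity effect.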