Dimensionality Reduction

Background

Dimensionality reduction is a common unsupervised learning approach. It reduces the number of features (input variables) to a manageable size whilst ensuring that the remaining data stays meaningful. Sometimes the number of input variables is simply too high, or many of them add little value to the model. Furthermore, additional features make the modelling more complex, sometimes unnecessarily so. Dimensionality reduction addresses these issues.

Main

Dimensionality reduction is commonly used during the data preprocessing stage. Examples of dimensionality reduction methods (available e.g. via scikit-learn, see link below) are listed here, with a short code sketch after the list:

  1. Decomposition algorithms
    • Principal Component Analysis
    • Kernel Principal Component Analysis
    • Non-Negative Matrix Factorization
    • Singular Value Decomposition
  2. Manifold learning algorithms
    • t-Distributed Stochastic Neighbor Embedding
    • Spectral Embedding
    • Locally Linear Embedding
  3. Discriminant Analysis
    • Linear Discriminant Analysis
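
As a minimal sketch, the snippet below runs one method from each of the three categories on scikit-learn's bundled Iris data; the choice of dataset and of two output components are illustrative assumptions, not part of the original post.

```python
# Illustrative sketch: one method from each category above, on the Iris data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# 1. Decomposition: project onto the 2 directions of greatest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# 2. Manifold learning: preserve local neighbourhood structure in 2-D.
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)

# 3. Discriminant analysis: needs the class labels y (supervised).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_tsne.shape, X_lda.shape)   # each is (150, 2)
```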

Autoencoders are a type of unsupervised neural network that compresses the input data to a lower-dimensional representation (the bottleneck) and then reconstructs the original input from it. The bottleneck activations can then serve as the reduced feature set.
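
The sketch below illustrates the idea; the post does not name a framework, so PyTorch is an assumption here, and the layer sizes and 2-dimensional bottleneck are illustrative choices.

```python
# Minimal autoencoder sketch (assumption: PyTorch; sizes are illustrative).
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features: int, n_latent: int = 2):
        super().__init__()
        # Encoder: compress the input down to the low-dimensional bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, n_latent),
        )
        # Decoder: reconstruct the original input from the bottleneck.
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 16), nn.ReLU(),
            nn.Linear(16, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder(n_features=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 4)              # toy batch: 32 samples, 4 features
for _ in range(100):                # train the network to reproduce its input
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)     # reconstruction error
    loss.backward()
    optimizer.step()

with torch.no_grad():
    x_reduced = model.encoder(x)    # the 2-D compressed representation
```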

Advantages and Disadvantages

Some advantages of dimensionality reduction include:

  • It reduces the amount of data required and hence the storage space.
  • It may make the learning algorithm faster and the resulting model simpler.
  • It may remove “surplus” (redundant or noisy) information.
  • It tackles the “curse of dimensionality”.

Disadvantages of dimensionality reduction:

  • It may reduce the overall accuracy of the model produced, since some information is inevitably discarded along with the removed dimensions.

Further reading / Links