Dimensionality Reduction

Background

Dimensionality reduction is a common unsupervised learning approach. It reduces the number of features (input variables) to a manageable size whilst ensuring that the remaining data stays meaningful. Sometimes the number of input variables is simply too high, or many of them add little value to the model. Furthermore, additional features make the modelling more complex, sometimes unnecessarily so. Dimensionality reduction addresses these issues.

Main

Dimensionality reduction is commonly used during the data preprocessing stage. Examples of dimensionality reduction methods (available e.g. via scikit-learn, see link below) are listed here, with a short code sketch after the list:

  1. Decomposition algorithms
    • Principal Component Analysis
    • Kernel Principal Component Analysis
    • Non-Negative Matrix Factorization
    • Singular Value Decomposition
  2. Manifold learning algorithms
    • t-Distributed Stochastic Neighbor Embedding
    • Spectral Embedding
    • Locally Linear Embedding
  3. Discriminant Analysis
    • Linear Discriminant Analysis
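
As a minimal sketch, the snippet below runs one method from each of the three categories on scikit-learn's bundled Iris data; the choice of dataset and of two output components are illustrative assumptions, not part of the original post.

```python
# Illustrative sketch: one method from each category above, on the Iris data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# 1. Decomposition: project onto the 2 directions of greatest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# 2. Manifold learning: preserve local neighbourhood structure in 2-D.
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)

# 3. Discriminant analysis: needs the class labels y (supervised).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_tsne.shape, X_lda.shape)   # each is (150, 2)
```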

Autoencoders are a type of unsupervised neural network that compresses the input data to a lower-dimensional representation (the bottleneck) and then reconstructs the original input from it. The bottleneck activations can then serve as the reduced feature set.
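
The sketch below illustrates the idea; the post does not name a framework, so PyTorch is an assumption here, and the layer sizes and 2-dimensional bottleneck are illustrative choices.

```python
# Minimal autoencoder sketch (assumption: PyTorch; sizes are illustrative).
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features: int, n_latent: int = 2):
        super().__init__()
        # Encoder: compress the input down to the low-dimensional bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, n_latent),
        )
        # Decoder: reconstruct the original input from the bottleneck.
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 16), nn.ReLU(),
            nn.Linear(16, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder(n_features=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 4)              # toy batch: 32 samples, 4 features
for _ in range(100):                # train the network to reproduce its input
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)     # reconstruction error
    loss.backward()
    optimizer.step()

with torch.no_grad():
    x_reduced = model.encoder(x)    # the 2-D compressed representation
```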

Advantages and Disadvantages

Some advantages of dimensionality reduction include:

  • It reduces the amount of data required and hence the storage space.
  • It may make the learning algorithm faster and the resulting model simpler.
  • It may remove “surplus” (redundant or noisy) information.
  • It tackles the “curse of dimensionality”.

Disadvantages of dimensionality reduction:

  • It may reduce the overall accuracy of the model produced, since some information is inevitably discarded along with the removed dimensions.

Further reading / Links