A Review Of Dimensionality Reduction Methods And Their Applications
ABSTRACT
Today, data reduction in general has seen a great rise in use, because working with a smaller, more efficient data set instead of the original large one brings numerous advantages. In computer science, dimensionality reduction can be adopted to reduce memory usage and so free up storage capacity on a computer. One example is the reduction of digital images, which are stored as 2D matrices.
Dimensionality reduction is the process of taking a collection of data points in a high-dimensional Euclidean space and projecting them into a lower-dimensional Euclidean space without great distortion. The result obtained by working in the lower-dimensional space is then a good approximation of the result obtained by working in the original high-dimensional space.
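To make "without great distortion" measurable, the following minimal sketch projects points with a Gaussian random matrix (the idea behind Random Projection) and reports the smallest and largest ratio between projected and original squared pairwise distances. It is written in plain Python with no external libraries; the function names, the 1/sqrt(k) scaling, and the demo sizes are illustrative assumptions, not the implementation used in this project.

```python
import math
import random


def gaussian_random_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a random Gaussian matrix.

    Each entry of the k x d matrix R is drawn from N(0, 1), and each point x is
    mapped to (1 / sqrt(k)) * R x, which preserves squared lengths in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    r = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(a * b for a, b in zip(row, p)) for row in r] for p in points]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distortion_range(original, reduced):
    """Smallest and largest ratio of reduced to original squared distance over all pairs."""
    ratios = [
        squared_distance(reduced[i], reduced[j]) / squared_distance(original[i], original[j])
        for i in range(len(original))
        for j in range(i + 1, len(original))
    ]
    return min(ratios), max(ratios)


# Demo: 20 random points in 500 dimensions, projected down to 100 dimensions.
rng = random.Random(42)
data = [[rng.random() for _ in range(500)] for _ in range(20)]
low = gaussian_random_projection(data, k=100)
min_ratio, max_ratio = distortion_range(data, low)
print(f"squared-distance ratios lie in [{min_ratio:.2f}, {max_ratio:.2f}]")
```

A ratio close to 1 for every pair means the reduced data set preserves the geometry of the original; increasing k narrows the range toward 1 at the cost of keeping more dimensions.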
Dimensionality reduction techniques fall into two categories:
The first category includes those in which each attribute in the reduced set is a linear combination of the attributes in the original data set; these include Random Projection (RP) and Principal Component Analysis (PCA). The second category includes those in which the set of attributes in the reduced set is a proper subset of the attributes in the original data set; these include the other six techniques I implemented: the New Random Approach, the Variance Approach, the First, Second, and Third Novel Approaches, and the LSA-Transform Approach.
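The difference between the two categories can be sketched in a few lines of plain Python. In the first function, every new attribute is a weighted sum of all original attributes; in the second, the reduced data simply keeps some of the original columns. As an illustration, the subset is chosen by keeping the columns with the largest variance, which is our reading of the idea behind the Variance Approach; treat that ranking rule, the function names, and the example weights as hypothetical.

```python
def reduce_by_linear_combination(data, weights):
    """Category 1: each new attribute is a linear combination of all original attributes.

    `weights` is a k x d matrix; row i holds the coefficients of new attribute i.
    """
    return [[sum(w * x for w, x in zip(row, point)) for row in weights] for point in data]


def column_variance(data, j):
    """Population variance of column j."""
    values = [row[j] for row in data]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def reduce_by_column_subset(data, k):
    """Category 2: keep a proper subset of the original attributes (columns).

    The k columns with the largest variance are kept, in their original order.
    """
    d = len(data[0])
    ranked = sorted(range(d), key=lambda j: column_variance(data, j), reverse=True)
    kept = sorted(ranked[:k])
    return kept, [[row[j] for j in kept] for row in data]


data = [
    [1.0, 10.0, 5.0, 0.0],
    [2.0, 20.0, 5.0, 1.0],
    [3.0, 30.0, 5.0, 0.0],
]

# Hypothetical weights: new attribute 0 averages columns 0 and 1,
# new attribute 1 is column 2 minus column 3.
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
combined = reduce_by_linear_combination(data, weights)
print(combined)  # [[5.5, 5.0], [11.0, 4.0], [16.5, 5.0]]

kept, subset = reduce_by_column_subset(data, 2)
print(kept, subset)  # [0, 1] [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
```

The subset approach keeps values that are directly interpretable as original attributes, while the linear-combination approach can pack information from every column into each new attribute.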
I also compared these techniques by how well they preserve images. Furthermore, I looked at various applications of dimensionality reduction, including text data, nearest neighbor search, similarity search in time series, clustering, classification, and k-nearest neighbor.
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
CHAPTER 1
1.1 PROJECT OBJECTIVE
1.2 BACKGROUND
1.3 ADVANTAGES
CHAPTER 2
2.0 DIMENSIONALITY REDUCTION TECHNIQUES
2.1 RANDOM PROJECTION (RP)
2.2 PRINCIPAL COMPONENT ANALYSIS
2.3 NEW RANDOM APPROACH (NRA)
2.4 SINGULAR VALUE DECOMPOSITION (SVD)
2.5 VARIANCE APPROACH
2.6 LATENT SEMANTIC ANALYSIS (LSA)-TRANSFORM
2.7 FIRST NOVEL APPROACH
2.8 SECOND NOVEL APPROACH
2.9 THIRD NOVEL APPROACH
CHAPTER 3
3.0 APPLICATIONS
3.1 TEXT DATA
3.2 NEAREST NEIGHBOR SEARCH
3.3 SIMILARITY SEARCH IN A TIME SERIES
3.4 CLUSTERING
3.5 CLASSIFICATION
3.6 K-NEAREST NEIGHBOR
CHAPTER 4
4.0 IMPLEMENTATION AND RESULTS OF DIMENSIONALITY REDUCTION ON IMAGES
4.1 REDUCTION WITH PRINCIPAL COMPONENT ANALYSIS
4.2 REDUCTION WITH RANDOM PROJECTION
4.3 REDUCTION WITH THE NEW RANDOM APPROACH
4.4 REDUCTION WITH VARIANCE
4.5 REDUCTION WITH THE FIRST NOVEL APPROACH
CHAPTER ONE
1.1 PROJECT OBJECTIVE
This project is mainly a survey of dimensionality reduction. It discusses the different reasons why we might want to reduce the dimensionality of a data set and outlines previous work, the methods used, and their applications in different domains. The project then looks in depth at different dimensionality reduction methods and at ways to implement a few of them. Finally, it compares these techniques by the extent to which they preserve images and outlines the various applications of random projection.
1.2 BACKGROUND
Suppose a data set D contains n points in a high-dimensional space; these points can be mapped onto a lower-dimensional space with minimal distortion (see Nsang, Novel Approaches to Dimensionality Reduction and Applications). For example, a data set with 30,000 columns is difficult to inspect. It would be of great help to reduce it to 1,500 columns, which are much easier to analyze than the original 30,000, provided that the result of analyzing the reduced data set is a good approximation of the result of analyzing the original data set.
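One common way to make "minimal distortion" precise is shown below: a map f sends each point of D from d dimensions to k dimensions, and every pairwise squared distance is preserved up to a factor controlled by ε, with 0 < ε < 1. This is the formulation used in the Johnson-Lindenstrauss lemma, on which random projection is based; whether the cited work uses exactly this form is an assumption.

```latex
f : \mathbb{R}^{d} \to \mathbb{R}^{k}, \qquad k \ll d \quad (\text{e.g. } d = 30{,}000,\ k = 1{,}500)

(1-\varepsilon)\,\lVert u - v \rVert^{2} \;\le\; \lVert f(u) - f(v) \rVert^{2} \;\le\; (1+\varepsilon)\,\lVert u - v \rVert^{2} \qquad \text{for all } u, v \in D
```

A smaller ε means the reduced data set reproduces the distances of the original more faithfully, so analyses that depend on distances (clustering, nearest neighbor search) give results closer to those on the original data.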
1.3 ADVANTAGES
Some advantages of reducing the dimensionality d of a given set of n points include:
1. Dimensionality reduction can speed up an algorithm whose runtime depends exponentially on the dimension of the working space. Likewise, in machine learning, if the dimensionality d of the data set is too large, a complex control system will be needed to avoid overfitting the training data.
2. “High dimensionality data set may hinder variation of available choices of the data processing methods” (see Nsang, Novel Approaches to Dimensionality Reduction and Applications). Examples include image data, clustering, and the analysis of text files. In these examples, dimensionality tends to be large because of the variety of products, the wide range of phraseology, or the large image window.
3. High-dimensional data sets tend to be sparsely scattered, so an algorithm may take a long time to find any structure in the data.
4. Dimensionality reduction can remove noise and other irrelevant features from the data being reduced, such as an image. High-dimensional data sets often contain such noise because of their variety.
5. Dimensionality reduction makes the data easier to visualize when it is reduced to a low dimension such as 2D or 3D.
6. Finally, dimensionality reduction helps conserve time and, most importantly, memory.
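The memory saving in item 6 can be estimated with simple arithmetic. This sketch assumes the data is stored as a dense matrix of 64-bit floating-point values and uses a hypothetical 10,000 data points together with the 30,000 and 1,500 column counts from the Background section.

```python
BYTES_PER_FLOAT64 = 8  # a double-precision value occupies 8 bytes


def matrix_bytes(rows, cols, bytes_per_value=BYTES_PER_FLOAT64):
    """Memory needed to store a dense rows x cols matrix of fixed-size values."""
    return rows * cols * bytes_per_value


n_rows = 10_000  # hypothetical number of data points
original = matrix_bytes(n_rows, 30_000)
reduced = matrix_bytes(n_rows, 1_500)

print(f"original: {original / 1e9:.1f} GB")  # original: 2.4 GB
print(f"reduced:  {reduced / 1e6:.0f} MB")  # reduced:  120 MB
print(f"saving factor: {original // reduced}x")  # saving factor: 20x
```

Because the storage is proportional to the number of columns, reducing 30,000 columns to 1,500 cuts memory by the same factor of 20 regardless of how many rows the data set has.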
Even though various computationally expensive methods can produce similar models directly from high-dimensional data sets, dimensionality reduction is still recommended as an initial step before any modeling of the data.