• Format: MS Word (.doc)
• Pages: 65
• Chapters: 1 to 5
• Includes abstract, references and questionnaire
• Abstract and Chapter 1 previewed below


ABSTRACT

 

In this project, we shall implement the hierarchical clustering algorithm and apply it to several data sets, including a weather data set, a student data set and a patient data set. We shall then reduce these data sets using six dimensionality reduction approaches: Random Projections (RP), Principal Component Analysis (PCA), Variance (Var), the New Random Approach (NRA), the Combined Approach (CA) and the Direct Approach (DA).
The Rand index and the adjusted Rand index (ARI) will be implemented to measure how well a given dimensionality reduction method preserves the hierarchical clustering of a data set. Finally, the six reduction methods will be compared on runtime, preservation of inter-point distances, preservation of variance and preservation of the hierarchical clustering of the original data set.

 

TABLE OF CONTENTS

 

DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 HIERARCHICAL CLUSTERING
1.1 SNIPPET OF CLUSTERED DATA
3 DIMENSIONALITY REDUCTION TECHNIQUES
3.1.1 RANDOM PROJECTIONS (RP)
3.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
3.1.3 NEW RANDOM APPROACH
3.1.4 VARIANCE
3.1.5 COMBINED APPROACH
3.1.6 DIRECT APPROACH
4 IMPLEMENTATION
4.1.1 RANDOM PROJECTION (RP)
4.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
4.1.3 NEW RANDOM APPROACH
4.1.4 VARIANCE
4.1.5 DIRECT APPROACH
4.1.6 COMBINED APPROACH
5 RAND INDEX
6 CONCLUSION
7 REFERENCES
8 APPENDIX A: MATLAB CODES USED FOR IMPLEMENTATION

 

CHAPTER ONE

 

INTRODUCTION
Given a data set of n points in a high-dimensional space, it is often helpful to project it onto a lower-dimensional space without great distortion. This process is called dimensionality reduction. Essentially, dimensionality reduction reduces the number of variables under consideration so that the relevant information is retained while the amount of data shrinks.
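To make the idea of projecting onto a lower dimensional space concrete, the following is a minimal sketch of a random projection, one of the approaches named in the abstract. It is written in Python for illustration only (the project's own code is in MATLAB), and it assumes a Gaussian projection matrix with entries scaled by 1/sqrt(k); the function names and the choice of distribution are assumptions, not the project's implementation.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix of i.i.d. Gaussian entries with variance 1/k.

    The Gaussian choice and the 1/sqrt(k) scaling are assumptions made
    for this sketch; the source does not specify them.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(points, matrix):
    """Map each d-dimensional point to k dimensions via matrix * point."""
    return [
        [sum(r * x for r, x in zip(row, point)) for row in matrix]
        for point in points
    ]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratios(original, reduced):
    """Ratio of reduced to original distance for every pair of distinct points."""
    ratios = []
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            d_orig = euclidean(original[i], original[j])
            if d_orig > 0:
                ratios.append(euclidean(reduced[i], reduced[j]) / d_orig)
    return ratios


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(100)] for _ in range(20)]  # 20 points, 100 dims
    reduced = project(data, random_projection_matrix(d=100, k=25, seed=1))
    ratios = distance_ratios(data, reduced)
    # Ratios close to 1 indicate little distortion of inter-point distances.
    print(min(ratios), max(ratios))
```

Ratios near 1 mean that inter-point distances survive the projection largely intact, which is the "without great distortion" property described above. How close they get depends on the target dimension k and on the data.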
Dimensionality reduction helps to reduce the runtime of algorithms whose runtime depends on the dimension of the working space. It also broadens the choice of methods available for data processing, and it provides complexity control, which helps to avoid overfitting the training data.
Dimensionality reduction can be applied in several domains, including text data, image data, nearest neighbor search, and clustering and classification. Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning. Classification, on the other hand, is a method of supervised learning: the task of the supervised learner is to predict the value of the target function for any valid input after having seen a number of training examples (i.e. pairs of input and target output). As mentioned above, this project focuses on the categorization of data using hierarchical clustering.
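As a rough illustration of hierarchical clustering and of the Rand index mentioned in the abstract, the sketch below implements a naive agglomerative (bottom-up) clustering and a pairwise Rand index in plain Python. The single-linkage merge rule, the function names and the toy data are assumptions chosen for brevity; the source does not state which linkage the project uses, and its own code is in MATLAB.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single linkage (an assumed rule).

    Starts with one cluster per point and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    points together, or both put them apart.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += int(same_a == same_b)
        total += 1
    return agree / total if total else 1.0


if __name__ == "__main__":
    # Toy 3-D data (hypothetical): two well-separated groups.
    data = [[0, 0, 5], [0, 1, 5], [10, 10, 5], [10, 11, 5]]
    truncated = [p[:2] for p in data]  # crude stand-in for a reduced data set
    print(rand_index(single_linkage(data, 2), single_linkage(truncated, 2)))
```

A Rand index of 1 means the clustering of the reduced data agrees with the clustering of the original data on every pair of points; lower values indicate that the reduction has changed the cluster structure. The adjusted Rand index additionally corrects for agreement expected by chance and is not shown here.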
