
ABSTRACT

 

In this project, we shall implement the hierarchical clustering algorithm and apply it to several data sets, namely the weather data set, the student data set and the patient data set. We shall then reduce these data sets using six dimensionality reduction approaches: Random Projections (RP), Principal Component Analysis (PCA), Variance (Var), the New Random Approach (NRA), the Combined Approach (CA) and the Direct Approach (DA).
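The abstract does not say which linkage criterion or distance measure the project uses, and the original implementation is in MATLAB (Appendix A). As a hedged illustration only, the Python sketch below assumes bottom-up (agglomerative) clustering with single linkage and Euclidean distance; the function names `euclidean` and `single_linkage` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(points, k):
    """Agglomerative hierarchical clustering, stopped when k clusters remain.

    ASSUMPTION: single linkage (distance between the closest members of two
    clusters) and Euclidean distance; the project may use other choices.
    Returns one integer cluster label per input point.
    """
    clusters = [[i] for i in range(len(points))]  # start: every point alone
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # single linkage: the nearest pair of members across the two clusters
            d = min(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]                  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
print(single_linkage(data, 3))  # [0, 0, 1, 1, 2]
```

Stopping the merges at k clusters turns the hierarchy into flat labels, which is the form needed when the clustering of an original data set is compared with that of its reduced version. This naive version recomputes every linkage distance on each pass, so it suits only small data sets.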
The Rand index and the Adjusted Rand Index (ARI) will be implemented to measure how well a given dimensionality reduction method preserves the hierarchical clustering of a data set. Finally, the six reduction methods will be compared in terms of runtime, preservation of inter-point distances, preservation of variance and preservation of the hierarchical clustering of the original data set.
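To make the evaluation step concrete, the sketch below compares two flat labelings of the same points, for instance the clusters found on an original data set and on its reduced version. It assumes the hierarchy has already been cut into flat clusters; `rand_index` and `adjusted_rand_index` are hypothetical names, and the ARI uses the usual chance-corrected formula computed from the contingency table of the two labelings.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put it in the same cluster,
    or both put it in different clusters.
    """
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of equal length >= 2")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in combinations(range(n), 2)
    )
    return agree / comb(n, 2)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance: 1.0 for identical clusterings,
    close to 0.0 on average for random labelings (it can be negative)."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of equal length >= 2")
    # pairs placed together by both clusterings (contingency-table cells)
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # pairs placed together by each clustering on its own (row/column sums)
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # exact fractions avoid floating-point rounding in intermediate terms
    expected = Fraction(pairs_a * pairs_b, comb(n, 2))
    maximum = Fraction(pairs_a + pairs_b, 2)
    if maximum == expected:
        # happens only when both labelings are all singletons or one cluster
        return 1.0
    return float((both - expected) / (maximum - expected))


original = [0, 0, 1, 1, 2]  # clusters found on the original data
reduced = [0, 0, 1, 2, 2]   # clusters found on the reduced data
print(rand_index(original, reduced))           # 0.8
print(adjusted_rand_index(original, reduced))  # 0.375
```

Both scores ignore the actual label values, so a reduced data set whose clusters are merely renumbered still scores 1.0.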

 

TABLE OF CONTENTS

 

DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 HIERARCHICAL CLUSTERING
2.1 SNIPPET OF CLUSTERED DATA
3 DIMENSIONALITY REDUCTION TECHNIQUES
3.1.1 RANDOM PROJECTIONS (RP)
3.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
3.1.3 NEW RANDOM APPROACH
3.1.4 VARIANCE
3.1.5 COMBINED APPROACH
3.1.6 DIRECT APPROACH
4 IMPLEMENTATION
4.1.1 RANDOM PROJECTION (RP)
4.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
4.1.3 NEW RANDOM APPROACH
4.1.4 VARIANCE
4.1.5 DIRECT APPROACH
4.1.6 COMBINED APPROACH
5 RAND INDEX
6 CONCLUSION
7 REFERENCES
8 APPENDIX A: MATLAB CODES USED FOR IMPLEMENTATION

 

CHAPTER ONE

 

INTRODUCTION
Given a data set containing n points in a high-dimensional space, it is often helpful to project it onto a lower-dimensional space without significantly distorting its structure, for example the distances between points. This process is called dimensionality reduction. In essence, dimensionality reduction reduces the number of variables under consideration while retaining the information relevant to the task, thereby reducing the volume of data to be processed.
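As an illustration of projecting onto a lower-dimensional space with little distortion, the sketch below implements one of the methods named in the abstract, random projection, and measures how inter-point distances change. It assumes a dense Gaussian projection matrix scaled by 1/sqrt(k); the project's own RP construction is described in later chapters and may differ. The names `random_projection` and `distance_ratios` are hypothetical.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Map d-dimensional points to k dimensions with a Gaussian random matrix.

    ASSUMPTION: entries of the d x k matrix R are drawn from N(0, 1) and the
    result is scaled by 1/sqrt(k), so squared Euclidean distances are
    preserved in expectation.
    """
    d = len(points[0])
    rng = random.Random(seed)  # fixed seed makes the projection reproducible
    R = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(p[i] * R[i][j] for i in range(d)) for j in range(k)]
            for p in points]


def distance_ratios(original, reduced):
    """Reduced-space distance divided by original distance, for every pair."""
    ratios = []
    for a in range(len(original)):
        for b in range(a + 1, len(original)):
            orig = math.dist(original[a], original[b])
            if orig > 0:  # skip duplicate points
                ratios.append(math.dist(reduced[a], reduced[b]) / orig)
    return ratios


rng = random.Random(1)
data = [[rng.uniform(0, 1) for _ in range(500)] for _ in range(20)]
low = random_projection(data, k=50)  # 500 dimensions reduced to 50
ratios = distance_ratios(data, low)
print(min(ratios), max(ratios))  # ratios near 1.0 mean little distortion
```

Larger values of k keep the ratios closer to 1.0 at the cost of a larger reduced space, which is the trade-off the project's comparison of inter-point distance preservation examines.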
Dimensionality reduction reduces the runtime of algorithms whose running time depends on the dimension of the working space. It also widens the range of data-processing methods that can practically be applied, and it provides a form of complexity control that helps to avoid overfitting the training data.
Dimensionality reduction can be applied in several domains, including text data, image data, nearest-neighbor search, clustering and classification. Clustering is the assignment of a set of observations to subsets (called clusters) so that observations in the same cluster are similar in some sense; it is a method of unsupervised learning. Classification, on the other hand, is a method of supervised learning: after seeing a number of training examples (pairs of inputs and target outputs), the supervised learner must predict the output value for any valid input. As mentioned above, this project focuses on grouping data using hierarchical clustering.
