This research work is aimed at the development of an optimal extracted feature classification scheme in
a voice recognition system using a dynamic cuckoo search algorithm. The scheme minimizes mismatch error in
the recognition process and increases recognition accuracy. A standard voice dataset was obtained from the
English Language Speech Database for Speaker Recognition (ELSDSR) of the Technical University of
Denmark (DTU) and processed, and key features of the voice data were extracted. A dynamic Cuckoo
Search Algorithm (dCSA) was developed to optimally classify the feature vectors extracted from the
speech signals in the voice data for the voice recognition system, using the dataset obtained from the
ELSDSR database of the DTU. The performance of the developed Voice Recognition System (VRS)
with the dCSA-based scheme was compared with that of the standard CSA-based scheme using accuracy as
the performance metric. The dCSA-based classification scheme achieved a recognition accuracy of 93.18%
in the VRS, compared with 90% for the standard CSA-based classification scheme. Simulation was carried
out using MATLAB 2013b.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
LIST OF ABBREVIATIONS
CHAPTER ONE: INTRODUCTION
1.1 Background of the Research
1.2 Motivation
1.3 Significance of Research
1.4 Statement of Problem
1.5 Aim and Objectives
1.6 Methodology
1.7 Dissertation Organization
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Review of Fundamental Concepts
2.2.1 A voice
2.2.2 Speech production
2.2.3 Voice recognition system
2.2.3.1 Categories of voice recognition system
2.2.3.2 Speaker recognition
2.2.3.3 Processes in speaker recognition system
2.2.3.4 Speech signal acquisition process in ELSDSR voice database
2.2.3.5 Speech processing
2.2.4 Speech feature extraction
2.2.5 Classification and feature matching
2.2.6 Cuckoo bird and its breeding behavior
2.2.7 Lévy flight behaviour
2.2.8 Cuckoo search algorithm (CSA)
2.2.9 Inertia weight factor
2.2.10 CSA based classification
2.2.11 Matching technique
2.2.12 Decision theory
2.2.13 Optimization test functions
2.3 Review of Similar Works
2.3.1 Review of works based on voice recognition system
2.3.2 Research works on the cuckoo search algorithm modification
CHAPTER THREE: MATERIALS AND METHODS
3.1 Introduction
3.2 Development of Speakers' Database
3.2.1 Obtaining standard voice dataset from ELSDSR of DTU
3.2.2 Recording environment
3.2.3 Recording equipment
3.2.4 Extraction of voice features
3.2.5 Training of speakers' extracted features
3.3 Development of dynamic Cuckoo Search Algorithm (dCSA)
3.3.1 Initialization of dCSA parameters
3.3.2 Introduction of inertia weight factor
3.3.3 Generation of new solution by Lévy flight and updating cuckoo position
3.3.4 Evaluation and comparison of solutions
3.3.5 Replacement of worst solutions
3.4 Performance Evaluation of the Algorithms (CSA and dCSA)
3.4.1 Visualization of the optimization test functions
3.4.1.1 Ackley function
3.4.1.2 De Jong function
3.4.1.3 Easom function
3.4.1.4 Griewangk function
3.4.1.5 Michalewicz function
3.4.1.6 Rastrigin function
3.4.1.7 Rosenbrock function
3.4.1.8 Schwefel function
3.4.1.9 Shubert function
3.4.1.10 Sphere function
3.4.2 Percentage improvement
3.5 Application of dCSA into Voice Recognition System (VRS)
3.5.1 Testing of speakers for recognition
3.6 Validation of Performance of CSA and dCSA Scheme in VRS
3.6.1 Accuracy
CHAPTER FOUR: RESULTS AND DISCUSSION
4.1 Introduction
4.2 Speech Signal Representation and Analysis
4.2.1 Feature extraction with Mel Frequency Cepstral Coefficients (MFCC)
4.3 Results of the dCSA
4.3.1 Performance Evaluation of dCSA over CSA
4.4 Application of dCSA in Voice Recognition System
4.5 Testing of Speakers for Recognition
4.5.1 VRS GUI Usage Procedure
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary
5.2 Conclusion
5.3 Significant Contribution
5.4 Recommendation for Further Work
1.1 Background of the Research
Voice denotes the sound produced in a person's larynx and articulated through the mouth, as speech or
song, while speech refers to the ability to express thoughts and feelings by articulate sounds (Das &
Nahar, 2016). Voice is used to express opinions or interests using specific words. These words are
used for communication among individuals, which lays the foundation for improved human
relationships (Amarasinghe & Wimalaratne, 2017). In addition to human-human interaction, the spoken
word now extends, through technological mediation such as telephony, movies, radio, television,
computers and the Internet, into human-machine interaction as well. This gives rise to other interesting
research topics such as speech recognition, speaker identification and voice recognition (Huang et al.,
2001). Research into voice recognition began in the early 1960s (Juang & Rabiner, 2005).
Voice recognition is a binary classification problem in which a person's identity is verified based on
his/her voice (Zhang et al., 2017). It has a wide range of applications and plays a crucial role in
forensics, security and biometric authentication, where it is used to verify or detect the voice of a
speaker among a group of speakers (Das & Nahar, 2016).
The human voice, in general, carries much information, such as the gender, emotion and identity of the
speaker. The objective of voice recognition is to decide which speaker is present based on the
individual's utterance. Voice analysis is done after taking a sample of the speaker's voice through a
microphone (Muda et al., 2010). At the highest level, the system design contains two modules: feature
extraction and feature matching. Feature extraction (which consists of data processing and extraction) is
the process of extracting unique information from voice data that can later be used to identify the
speaker. Feature matching (which consists of feature classification and pattern matching) is the actual
procedure of identifying the speaker by comparing the extracted voice data with a database of known
speakers, on the basis of which a suitable decision is made (Price & Eydgahi, 2006).
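The feature matching step can be illustrated with a nearest-template decision. This minimal Python sketch assumes that each enrolled speaker is represented by a few reference vectors (for example, cluster centroids learned during training) and selects the speaker with the smallest average distance from the test frames to their nearest reference vector. The speaker IDs, vector values and function names are hypothetical, and the distance rule is an illustrative choice rather than the classifier developed in this work.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_speaker(test_frames, templates):
    """Return (speaker_id, score) for the enrolled speaker whose reference
    vectors are closest, on average, to the frames of a test utterance.

    test_frames: list of feature vectors (e.g. MFCC frames) from one utterance
    templates:   dict mapping a (hypothetical) speaker id -> list of
                 reference vectors, e.g. centroids obtained during training
    """
    best_id, best_score = None, math.inf
    for speaker_id, centroids in templates.items():
        # Average distance from each test frame to its nearest reference vector
        score = sum(min(euclidean(f, c) for c in centroids)
                    for f in test_frames) / len(test_frames)
        if score < best_score:
            best_id, best_score = speaker_id, score
    return best_id, best_score

if __name__ == "__main__":
    templates = {"spk_A": [[0.0, 0.0], [1.0, 1.0]],
                 "spk_B": [[5.0, 5.0], [6.0, 6.0]]}
    print(identify_speaker([[0.1, 0.0], [0.9, 1.1]], templates))
```

A verification (binary) decision could be built on the same score by accepting a claimed identity only when the score falls below a threshold; the threshold would have to be tuned on training data.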
In a speaker recognition system, the main problem lies in pattern recognition, which, viewed more
broadly, is a generic topic in science and engineering; here the aim is to minimize mismatch error and
improve recognition accuracy (Kumar & Rao, 2011). The goal of pattern recognition is to classify objects
of interest into one of a number of categories or classes.
The objects of interest are generically called patterns, and in this research they are the sequences of
acoustic vectors that are extracted from an input speech signal. The classes here refer to individual
speakers (Kinnunen et al., 2011).
Classification is the problem of identifying to which of a set of categories (sub-populations) a new
observation belongs, on the basis of a training set of data containing observations (or instances) whose
category membership is known (Tang et al., 2014). Closely related is clustering, an unsupervised
technique that groups similar samples into groups called clusters, such that each cluster has maximum
within-cluster similarity and minimum between-cluster similarity according to some similarity index
(Aggarwal & Reddy, 2014). Hence, the classification technique is an integral part of feature matching in
any voice recognition system and cannot be ignored, as it determines the recognition accuracy and
performance of the system. However, most of the existing classification techniques used in voice
recognition systems (VRS) are either classical or statistical methods that are prone to challenges such as
determining the best sequence of model states, adjusting model parameters to best account for the
observed signal, and determining optimal training values.
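The within-cluster similarity criterion above can be made concrete as an objective function. When a metaheuristic such as CSA is used for clustering, one common encoding (assumed here, not stated in this chapter) treats each candidate solution as a set of centroids and scores it by the sum of squared distances from every sample to its nearest centroid, so lower values mean tighter clusters. The function names and data below are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def within_cluster_sse(points, centroids):
    """Fitness of a candidate set of centroids: sum of squared distances
    from each point to its nearest centroid (smaller is better)."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    good = [[0.0, 0.5], [10.0, 10.5]]
    poor = [[0.0, 0.0], [0.0, 1.0]]
    print(within_cluster_sse(pts, good), within_cluster_sse(pts, poor))
    print(assign(pts, good))
```

A search algorithm would propose many such centroid sets and keep those with the lowest score.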
Nature-inspired metaheuristic optimization algorithms, on the other hand, have proven efficient in
solving many optimization problems (Yang, 2012). Optimization is the process of producing solutions to a
problem under constraints, and optimization methods were developed with the aim of utilizing available
resources in the best way possible (Yılmaz & Küçüksille, 2015). Nature-inspired computation techniques
are derived from the study of natural systems: candidate solutions to the optimization problem play the
role of individuals in a population, and the fitness function determines the quality of the solutions
(Kamat & Karegowda, 2014). Nature-inspired metaheuristic algorithms form a significant part of modern
global optimization, computational intelligence and soft computing. The growing popularity of
metaheuristics and swarm intelligence has attracted a great deal of attention in engineering and industry;
one reason is that nature-inspired metaheuristics are flexible and efficient, and such seemingly simple
algorithms can deal with very complex optimization problems (Yang, 2012).
Cuckoo search algorithm (CSA) is one such nature-inspired metaheuristic algorithm, developed by
Yang & Deb in 2009 and based on the obligate brood-parasitic behaviour of some cuckoo species
combined with the Lévy flight behaviour of some birds and fruit flies. CSA has been shown to be an
effective optimization algorithm when compared with other algorithms. It has been applied to various
tasks, including selecting optimal features and optimizing the parameters of classifiers such as Artificial
Neural Networks (ANN) and Support Vector Machines (SVM) (Kamat & Karegowda, 2014).
However, the standard CSA uses fixed values for both the discovery probability pa and the step-size
factor α, and its main drawback lies in the number of iterations needed to find an optimal solution
(Valian et al., 2011). A dynamic Cuckoo Search Algorithm (dCSA) will be developed to address these
problems in the standard CSA by introducing an inertia weight factor to the control parameters, in order
to increase its accuracy.
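The idea of making pa and α dynamic can be sketched as a per-generation parameter update. The exact form of the inertia weight is not given at this point, so this minimal Python sketch assumes the random inertia weight w = 0.5 + r/2, with r drawn uniformly from [0, 1) (a form used in the particle swarm optimization literature), applied multiplicatively to assumed base values pa0 = 0.25 and α0 = 0.01. The function names are hypothetical.

```python
import random

def random_inertia_weight(rng=random):
    """Random inertia weight w = 0.5 + r / 2 with r ~ U[0, 1) (assumed form),
    so w lies in [0.5, 1.0]."""
    return 0.5 + rng.random() / 2.0

def dynamic_parameters(pa0=0.25, alpha0=0.01, rng=random):
    """Hypothetical dCSA update: rescale the fixed CSA values of the discovery
    probability pa and the step-size factor alpha by a fresh inertia weight,
    so both vary from one generation to the next instead of staying constant."""
    w = random_inertia_weight(rng)
    return w * pa0, w * alpha0, w

if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    for generation in range(3):
        pa, alpha, w = dynamic_parameters(rng=rng)
        print(f"gen {generation}: w={w:.3f} pa={pa:.4f} alpha={alpha:.5f}")
```

With this choice pa stays between 0.125 and 0.25 and α between 0.005 and 0.01; other schedules (for example a weight that decreases linearly with the iteration count) would slot into the same place.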
To extract features from voice signals in the Voice Recognition System (VRS), the Mel-Frequency Cepstral
Coefficients (MFCC) technique will be used to produce a set of feature vectors. Subsequently, the dynamic
CSA will be employed at the classification level of the feature matching stage of this research work,
where it will be used to optimally classify the extracted feature vectors in order to improve the
recognition accuracy of the VRS.
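MFCC extraction typically splits the signal into short frames, computes the power spectrum of each frame, applies a bank of triangular filters spaced evenly on the mel scale, takes logarithms of the filter energies and finally applies a discrete cosine transform. This minimal Python sketch covers only the mel-scale step, using the common conversion m = 2595 log10(1 + f/700). The 20-filter bank and the 8 kHz upper edge (which assumes a 16 kHz sampling rate) are illustrative assumptions, not parameters stated in this work.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale conversion: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(n_filters, f_low, f_high):
    """Edge frequencies (Hz) of n_filters triangular filters spaced uniformly
    on the mel scale; filter k spans edges[k]..edges[k + 2] and peaks at
    edges[k + 1]."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]

if __name__ == "__main__":
    edges = mel_filter_edges(20, 0.0, 8000.0)  # assumed 16 kHz sampling rate
    print([round(e) for e in edges])
```

Because the filters are evenly spaced in mel rather than in hertz, they are narrow at low frequencies and progressively wider at high frequencies, which is what gives MFCCs their perceptually motivated resolution.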
1.2 Motivation
Speech is a complex signal produced as a result of numerous transformations occurring at several different
levels, due to a mixture of anatomical variations inherent in the vocal tracts of different individuals. These
inherent differences (unique features) are extracted from the speech signal for further analysis. For the
past six decades, researchers have explored the use of these differences for various applications such as
forensic investigation, security systems, biometric checks, voice recognition and crime detection. The
methodologies adopted by researchers for classifying and matching these unique features in order to
reduce mismatch error have mostly been classical and statistical methods, including the Hidden Markov
Model (HMM), Vector Quantization (VQ) and Dynamic Time Warping (DTW). These techniques have
associated problems that hinder the classification process, which in turn affects recognition accuracy.
Metaheuristic algorithms, especially those based on swarm intelligence, are remarkably efficient and have
many advantages over traditional and deterministic methods. Metaheuristic algorithms are also
problem-independent, as they can be applied to different kinds of problems. Hence, better classification
is expected from a metaheuristic-based scheme.
Thus, this research work offers the development of a metaheuristic search algorithm named the
dynamic cuckoo search algorithm to determine an optimal extracted feature classification in voice
recognition systems.
1.3 Significance of Research
The significance of the research is the development of an optimal extracted feature classification scheme
in voice recognition systems using a dynamic cuckoo search algorithm, which improves the clustering in the
classification technique and yields better recognition accuracy. This has not been done by previous
researchers.
1.4 Statement of Problem
Classification is an integral part of a VRS that identifies to which of a set of categories a new observation
belongs. Existing classification techniques face challenges in determining the best sequence of model
states, determining optimal training values and adjusting model parameters to best account for the
observed signal.
Hence, to address these classification challenges in voice recognition systems, a computational
intelligence-based classification scheme using a dynamic cuckoo search algorithm is developed in order
to increase the accuracy of the system.
1.5 Aim and Objectives
The aim of this research work is to develop an optimal extracted feature classification scheme in voice
recognition system using dynamic cuckoo search algorithm.
This aim was accomplished through the following objectives:
1. Obtaining standard voice data from the English Language Speech Database for Speaker
Recognition (ELSDSR) of the Technical University of Denmark (DTU), processing the data and
extracting key features for the voice recognition system (VRS).
2. Developing a dynamic Cuckoo Search Algorithm (dCSA) for an optimal extracted feature
classification scheme in the voice recognition system, using the same dataset obtained from the
ELSDSR database of DTU.
3. Validating the scheme by comparing the performance of the VRS with the standard CSA-based
scheme and with the dCSA-based scheme, using accuracy as the performance metric, in order to
determine the improvement in the VRS.
1.6 Methodology
The methodology adopted is as follows:
1. Development of the speakers' database for the voice recognition system by:
a) Obtaining a standard voice dataset from the ELSDSR database of DTU for training.
b) Extracting key features of the voice signal with MFCC.
c) Training the extracted features for storage in a voice database.
2. Development of the dynamic Cuckoo search algorithm (dCSA) by:
a) Initializing a random population of n host nests and the cuckoo parameters.
b) Introducing a random inertia weight to the control parameters (pa and α).
c) Generating new solutions by Lévy flight.
d) Evaluating the fitness of each new solution and comparing it with a randomly chosen nest,
retaining the better solution.
e) Performing a local search to replace the worst nest with a new one while keeping the best solution.
3. Comparison of the standard CSA with the dynamic CSA using ten (10) standard optimization test
functions (i.e. Michalewicz, De Jong, Easom, Shubert, Griewangk, Ackley, Rastrigin, Sphere,
Rosenbrock and Schwefel).
4. Application of the dynamic CSA to the VRS by:
a) Repeating steps 1(a) and 1(b) above.
b) Classifying the extracted features using dCSA for matching and identification.
c) Testing speakers for recognition.
5. Validation of the performance of the VRS with the CSA-based scheme and the dCSA-based scheme,
using accuracy as the performance metric.
1.7 Dissertation Organization
The general introduction has been presented in Chapter One. The rest of the dissertation is structured as
follows. Chapter Two presents a detailed review of related literature and relevant fundamental concepts:
the voice itself and how speech or sound is produced; voice recognition systems, their categories and
types; speech processing, speech signal acquisition and the process of speaker recognition; feature
extraction and its techniques; classification and feature matching, and the different classification
techniques in VRS; and metaheuristic optimization algorithms, the Cuckoo search algorithm (CSA), the
inertia weight factor, the dynamic Cuckoo search algorithm (dCSA) and optimization test functions.
Chapter Three presents an in-depth approach and the relevant mathematical models describing the
development of an optimal extracted feature classification scheme in VRS using dCSA. Chapter Four
presents the analysis, performance and discussion of the results obtained. Finally, Chapter Five contains
the summary, conclusion and recommendations for further work. The list of cited references, transcripts
of the audio messages used during the training/testing sessions and the MATLAB code are provided in the
appendices of this dissertation.