ABSTRACT
This research develops an improved Hidden Markov Model (HMM) based fuzzy time series (FTS) forecasting model using a Genetic Algorithm (GA). To improve forecasting performance, a GA-HMM approach is developed to optimize and properly estimate the membership values in the fuzzy relationship matrix at the fuzzy inference stage. Monte Carlo simulation was applied to estimate the stochastic outcome of the data and to make the model better reflect the real data and its randomness. The developed model was implemented in MATLAB R2015a and tested with Cheng and Sheng's data on the daily average temperature and cloud density of Taipei, a benchmark dataset for bivariate FTS. The performance of the proposed GA-HMM based FTS model was evaluated using the Mean Square Error (MSE) and the Average Forecasting Error Percentage (AFEP) as metrics. On this bivariate benchmark, the developed model achieved an MSE of 0.5976 and an AFEP of 1.8673, against 0.933 and 2.7464, respectively, reported by Li and Cheng (2012). This amounts to improvements of approximately 36% and 32% in MSE and AFEP, respectively. The model was also applied to forecast the short-term Internet traffic of ABU, Zaria. The simulation results show MSE and AFEP values of 68.32392 and 0.08904, respectively, which indicates good forecasting performance given the large volume and randomness of this traffic. These results demonstrate both the superiority of the proposed GA-HMM based FTS model over the benchmark model and its robustness in adapting to time series with different structural and statistical characteristics.
TABLE OF CONTENTS
Title Page
Declaration
Certification
Dedication
Acknowledgement
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Appendices
Abstract
CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Statement of Problem
1.4 Significance of Research
1.5 Aim and Objectives
1.6 Methodology
1.7 Dissertation Organization
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Review of Fundamental Concepts
2.2.1 Time Series Models and Analysis
2.2.1.1 Classification of Time Series
2.2.1.2 Components of Time Series
2.2.2 Basic Concepts of Fuzzy Time Series
2.2.3 Basic Steps of Fuzzy Time Series Forecasting
2.2.4 Hidden Markov Model (HMM) and its Approach to Time Series Data
2.2.4.1 Hidden Markov Model
2.2.4.2 Components of HMM
2.2.5 The Standard HMM based FTS Forecasting Approach
2.2.6 Parameter Estimation Methods of HMM
2.2.7 Smoothing
2.2.8 Performance Metrics
2.3 Review of Similar Works
CHAPTER THREE: MATERIAL AND METHODS
3.1 Introduction
3.2 Development of the Improved HMM based FTS Forecasting Model using GA for the Re-estimation of the Inner Fuzzy Relations
3.2.1 Obtaining Historical Time Series Data
3.2.2 Determination of the Universe of Discourse
3.2.3 Partitioning the Universe of Discourse into Several Even-Length Intervals
3.2.4 Defining the Fuzzy Sets on the Universe of Discourse
3.2.5 Fuzzifying the Time Series Data
3.2.6 Building the HMM Model to Estimate the Fuzzy Relations
3.2.7 Developing the GA-HMM Model to Re-estimate Parameters
3.2.8 Smoothing HMM Model Parameters
3.2.9 Calculating Forecast Outputs
3.2.10 Defuzzifying the Forecasting Outputs
3.3 Performance Evaluation
3.4 Application of the Developed GA-HMM based FTS Model to Forecast Short-term Internet Traffic Data of ABU, Zaria
3.4.1 Obtaining Historical Time Series Data
3.4.2 Determination of the Universe of Discourse
3.4.3 Partitioning the Universe of Discourse into Several Even-Length Intervals
3.4.4 Defining the Fuzzy Sets on the Universe of Discourse
3.4.5 Fuzzifying the Time Series Data
3.4.6 Building the HMM Model to Estimate the Fuzzy Relations
3.4.7 Developing the GA-HMM Model to Re-estimate Parameters
3.4.8 Smoothing HMM Model Parameters
3.4.9 Calculating Forecast Outputs
3.4.10 Defuzzifying the Forecasting Outputs
CHAPTER FOUR: RESULT AND DISCUSSION
4.1 Introduction
4.2 Obtained Results for the Developed GA-HMM based FTS Forecasting Model
4.2.1 Forecast Outputs of the Developed Model
4.2.2 Results of the Defuzzified Forecasting Outputs
4.3 Performance Evaluation on the Bivariate HMM-FTS Data
4.4 Application of the GA-HMM FTS Model on the Internet Traffic of ABU, Zaria
4.4.1 Forecast Outputs of the Developed Model
4.4.2 Results of the Defuzzified Forecasting Outputs
4.4.3 Performance Evaluation on the Bivariate HMM-FTS Data
4.5 Computer Specification
4.6 Genetic Algorithm Parameter Specification
CHAPTER FIVE: CONCLUSION SUMMARY AND RECOMMENDATION
5.1 Summary
5.2 Conclusion
5.3 Significant Contributions
5.4 Limitations
5.5 Recommendations for Further Work
References
CHAPTER ONE
INTRODUCTION
1.1 Background
A time series is a sequence of observations of a quantitative variable recorded at regular intervals of time. Whether discrete or continuous, time series are often both nonlinear and non-stationary, since they are sample functions realized from stochastic processes (Subanar & Abadi, 2011). Time series forecasting plays an important role in a wide variety of applications, such as predicting university enrollments, stock prices, rainfall and blood pressure. Such forecasting usually uses a sequence of past data points, typically measured successively, to forecast future outcomes (Sheng et al., 2009). Various techniques for time series forecasting have been developed in recent decades. Among them, Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) based models are prominent and highly useful. However, they cannot deal with vagueness and linguistic terms in time series (Song & Chissom, 1993). In addition, these statistical methods do not perform well on time series with a small amount of data (Tsaur et al., 2005). Furthermore, conventional probabilistic time series models rely on assumptions about, for example, the number of observations, normality and linearity (Egrioglu, 2014), so they can produce misleading forecasts when these assumptions are not satisfied. Therefore, non-probabilistic approaches have been put forward as an alternative to probabilistic time series forecasting models (Egrioglu, 2015).
To deal with such deficiencies, fuzzy time series (FTS) models have been developed and widely applied (Radmehr & Gharneh, 2012). In recent years, FTS models have attracted the attention of many researchers because of their advantages: better performance in some real forecasting problems (Song & Chissom, 1993), the ability to deal with data expressed in linguistic terms (Song & Chissom, 1993), and the ability to integrate heuristic knowledge and models (Huarng, 2001).
One of the most important issues in FTS models is the determination of the fuzzy relations (Egrioglu et al., 2013). Many methods have been used in the literature to determine fuzzy relations, including fuzzy logic relationship groups (FLRG), artificial neural networks, fuzzy relation matrices obtained from fuzzy set operations, particle swarm optimization and genetic algorithms (Egrioglu, 2014). The most commonly used method is the FLRG, because no complex matrix operations are needed once the FLRG tables are formed. However, when FLRG tables are used, the membership values of fuzzy sets are ignored, because only the fuzzy set in which an observation has its highest membership value is considered (Aladag et al., 2012). This causes information loss that may degrade forecasting performance. Since fuzzy relationships can be nonlinear and complex, an intelligent method is needed to calculate them.
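The information loss described above can be illustrated with a minimal Python sketch. It assumes triangular fuzzy sets whose peaks sit at the midpoints of even-length intervals of the universe of discourse, a common FTS choice that may differ from the fuzzy sets defined later in this dissertation; the function names and the universe [20, 30] are hypothetical. Two observations with different membership vectors receive the same FLRG label, so the difference between them is discarded.

```python
def triangular_memberships(x, lo, hi, n_sets):
    """Membership of x in n_sets triangular fuzzy sets A1..An whose peaks sit at
    the midpoints of n_sets even-length intervals of the universe [lo, hi]."""
    width = (hi - lo) / n_sets
    mids = [lo + (i + 0.5) * width for i in range(n_sets)]
    return [max(0.0, 1.0 - abs(x - m) / width) for m in mids]


def flrg_label(memberships):
    """FLRG-style fuzzification: keep only the index of the highest membership."""
    return max(range(len(memberships)), key=lambda i: memberships[i])


def build_flrg(series, lo, hi, n_sets):
    """Fuzzify a series and group its first-order relationships A_i -> A_j."""
    labels = [flrg_label(triangular_memberships(x, lo, hi, n_sets)) for x in series]
    groups = {}
    for a, b in zip(labels, labels[1:]):
        groups.setdefault(a, set()).add(b)
    return labels, groups


# Two different observations collapse onto the same fuzzy set (index 1, i.e. A2)
m1 = triangular_memberships(22.1, 20, 30, 5)
m2 = triangular_memberships(23.0, 20, 30, 5)
print(m1, flrg_label(m1))
print(m2, flrg_label(m2))
```

The membership vector of 22.1 splits roughly 0.45 / 0.55 between A1 and A2, while 23.0 belongs fully to A2; the FLRG keeps only "A2" for both.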
To address these deficiencies, Hidden Markov Models (HMMs) have been developed and applied to formulate the fuzzy relationships, with the model parameters estimated by a conventional iterative search technique known as the Baum–Welch algorithm (Li & Cheng, 2010). Because Baum–Welch is an expectation-maximization procedure that only guarantees convergence to a local optimum, parameter learning with it is prone to being trapped in local optima. A technique is therefore required that finds better estimates of the fuzzy relations while avoiding local optima.
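For reference, the following is a minimal pure-Python sketch of Baum–Welch re-estimation for a discrete HMM with scaled forward and backward passes. It is a textbook version, not the dissertation's MATLAB code; the observation sequence of fuzzy-set labels and all function names are hypothetical. Each iteration cannot decrease the log-likelihood, but the point it converges to depends on the random initialization, which is the local-optimum behaviour that motivates a global search method.

```python
import math
import random


def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: normalized alphas, scale factors c_t and log P(obs)."""
    n = len(pi)
    alphas, scales = [], []
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alphas[-1]
            a = [sum(prev[j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)


def backward_scaled(A, B, obs, scales):
    """Scaled backward pass reusing the forward scale factors."""
    n, T = len(A), len(obs)
    betas = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o = obs[t + 1]
        betas[t] = [sum(A[i][j] * B[j][o] * betas[t + 1][j] for j in range(n)) / scales[t + 1]
                    for i in range(n)]
    return betas


def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation; also returns the log-likelihood of the input parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alphas, scales, loglik = forward_scaled(pi, A, B, obs)
    betas = backward_scaled(A, B, obs, scales)
    gamma = [[alphas[t][i] * betas[t][i] for i in range(n)] for t in range(T)]
    new_A, new_B = [], []
    for i in range(n):
        den = sum(gamma[t][i] for t in range(T - 1))
        if den == 0.0:  # state never occupied: keep its old row
            new_A.append(A[i][:])
            continue
        new_A.append([sum(alphas[t][i] * A[i][j] * B[j][obs[t + 1]] * betas[t + 1][j]
                          / scales[t + 1] for t in range(T - 1)) / den for j in range(n)])
    for i in range(n):
        den = sum(gamma[t][i] for t in range(T))
        if den == 0.0:
            new_B.append(B[i][:])
            continue
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / den
                      for k in range(m)])
    return gamma[0][:], new_A, new_B, loglik


def random_stochastic(rows, cols, rng):
    """Random row-stochastic matrix with strictly positive entries."""
    out = []
    for _ in range(rows):
        r = [rng.random() + 1e-3 for _ in range(cols)]
        s = sum(r)
        out.append([x / s for x in r])
    return out


def baum_welch(obs, n_states, n_symbols, iters=30, seed=0):
    """Run Baum-Welch from a random start; history holds log P(obs) per iteration."""
    rng = random.Random(seed)
    pi = random_stochastic(1, n_states, rng)[0]
    A = random_stochastic(n_states, n_states, rng)
    B = random_stochastic(n_states, n_symbols, rng)
    history = []
    for _ in range(iters):
        pi, A, B, ll = baum_welch_step(pi, A, B, obs)
        history.append(ll)
    return pi, A, B, history


# Hypothetical sequence of fuzzified labels (observation symbols 0, 1, 2)
obs = [0, 1, 1, 2, 2, 1, 0, 0, 1, 2, 2, 2, 1, 0, 1, 1, 2, 1, 0, 0]
for seed in range(3):
    print(seed, round(baum_welch(obs, 2, 3, seed=seed)[3][-1], 4))
```

Comparing the final log-likelihoods printed for several seeds is a simple way to check whether different starting points lead to different local optima on a given sequence.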
In recent years, artificial intelligence techniques have been used in different stages of fuzzy time series methods (Egrioglu, 2014). In this study, a Genetic Algorithm (GA) was applied to obtain optimal estimates of the inner fuzzy relations. A GA is a well-known search heuristic that mimics the process of natural evolution. It is widely used to generate useful solutions to optimization and search problems, including the partitioning problem in fuzzy time series (Cai et al., 2013). In general, a GA consists of a population, chromosomes, a fitness function and genetic operations. The population is a set of candidate solutions, and each individual in the population represents a potential solution to the given problem; the population representation therefore defines the search space. Each of the variables that make up an individual is called a gene, and the genes are commonly coded into a string, the chromosome, that represents the individual. Each individual is evaluated by a fitness function to determine how good a solution it is. The GA thus maintains a population of n possible solutions (individuals), each with a fitness value computed by the fitness function (Koo et al., 1990).
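The GA components described above can be sketched in a few lines of Python. This is a generic binary-coded GA under stated assumptions, not the configuration used in this work: the objective (a peak at x = 0.3), tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values are hypothetical choices for illustration. Each bit is a gene, each bit list is a chromosome encoding one individual, and the fitness function scores the decoded value.

```python
import random


def decode(bits, lo=0.0, hi=1.0):
    """Map a binary chromosome (list of 0/1 genes) to a real number in [lo, hi]."""
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)


def fitness(bits):
    """Hypothetical objective with its maximum at x = 0.3."""
    return -(decode(bits) - 0.3) ** 2


def tournament(pop, fits, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]


def crossover(p1, p2, rng):
    """One-point crossover of two parent chromosomes."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(bits, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=16, pop_size=30, generations=60, pc=0.9, pm=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        best = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        children = [pop[best][:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            c1, c2 = crossover(p1, p2, rng) if rng.random() < pc else (p1[:], p2[:])
            children.extend([mutate(c1, pm, rng), mutate(c2, pm, rng)])
        pop = children[:pop_size]
    fits = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return decode(pop[best]), fits[best], history


x, best_fit, history = run_ga()
print(f"best x = {x:.4f}, fitness = {best_fit:.6f}")
```

Because elitism copies the best individual unchanged into the next generation, the best fitness recorded per generation never decreases.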
1.2 Motivation
Much research has been done on improving the accuracy of fuzzy time series forecasting models (Uslu et al., 2013; Bas et al., 2014; Zhang et al., 2013), including studies that use artificial intelligence optimization algorithms (Yolcu, 2014; Aladag et al., 2013; Haneen et al., 2014). However, capturing the fuzzy relations effectively, and thereby improving forecasting accuracy, remains a challenge.
1.3 Statement of Problem
Fuzzy time series (FTS) methods are effective time series forecasting techniques. Since their emergence, FTS models have attracted increasing attention because they can deal with the uncertainty and vagueness often inherent in real-world data, which arise from inaccurate measurements, incomplete sets of observations, or difficulty in obtaining measurements under uncertain circumstances.
The representation of the fuzzy relations obtained from a fuzzy time series plays a key role in forecasting. In the analysis of time-invariant fuzzy time series, fuzzy logic relationship group tables have generally been preferred for determining fuzzy logic relationships, because no complex matrix operations are needed when these tables are used. However, these tables have two drawbacks. First, the membership values of fuzzy sets are ignored: contrary to fuzzy set theory, only the fuzzy set in which an observation has its highest membership value is considered. This causes information loss and decreases the forecasting accuracy of the model. Second, the approach is prone to rule redundancy and computational overhead. Consequently, a technique is needed that captures the relationships more accurately, regardless of the nonlinear nature of fuzzy time series data. Furthermore, the inherent uncertainty in how a system evolves over time usually makes its state transitions probabilistic.
To address these limitations of existing FTS models, a forecasting model based on a Hidden Markov Model (HMM) for fuzzy time series was employed to realize probabilistic state transitions. HMM parameter (relationship) estimation is typically performed with a well-defined iterative scheme that is, however, prone to becoming trapped in local optima. Genetic Algorithms (GAs) are popular because of their ability to handle nonlinear relationships. To improve the representation of the relationships, a GA-HMM based model was applied to capture the relations more accurately and thereby improve the forecasting accuracy of the model.
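The GA-HMM idea can be sketched as follows, assuming (for illustration only) that the chromosome encodes just the transition matrix A, that the initial distribution pi and the emission matrix B are held fixed, and that fitness is the log-likelihood of the observation sequence computed with the scaled forward algorithm. Truncation selection, blend crossover, Gaussian mutation, elitism and every parameter value are hypothetical; the dissertation's own encoding and operators may differ.

```python
import math
import random


def log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | pi, A, B) for a discrete HMM."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    ll = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        c = sum(alpha)
        ll += math.log(c)
        alpha = [a / c for a in alpha]
    return ll


def to_matrix(genes, n):
    """Decode a flat chromosome of positive genes into a row-stochastic matrix."""
    rows = [genes[i * n:(i + 1) * n] for i in range(n)]
    return [[g / sum(r) for g in r] for r in rows]


def ga_hmm(pi, B, obs, n, pop_size=20, generations=40, pm=0.2, sigma=0.1, seed=0):
    rng = random.Random(seed)

    def fitness(genes):
        return log_likelihood(pi, to_matrix(genes, n), B, obs)

    pop = [[rng.uniform(0.05, 1.0) for _ in range(n * n)] for _ in range(pop_size)]
    history = []  # best log-likelihood per generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:pop_size // 2]  # truncation selection
        nxt = [ranked[0]]  # elitism
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # blend crossover
            child = [max(1e-6, g + rng.gauss(0.0, sigma)) if rng.random() < pm else g
                     for g in child]  # Gaussian mutation, genes kept positive
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return to_matrix(best, n), fitness(best), history


# Hypothetical two-state model with a fixed emission matrix
pi = [0.5, 0.5]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
A_best, ll_best, history = ga_hmm(pi, B, obs, n=2)
print([[round(p, 3) for p in row] for row in A_best], round(ll_best, 4))
```

Unlike Baum–Welch, this search does not follow a single trajectory from one starting point; it keeps a population of candidate matrices, which is the property usually invoked for escaping local optima, at the cost of many more likelihood evaluations.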
1.4 Significance of Research
The significance of this research lies in developing an improved Hidden Markov Model based fuzzy time series model that increases forecasting accuracy through effective estimation of the fuzzy relations among the states of the historical time series data.
1.5 Aim and Objectives
The aim of this research is the development of an improved HMM based FTS forecasting model using Genetic Algorithm.
To accomplish this aim, the following objectives were pursued:
a) Development of the HMM based FTS model using the Baum–Welch estimation procedure.
b) Development of the improved HMM-based FTS model by optimizing the model parameters using GA.
c) Model validation using the bivariate benchmark FTS data of the daily average temperature and cloud density of Taipei, comparing the results with those obtained by Li and Cheng (2012) using the MSE and AFEP performance measures.
d) Application of the developed model to the Internet traffic data of ABU, Zaria.
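The performance measures in objective (c) can be sketched in Python as below. MSE is standard; AFEP is assumed here to be the mean absolute percentage error, the usual form of the average forecasting error rate in the FTS literature, so the dissertation's exact formula may differ. The last two lines compute the relative improvements implied by the MSE and AFEP figures quoted in the Abstract.

```python
def mse(actual, forecast):
    """Mean Square Error."""
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def afep(actual, forecast):
    """Average Forecasting Error Percentage, assumed here to be the mean
    absolute percentage error: (100 / n) * sum(|F_t - A_t| / |A_t|)."""
    return 100.0 * sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def improvement(baseline, proposed):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Li and Cheng (2012) versus the GA-HMM model, as quoted in the Abstract
print(round(improvement(0.933, 0.5976), 2))   # MSE: 35.95
print(round(improvement(2.7464, 1.8673), 2))  # AFEP: 32.01
```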
1.6 Methodology
The methodology adopted in this research for developing an improved Hidden Markov Model based fuzzy time series forecasting model using a genetic algorithm is outlined as follows:
a) Development of the standard HMM based FTS forecasting model using the relative-frequency Baum–Welch estimation procedure.
b) Improvement of the developed HMM based FTS model through re-estimation of the model parameters using a GA.
c) Model validation using the bivariate benchmark FTS data of the daily average temperature and cloud density of Taipei, comparing the results with those obtained by Li and Cheng (2012) using the MSE and AFEP performance measures.
d) Application of the developed model to forecast the short-term Internet traffic of ABU, Zaria, using traffic data from 29th February to 31st March, 2016 obtained from the ABU, Zaria data center, and evaluation of its performance.
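Item (a) refers to relative-frequency estimation. As an illustration only, the sketch below computes relative-frequency (maximum-likelihood) estimates of HMM transition and emission matrices from fuzzified label sequences. It assumes, hypothetically, that the hidden states are the fuzzy-set labels of the main series and the observation symbols are those of the second series in a bivariate setting; the optional additive smoothing and all function names are also assumptions, and the smoothing step in this dissertation may be defined differently.

```python
def normalize_rows(counts, smoothing):
    """Turn a count matrix into row-stochastic probabilities with additive smoothing."""
    rows = []
    for row in counts:
        total = sum(row) + smoothing * len(row)
        # a state that is never observed gets a uniform row (an assumption of this sketch)
        rows.append([(c + smoothing) / total if total else 1.0 / len(row) for c in row])
    return rows


def transition_matrix(states, n_states, smoothing=0.0):
    """Relative-frequency estimate of P(state_{t+1} = j | state_t = i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return normalize_rows(counts, smoothing)


def emission_matrix(states, symbols, n_states, n_symbols, smoothing=0.0):
    """Relative-frequency estimate of P(symbol_t = k | state_t = i)."""
    counts = [[0] * n_symbols for _ in range(n_states)]
    for s, o in zip(states, symbols):
        counts[s][o] += 1
    return normalize_rows(counts, smoothing)


# Hypothetical fuzzified labels of a main series (states) and a second series (symbols)
states = [0, 1, 1, 2, 1, 0]
symbols = [0, 0, 1, 1, 1, 0]
print(transition_matrix(states, 3))
print(emission_matrix(states, symbols, 3, 2))
```

With smoothing set to 0 these are exactly the observed relative frequencies; a positive value (for example 1.0, Laplace smoothing) keeps unseen transitions from receiving zero probability.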
1.7 Dissertation Organization
The general introduction has been presented in Chapter One. The rest of the dissertation is structured as follows. First, Chapter Two presents a detailed review of related literature and of relevant fundamental concepts: time series, fuzzy time series forecasting, Markov Models (MM), Hidden Markov Models (HMM) and Genetic Algorithms (GA).
Second, Chapter Three presents the approach and the relevant mathematical models for developing the improved Hidden Markov Model based fuzzy time series forecasting model using a genetic algorithm. Third, Chapter Four presents the analysis, performance evaluation and discussion of the results. Finally, Chapter Five presents the conclusion and recommendations for further work. The complete MATLAB code is provided in the appendices.