A Real-Time Data Stream Processing Model for a Smart City Application Leveraging Intelligent Internet of Things (IoT) Concepts
ABSTRACT
Because of the vast amount of data generated by sensors and smart devices in smart cities, data streams must be processed in real time to gain insight quickly and to make decisions that are in most cases critical and time sensitive. This difficulty is reduced by using big data methods such as Cassandra, Hadoop, Kafka and Spark to perform real-time stream processing in an Internet of Things (IoT) environment, such as traffic monitoring in a smart city. Among the dimensions that improve the quality of life of people in a smart city, one of the most important is transportation. An Intelligent Traffic Monitoring System (ITMS) in a smart city monitors traffic by detecting and displaying what is occurring on a particular road. In this thesis, a real-time data stream processing model was developed that uses data streaming trends to monitor traffic in an ITMS.
Keywords: Smart cities, Real-time processing, Intelligent Traffic Monitoring System, real-time data stream processing model
TABLE OF CONTENTS
CERTIFICATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
LIST OF FIGURES
CHAPTER ONE INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.3 Thesis Organization
CHAPTER TWO LITERATURE REVIEW
2.1 Smart City Concepts
2.1.1 Smart Mobility
2.1.2 Smart Grid
2.1.3 Smart Buildings
2.1.4 Smart Water
2.1.5 Smart goods
2.1.6 Smart Industry
2.1.7 Smart Lighting
2.1.8 Smart Waste Management
2.1.9 Intelligent Traffic Monitoring Systems
2.1.10 Smart Energy Management
2.2 Architecture of Smart Cities
2.2.1 Urban Area
2.2.2 Dense and heterogeneous devices
2.2.3 Types of Data
2.2.4 Communication Techniques
2.2.5 Control Centre Server
2.3 IoT for Smart Cities
2.4 IoT Architecture
2.4.1 Message Queue/Stream processing block
2.4.2 The Database Block
2.4.3 The Distributed File System Block
2.5 Challenges of IoT
2.6 Traffic management system
2.6.1 Smart Parking System
2.6.2 Smart Street Lights
2.6.3 Public Transport
2.7 Real-Time Data stream processing model
2.8 Related works
2.8.1 Cost effective road traffic predictive model using Apache Spark
2.8.2 Advanced traffic management system using IoT
2.8.3 Smart traffic light in terms of the Cognitive Road Traffic Management System (CTMS) based on the IoT
2.8.4 Traffic accident analysis using neural networks and decision trees
2.8.5 Big Data Analytics Architecture for Real-Time Traffic Control
CHAPTER THREE METHODOLOGY
3.1 Apache Kafka
3.2 Apache Spark
3.3 Spark Streaming
3.4 Real-time Integration of Apache Kafka with Apache Spark
3.5 Cassandra
3.6 Spring boot
3.7 Architecture of the proposed system
3.8 The Producers
3.9 The Consumers
3.10 The results
3.11 Java
CHAPTER FOUR IMPLEMENTATION AND RESULTS
4.1 Introduction
4.2 Apache Zookeeper
4.3 Kafka and Zookeeper servers
4.4 Spark and Cassandra
4.5 Kafka producer
4.6 Spark Streaming
4.7 Streaming statistics
4.8 CHALLENGES
CHAPTER FIVE SUMMARY, CONCLUSION AND RECOMMENDATION
CHAPTER ONE
INTRODUCTION
Quintillions of bytes of data are generated daily, and handling this enormous amount of data becomes more tedious every day. These data are generated by people using devices such as mobile phones, laptops and smart devices, which are connected to the internet so that they can identify themselves to other devices. These devices are found everywhere in a smart city (Gehlot, 2016). According to Santana, Chaves, Gerosa, Kon and Milojicic (2016), a smart city is a city in which social, business, communication, and technological aspects are supported by Information and Communication Technologies, such as intelligent IoT and data collection sensors, to improve the experience of the citizen within the city. To achieve that, the city provides public and private services that operate in an integrated and sustainable way. The data generated by the sensors must be used proactively to make data-driven decisions as they are generated in real time. Because the sensors sense their environments continuously, the data streams they generate must be processed in real time to gain insight quickly, since the data are in many cases critical and time sensitive. Smart cities are built on the Internet of Things. According to Hahanov (2015), the Internet of Things (IoT) is a new paradigm that involves the exchange of data between different things through devices without human intervention, together with the automated collection, processing and analysis of the large amounts of data generated by sensors in an IoT environment. Al Nuaimi, Al Neyadi, Mohamed and Al-Jaroodi (2015) state that big data systems will store, process, and mine smart city application data efficiently to generate information that improves and enhances different smart city services. In addition, big data will help decision-makers, who will use the data generated by sensors to make data-driven decisions when planning any expansion or extension of smart city services, resources, or areas.
The IoT environment has three components: the hardware, which comprises sensors that sense the movement of objects, actuators and embedded communication hardware; a middleware, which analyses and stores the data and information generated by the hardware; and a presentation layer, through which users access, view, manipulate, and visualize the data extracted from the hardware (Santana et al., 2016). A wide range of services and applications covers fields such as transportation (intelligent road networks, smart mobility, smart traffic lights, smart parking systems, connected cars and public transport), public utilities (smart electricity, water and gas distribution), education, technology, health and social care, and public safety (Radek Kuchta, Kuchta & Kadlec, 2014). This thesis focuses on transportation, which is an important part of a smart city, and covers all areas of transportation.
Figure 1.1: Areas of smart city applications
(To & Cited, 2016)
One of the problems that most cities face is inefficient and ineffective traffic management. The world’s population increases daily, and the growing number of people migrating to urban areas leads to congestion on the roads. The ITMS makes life easier by leveraging IoT concepts to monitor traffic, reduce congestion on the roads, avoid traffic jams, improve public transportation, and provide better services within the city, such as smart traffic signals, traffic monitoring and control, and smart traffic lights. In the traffic monitoring system, if an accident occurs, an alert is sent out immediately and the remote monitoring system provides instant updates on the situation. Drivers can receive warnings on their Global Positioning System (GPS) devices and through connected road signs, and traffic lights can adjust automatically to manage traffic flow and prevent traffic jams. The data generated by the sensors are used to monitor traffic flows and to determine how traffic jams can be prevented. Based on the data generated, roads on which accidents occur frequently can be reported to the road safety authorities, and when an accident occurs on a particular road, a message is sent to the medical center to dispatch an ambulance to the location in real time, which can save a citizen's life if action is taken immediately. To implement the real-time data stream processing model, big data methods such as Apache Cassandra and other Apache projects from the Hadoop ecosystem, such as Kafka and Spark, are used. These methods enable data-driven decisions to be made. Apache Cassandra is an open source, distributed NoSQL (not only SQL) database. Hadoop services provide for data storage, data processing, data access, data governance, security, analytics and operations (“What is Apache Hadoop?,” n.d.).
According to Apache Kafka (n.d.), Kafka addresses the real-time problems of software solutions: it handles large volumes of real-time information and routes them to multiple consumers quickly, so that streaming data can be processed as soon as they are received.
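The idea of routing one stream to multiple consumers can be illustrated with a minimal in-memory sketch. It is not Kafka's API: the class `TopicSketch`, the group names and the message layout are hypothetical and chosen only for illustration. The sketch shows an append-only log from which each consumer group reads independently, tracked by its own offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory topic: an append-only log that several consumer groups read independently. */
class TopicSketch {
    private final List<String> log = new ArrayList<>();           // append-only message log
    private final Map<String, Integer> offsets = new HashMap<>(); // next unread index per group

    /** Producer side: append one message to the end of the log. */
    synchronized void publish(String message) {
        log.add(message);
    }

    /** Consumer side: return the messages this group has not seen yet and advance its offset. */
    synchronized List<String> poll(String group) {
        int from = offsets.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(group, log.size());
        return unread;
    }
}

class PubSubSketch {
    public static void main(String[] args) {
        TopicSketch traffic = new TopicSketch();
        // Hypothetical message layout: roadId,vehicleId,speedKmh
        traffic.publish("road-1,vehicle-17,62.0");
        traffic.publish("road-2,vehicle-04,18.5");
        System.out.println(traffic.poll("dashboard")); // first two messages
        traffic.publish("road-1,vehicle-23,55.1");
        System.out.println(traffic.poll("storage"));   // all three messages
        System.out.println(traffic.poll("dashboard")); // only the newest message
    }
}
```

Because each group keeps its own offset, a slow consumer does not hold back a fast one, which is the property that makes a publish-subscribe log suitable for feeding a dashboard and a database from the same sensor stream.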
Apache Spark is a fast, in-memory data processing engine with expressive development APIs that allow data workers to efficiently execute streaming, machine learning, SQL or graph processing workloads that require fast, repeated access to datasets. In real-time data stream processing, the data need to be processed as fast as possible, and a fast platform is needed to achieve this (“What is Apache Spark,” n.d.). After the big data methods have been applied, the data are used to make data-driven decisions that determine the actions to be taken when an event occurs and to predict probable future occurrences and outcomes.
In Prathilothamai, Lakshmi & Viswanthan (2016), the sensor data collected were converted to Comma Separated Values (CSV) files to retrieve the count of the vehicles that passed through and the speed of the vehicles within a particular duration. After these parameters were retrieved, the file was uploaded to Apache Spark and used to predict the traffic congestion status, which is categorized as high, medium or low congestion. The main Spark component used in that work is Spark SQL, with which queries are processed on the sensor data (CSV file) to detect the number of vehicles, the human count, and the traffic status.
In the proposed system, a message broker known as Kafka is used as a collector for monitoring events and as a tracker of users’ consumption of the data streams processed in real time. Kafka is a distributed publish-subscribe messaging system that is designed to be fast, reliable, scalable, and durable. Apache Spark is a fast and general engine for big data processing with built-in modules for SQL and streaming (Spark Streaming). Although Spark and Hadoop MapReduce largely perform the same task, Spark is reported to run up to a hundred times faster than MapReduce for in-memory workloads and is therefore a more suitable alternative.
The proposed system will be used to monitor traffic congestion, to decide which actions to take when there is congestion or when vehicles disobey traffic rules and signals, and to regulate traffic flow to prevent traffic jams.
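The congestion categories mentioned above (high, medium and low) can be illustrated with a small sketch that derives a category from the vehicle count and average speed in one observation window. This is plain Java rather than Spark SQL, and the row layout (`vehicleId,speedKmh`) and all thresholds are assumptions made only for illustration; they are not taken from the cited work or from the proposed system.

```java
import java.util.List;

/** Classifies congestion for one observation window from CSV sensor rows. */
class CongestionSketch {
    enum Level { LOW, MEDIUM, HIGH }

    // Hypothetical thresholds, chosen only for illustration.
    static final int BUSY_VEHICLE_COUNT = 50;      // vehicles per window
    static final double SLOW_SPEED_KMH = 20.0;
    static final double MODERATE_SPEED_KMH = 40.0;

    /** Each row is assumed to look like "vehicleId,speedKmh". */
    static Level classify(List<String> csvRows) {
        int count = 0;
        double speedSum = 0.0;
        for (String row : csvRows) {
            String[] fields = row.split(",");
            speedSum += Double.parseDouble(fields[1].trim());
            count++;
        }
        if (count == 0) {
            return Level.LOW; // no vehicles observed in this window
        }
        double avgSpeed = speedSum / count;
        // Many vehicles moving slowly: high congestion.
        if (count >= BUSY_VEHICLE_COUNT && avgSpeed < SLOW_SPEED_KMH) {
            return Level.HIGH;
        }
        // Reduced average speed without both conditions above: medium congestion.
        if (avgSpeed < MODERATE_SPEED_KMH) {
            return Level.MEDIUM;
        }
        return Level.LOW;
    }

    public static void main(String[] args) {
        List<String> window = List.of("v1,35.0", "v2,28.5", "v3,31.0");
        System.out.println(classify(window));
    }
}
```

In a streaming setting the same rule would be applied to each window of readings per road as it arrives, instead of to a file uploaded after the fact.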
Figure 1.2: Processing and storing data (Gehlot, 2016)
1.1 Problem Statement
In a real-time data stream processing model, data-driven decisions can be made in real time if real-time streaming data are available, and such data are also needed to test the performance of the system. In an IoT environment such as a smart city application, the challenge does not lie in generating vast amounts of data. The problem is that the data must be processed rapidly as they are received, because they are in most cases critical and time sensitive. The first problem is therefore to model a system that not only generates data in real time but can also determine how to use the data as they are received and process them accordingly. Another problem lies in ensuring that the data generated by the sensors are not lost. Data-driven decisions can be enhanced if the data are replicated for fault tolerance so that no data are lost. To address these problems, big data methods such as Cassandra, Apache Kafka, Spark, and Zookeeper will be leveraged to perform real-time data stream processing.
The main concern is to ensure that no data are lost and that the displayed data can be monitored as well as used to determine succeeding processes.
1.2 Objectives
The main objective of this thesis is to demonstrate the ability to leverage big data methods such as Cassandra, Kafka, Zookeeper and Spark to perform real-time stream processing; to ensure that no data are lost; to monitor displayed data; to determine succeeding data processes; and to monitor road traffic within an IoT environment for a traffic monitoring system in a smart city. The expected result is the integration of Kafka and Spark to perform real-time data stream processing, in which the data are processed by Apache Spark and forwarded to the database. A big data tool is expected to be used to query the data from the database.
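The flow stated in the objectives (receive a stream, process it, and forward it to a database without losing records) can be sketched in plain Java under loudly stated assumptions. The classes `ReplicatedStoreSketch` and `PipelineSketch`, the replication factor of 3 and the reading layout are hypothetical stand-ins; they do not reproduce the Kafka, Spark or Cassandra APIs used in the thesis. The sketch only shows why writing each record to several replicas keeps the data readable after one node is lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy store that writes every record to several replicas (stand-in for a replicated database). */
class ReplicatedStoreSketch {
    private final List<List<String>> replicas = new ArrayList<>();

    ReplicatedStoreSketch(int replicationFactor) {
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(new ArrayList<>());
        }
    }

    /** Write the record to every replica that is still alive. */
    void write(String record) {
        for (List<String> replica : replicas) {
            if (replica != null) {
                replica.add(record);
            }
        }
    }

    /** Simulate the loss of one node. */
    void failReplica(int index) {
        replicas.set(index, null);
    }

    /** Read from the first replica that is still alive. */
    List<String> readAll() {
        for (List<String> replica : replicas) {
            if (replica != null) {
                return new ArrayList<>(replica);
            }
        }
        throw new IllegalStateException("all replicas lost");
    }
}

class PipelineSketch {
    /** Drain the incoming queue, normalize each reading, and forward it to the store. */
    static int processBatch(Queue<String> incoming, ReplicatedStoreSketch store) {
        int stored = 0;
        String reading;
        while ((reading = incoming.poll()) != null) {
            store.write(reading.trim()); // the "processing" step is a simple trim here
            stored++;
        }
        return stored;
    }

    public static void main(String[] args) {
        Queue<String> incoming = new ArrayDeque<>();
        incoming.add("road-1,42.0"); // hypothetical layout: roadId,speedKmh
        incoming.add("road-2,17.5");
        ReplicatedStoreSketch store = new ReplicatedStoreSketch(3);
        int stored = processBatch(incoming, store);
        store.failReplica(0); // one node goes down after the writes
        System.out.println(stored + " stored, still readable: " + store.readAll());
    }
}
```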
1.3 Thesis Organization
This thesis is organized as follows. Chapter One introduces the topic, defines the problem, sets out the expectations and objectives, and clarifies what the thesis aims to achieve. Chapter Two presents the literature review: the concepts of a smart city and its different areas are explained, the real-time data streaming and analytical model for making data-driven decisions for the ITMS is described, the IoT environment is explained, and related works that leverage intelligent IoT concepts are introduced. Chapter Three introduces the big data methods used to implement the system, such as Cassandra, Hadoop, Apache Kafka, Zookeeper, and Apache Spark, and demonstrates the ability to leverage them for the ITMS. Chapter Four explains how the real-time streaming model is implemented using the tools described in Chapter Three. The work is concluded in Chapter Five.