ABSTRACT
Not all kinds of data can be stored and manipulated efficiently in a relational (SQL) database, and a NoSQL database is not the best fit for all kinds of data either. A hybrid database (a combination of SQL and NoSQL databases for storage) is a better alternative: structured data are kept in the relational database and the rest in the NoSQL database. The hybrid database comes with its own challenges; among them is the need for the database administrator to learn the query language of each database that makes up the hybrid database. This research focuses on using a single query language to query the hybrid database via a software layer. The NoSQL (MongoDB) query language is adopted as the query language for the hybrid database in this research because it is the fastest growing query language and is less vulnerable to injection than SQL. The operations supported by the software layer are limited to Create, Read, Update and Delete (CRUD), and the MongoDB syntax had to be extended to cover the SQL functionality needed to execute CRUD operations in relational databases. The software layer was developed in the Java programming language; it translates MongoDB query syntax to SQL for execution. For evaluation, both databases were loaded with the same data via the software layer, the CRUD operations were run against both databases simultaneously, and the same result sets were obtained from each database; this indicates that the translation and execution were successful.
TABLE OF CONTENTS
DECLARATION
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE: INTRODUCTION
1.1 Background to the Study
1.2 Problem Statement
1.3 Aim
1.4 Objectives
1.5 Methodology
1.6 Structure of the Dissertation
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Overview of the Relational Database Management System
2.3 Overview of the NoSQL database
2.4 What is causing the migration to NoSQL
2.4.1 Big Users
2.4.2 The Internet of things
2.4.3 Big data
2.4.4 The Cloud
2.5 Types of NoSQL
2.5.1 Key/Value Database
2.5.2 Document Database
2.5.3 Column Family Database
2.5.4 Graph Database
2.6 Related Works
2.7 Approach
2.7.1 Separate Software layer
2.7.2 Load SQL data in NoSQL
2.7.3 Load NoSQL data in SQL
2.8 Selection of NoSQL Database Candidate
2.9 Selection of SQL Database Candidate
2.10 Why use NoSQL syntax instead of the familiar SQL
2.11 Summary
CHAPTER THREE: DESIGN OF SQL AND NoSQL QUERY PROGRAM USING NoSQL SYNTAX
3.1 Introduction
3.2 Conversion of MongoDB syntax to SQL
3.2.1 Conversion of MongoDB's 'Find' statement to SQL's 'SELECT' statement
3.2.2 Conversion of 'Insert' statement in mongo to SQL's 'INSERT' statement
3.2.3 Conversion of 'Update' statement in mongo to SQL's 'UPDATE' statement
3.2.4 Conversion of 'Remove' statement in mongo to SQL's 'DELETE' statement
3.3 Extended Mongo Queries
3.3.1 Create Table
3.3.2 Join
3.3.3 Sub query
3.4 Syntax for Querying SQL, NoSQL and both
3.5 The architecture of the system
3.6 Summary
CHAPTER FOUR: IMPLEMENTATION AND EVALUATION OF THE SYSTEM
4.1 Introduction
4.2 Tools and technologies
4.2.1 Java
4.2.2 NetBeans
4.2.3 MongoDB
4.2.4 MySQL
4.2.5 JDBC
4.3 System Interface
4.4 Evaluation of System
4.4.1 Evaluation of CRUD queries
4.4.2 Evaluation of Extended Queries
4.5 Summary
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary and Conclusion
5.2 Recommendations
REFERENCES
CHAPTER ONE
INTRODUCTION
1.1 Background to the Study
The relational database has dominated the database market and has catered for organizational data management needs for decades. It was almost the default choice for any serious data storage in most enterprises. The question was which relational database to adopt for a project, not what kind of database would best suit the data in question. The relational database has grown and matured to the point that it is still considered dependable and reliable (Sadalage et al., 2012). Lately, enterprises have acquired vast volumes of data that must be stored and served at a very high operation rate to heavy user traffic, a demand that the relational database handles inefficiently. This demand can be attributed to the advent of Web 2.0, which gave rise to very busy websites such as social networks, e-commerce sites and the like. The relational database cannot meet the demands of this new trend. Nance (2013) stated that “relational databases are not well suited for modern web applications that can support millions of concurrent users by spreading the load across a collection of application servers”. This challenge led to the emergence of the category of databases called Not Only SQL (NoSQL), which was readily adopted to cater for these data storage needs.
NoSQL databases found ready acceptance but could not replace the relational database. They have weaknesses for which the relational database offers solutions, and this keeps the relational database relevant. Nance et al. (2013) explained that NoSQL databases can be used with applications that have large transaction volumes, need low-latency access to massive datasets, and need nearly perfect service availability while operating in an unreliable environment. This implies that both types of database (relational and NoSQL) have their strengths and weaknesses; neither is a perfect fit for all kinds of data. There are situations where part of an enterprise's data is suited to relational storage while the other part fits well into a NoSQL database. Forcing all of the data into a single database will not serve the data management needs efficiently. It is better to segregate the data according to the data model that best suits their usage and to store each part in the database system that supports that model. The enterprise's application (software) then has to interact with more than one database, an arrangement that Jafarpour et al. (2015) have termed a hybrid database.
1.2 Problem Statement
The hybrid database has its demerits. The problem most relevant to this research is that the hybrid database requires the programmer(s) to learn the query language of each database system that makes up the hybrid database in order to access and manipulate those databases.
1.3 Aim
This research aims to bridge the gap between the two data stores by providing a platform for querying SQL and NoSQL databases using a NoSQL query language.
1.4 Objectives
The objectives of this research are to:
a. Create a software layer that serves as a common interface for accessing both SQL and NoSQL databases.
b. Design an interpreter to translate NoSQL query syntax to SQL syntax in the software layer.
c. Implement and evaluate the software layer (system).
1.5 Methodology
The following procedures were followed to carry out this research:
a. Literature on research related to bridging the gap between SQL (relational) databases and NoSQL databases was reviewed.
b. One candidate database was selected to represent SQL databases and another to represent NoSQL databases. Both types of database (SQL and NoSQL) comprise several database management systems produced by different vendors, so one representative from each category was adopted for this research.
c. The query language of one of the candidate databases was adopted as the language for querying the hybrid system via the software layer, and a conversion scheme was developed for translating the adopted language to the native query language of the other candidate database (SQL).
d. The architecture of the system (software layer) was designed; it served as a framework that guided the implementation of the software layer.
e. The system was tested using the same data set for both databases to ensure that queries run on the software layer produce the same result from both databases. This confirmed that queries were correctly converted and executed.
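To make the idea of a conversion scheme in step (c) concrete, the following is a minimal sketch of how a MongoDB-style equality filter might be mapped to a SQL SELECT statement. It is an illustration only and is not the translator developed in this research: the class name MongoToSqlSketch, the method findToSelect, the SqlQuery holder and the table name staff are all hypothetical, and the sketch assumes that the MongoDB query text has already been parsed into a map of field names to values. Only equality conditions are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch; not the dissertation's actual translator. */
public class MongoToSqlSketch {

    /** Holds the generated SQL text and the values to bind to its placeholders. */
    public static final class SqlQuery {
        public final String sql;
        public final List<Object> params;

        SqlQuery(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /**
     * Maps the equivalent of db.<table>.find({field: value, ...}) to a
     * parameterized SELECT. Assumes the filter is already parsed into a map.
     */
    public static SqlQuery findToSelect(String table, Map<String, Object> filter) {
        requireIdentifier(table);
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        List<Object> params = new ArrayList<>();
        if (!filter.isEmpty()) {
            // Each field becomes "field = ?", joined by AND after a WHERE prefix.
            StringJoiner where = new StringJoiner(" AND ", " WHERE ", "");
            for (Map.Entry<String, Object> entry : filter.entrySet()) {
                requireIdentifier(entry.getKey());
                where.add(entry.getKey() + " = ?");
                params.add(entry.getValue());
            }
            sql.append(where);
        }
        return new SqlQuery(sql.toString(), params);
    }

    /** Rejects names that are not plain identifiers, since they are concatenated into the SQL text. */
    private static void requireIdentifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Not a plain identifier: " + name);
        }
    }

    public static void main(String[] args) {
        // Stands in for the parsed form of db.staff.find({department: "Sales", age: 30}).
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("department", "Sales");
        filter.put("age", 30);

        SqlQuery query = findToSelect("staff", filter);
        System.out.println(query.sql);    // SELECT * FROM staff WHERE department = ? AND age = ?
        System.out.println(query.params); // [Sales, 30]
    }
}
```

In this sketch the values are returned separately from the SQL text so that they could be bound through a JDBC PreparedStatement rather than concatenated into the statement; that is a design choice of the illustration, not a description of the implemented system.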
1.6 Structure of the Dissertation
The rest of the dissertation is organized as follows. Chapter 2 gives an overview of the two types of database (SQL and NoSQL), reviews related research that attempted to bridge the gap between SQL and NoSQL databases, and presents the approach taken to achieve the aim of the research. Chapter 3 focuses on the design of the software layer (system); it compares the two query syntaxes and maps out a conversion scheme. Chapter 4 covers the implementation and evaluation of the system, including the technologies and tools used to implement it and the results obtained from testing it. Chapter 5 presents the conclusion and possible future research that could extend this work.