ABSTRACT
Scheduling in distributed systems generally aims to leverage the power of diverse heterogeneous, geographically distributed, multiple-domain-spanning computational resource to provide optimal system performance, high throughput computing and maximum resource utilization. To achieve this goal, an efficient and effective scheduling system is fundamentally important. However, it is very difficult to do this using the traditional off-the-shelf scheduling software. Many issues have to be treated, such as, poor system design, limitation in heuristic-based implementation, system adaptability and scalability. This leads to the necessity of an additional software infrastructure to provide solutions to the aforementioned problems. This research focuses on the design of a general-purpose distributed scheduling framework, based on the concepts of object-oriented and multi-agent design patterns. The dissertation describes how object-oriented and multi-agent design patterns are used to model an adequate scheduling system. Specifically, a multi-component scheduling framework, capable of scaling the scheduling of user applications across different distributed system environments is proposed. The proposed solution suggests using two levels of scheduling: global and local. The global scheduling policies assign the response time requirements for scheduling component service invocations. The local scheduling policies are responsible for performing request scheduling, in order to meet these requirements. The proposed scheduling approach does not require a central point of control, its platform independent, and essentially, it provides quality of service to both user applications and resource owners, by reaching a compromise between their necessities. The experiments conducted using simulation, were used to study the effectiveness and the feasibility of the proposed scheduling schemes in respect to various deployment requirements. The validity of the simulation was confirmed by comparing its results with the results obtained in experiments with other heuristic-based scheduling algorithms. The proposed approach was shown to work well for different classes of applications flow, including, compute-intensive and data-intensive applications, under different scheduling policies.
TABLE OF CONTENTS
TITLE PAGE ……………………………………………………………………………………………….. II
DECLARATION ………………………………………………………………………………………….. II
CERTIFICATION ……………………………………………………………………………………….. III
DEDICATION …………………………………………………………………………………………….. IV
ACKNOWLEDGEMENTS ……………………………………………………………………………. V
TABLE OF CONTENTS ……………………………………………………………………………… VII
LIST OF FIGURES…………………………………………………………………………………….. XIII
LIST OF TABLES ……………………………………………………………………………………. XVIII
LIST OF ABBREVIATIONS ……………………………………………………………………….. XX
LIST OF SYMBOLS ………………………………………………………………………………… XXIII
ABSTRACT ……………………………………………………………………………………………. XXIV
1.0 GENERAL INTRODUCTION ………………………………………………………… 1
1.1 Background of the Study ………………………………………………………………………1
1.1.1 Fundamental Concepts ………………………………………………………………. 3
1.2 Research Motivation …………………………………………………………………………….6
1.3 Research Problem Description …………………………………………………………….10
1.4 Research Aim and Objectives ……………………………………………………………..12
1.5 Methodology……………………………………………………………………………………..13
1.6 Research Contribution to Knowledge …………………………………………………..14
1.7 Dissertation Outline ……………………………………………………………………………15
2.0 LITERATURE REVIEW ………………………………………………………………. 17
2.1 Synopsis of Chapter ……………………………………………………………………………17
2.1.1 Formalism Description …………………………………………………………….. 17
viii
2.1.2 Job Characteristics …………………………………………………………………… 18
2.1.3 Machine Environment ……………………………………………………………… 19
2.1.4 The Notion of Resources ………………………………………………………….. 20
2.2 Background Information on Distributed Systems …………………………………..20
2.2.1 Distributed systems …………………………………………………………………. 21
2.2.2 Distributed Scheduling System …………………………………………………. 31
2.2.3 Distributed Artificial Intelligence and Multi-Agent System ………….. 33
2.2.4 Distributed Task Allocation, Coordination and Scheduling …………… 35
2.3 Multi-Agent System …………………………………………………………………………..36
2.3.1 Characteristics of Multi-Agent System ………………………………………. 38
2.3.2 Agent Interaction and Communication ………………………………………. 40
2.3.3 Advantages of Multi-Agent Systems ………………………………………….. 41
2.3.4 Design Method for Multi-Agent System …………………………………….. 42
2.3.5 Multi-Agent-Based Scheduling System Architecture …………………… 44
2.3.6 Challenging Issues with Multi-Agent Systems ……………………………. 50
2.4 Object-Oriented Software Design ………………………………………………………..51
2.4.1 Object-Oriented Database Design ……………………………………………… 52
2.5 Scheduling Methods …………………………………………………………………………..58
2.5.1 Object-Oriented Method and Control (OOC) ………………………………. 58
2.5.2 Multi-Agent Method (MA) ……………………………………………………….. 59
2.5.3 Operations Research Method (OR) ……………………………………………. 60
2.5.4 Knowledge-Base Method (KB) …………………………………………………. 60
2.5.5 Rule-Based Approach ………………………………………………………………. 61
2.5.6 Neural Networks Method (NN) …………………………………………………. 61
ix
2.5.7 Expert Systems Method (ES) ……………………………………………………. 62
2.6 State of the Art Scheduling Systems …………………………………………………….62
2.6.1 Scheduling Algorithms Overview ……………………………………………… 63
2.6.2 Scheduling and Resource Management Systems Overview ………….. 68
2.7 Survey of Related Approaches and Contribution of Dissertation ……………..73
2.7.1 Adaptive Scheduling for Distributed Systems …………………………….. 73
2.7.2 Multi-agent Cluster Scheduling for Scalability and Flexibility ……… 75
2.7.3 Decentralized Coordination in Multi-agent System ……………………… 76
2.7.4 Distributed Problem Solving by Means of Agent Negotiation ………. 77
2.8 Summary and Limitations of the Related Work ……………………………………..79
3.0 GENERAL-PURPOSE SCHEDULING FRAMEWORK ………………… 81
3.1 Synopsis of Chapter ……………………………………………………………………………81
3.2 Architecture of Object-Oriented Distributed Scheduling System ……………..82
3.2.1 Object-Oriented Preliminaries …………………………………………………… 83
3.2.2 Aggregation, Generalization and Inheritance ………………………………. 88
3.2.3 Dynamic Modelling of System Behaviours ………………………………… 89
3.3 Scheduling System Frameworks and Design Patterns …………………………….94
3.3.1 Scheduling System Frameworks ……………………………………………….. 94
3.3.2 Scheduling System Design Patterns …………………………………………… 95
3.3.3 Scheduler Pattern …………………………………………………………………….. 97
3.4 Proposed Object-Oriented Scheduling System Architecture ……………………98
3.4.1 Object-Oriented Design Philosophy …………………………………………… 99
3.4.2. System Architecture Overview …………………………………………………. 99
3.4.3 Object-oriented Scheduling System Design Pattern ……………………. 102
x
3.4.4 Object-Method Interactions …………………………………………………….. 107
3.4.5 Abstraction of Scheduling System Design Pattern ……………………… 109
3.4.6 Job Object Scheduling Process ………………………………………………… 110
3.4.7 Multi-component Application Model ……………………………………….. 126
3.5 Objects Interaction and Collaboration Pattern ……………………………………..128
3.6 The Proposed Multi-agent Scheduling Model ………………………………………131
3.6.1 General Scheduling Blueprint for Multi-Agent System ………………. 133
3.7 Agent-Based Communication Architecture …………………………………………141
3.8 Object-based Storage System Structure ………………………………………………148
3.9 Techniques for Achieving Components Reusability and Portability ……….150
3.10 Summary of Chapter‘s Contributions ………………………………………………..151
4.0 MULTI-AGENTS SCHEDULING COORDINATION, RESOURCE SELECTION AND ALLOCATION …………………………………………………………. 152
4.1 Synopsis ………………………………………………………………………………………….152
4.1.1 Multi-Agent Scheduling Platform ……………………………………………. 154
4.1.2 MAS Communication Mechanism …………………………………………… 157
4.1.3 Agents Interaction …………………………………………………………………. 159
4.2 The ANN Based Multi-Agent Resource Selection Architecture ……………..163
4.2.1 MAS Planning through Coordination and Negotiation ……………….. 167
4.2.2 Using MPGP to Solve Resource Selection Problem …………………… 168
4.2.3 Multi-Agent System Learning with the Backpropagation Algorithm ……………………………………………………………………………………………………. 174
4.2.4 Multiple Input Backpropagation Algorithm for ANN Training ……. 178
4.2.5 ANN Resource Selection Behaviour ………………………………………… 180
xi
4.2.6 Case Study ……………………………………………………………………………. 182
4.3 Inter-Site Scheduling Technique ………………………………………………………..189
4.3.1 Application of Routing Indices in Multi-Agent Scheduling ………… 190
4.3.2 Resource Search Technique ……………………………………………………. 200
4.3.3 Resource Search Algorithm …………………………………………………….. 202
4.3.4 Search Algorithm Comparison ………………………………………………… 204
4.4 Job Scheduling and Allocation Mechanism …………………………………………207
4.4.1 Job Agent Module …………………………………………………………………. 209
4.4.2 Formalization of Cluster Resource Mapping …………………………….. 212
4.3.3 Agent-Based First-Fit Resource Assignment Strategy ………………… 216
4.5 Resource Ranking Strategy ……………………………………………………………….218
4.5.1 Existing Resource Ranking Strategies ………………………………………. 219
4.5.2 Proposed Resource Ranking Strategy ………………………………………. 219
4.6 Job Resource Requirement Constrained Selection Model ……………………..229
4.6.1 Agent-Based Matchmaking Model …………………………………………… 230
4.6.2 Efficient Resource Sorting Criteria ………………………………………….. 234
4.6.3 Efficient Resource List Searching Technique ……………………………. 235
4.6.4 Agent-Based Close to File Job Placement Mechanism ……………….. 238
4.6.5 Agent-Based Job-Resource Assignment Model …………………………. 243
4.7 Main Contributions of Chapter …………………………………………………………..246
5.0 FRAMEWORK EVALUATION ………………………………………………….. 248
5.1 Synopsis ………………………………………………………………………………………….248
5.2 Data Set ………………………………………………………………………………………….250
5.3 Data Sets from MetaCentrum …………………………………………………………….250
xii
5.4. Cluster Performance Analysis (Experiment 1) …………………………………….250
5.4.1 Applications ………………………………………………………………………….. 252
5.4.2 Simulation Results and Discussion ………………………………………….. 258
5.5 Simulation Using Agent Grid Repast Simulator (Experiment 2) …………….266
5.5. 1 Scheduling Cost Functions …………………………………………………….. 268
5.5.2 Searching Algorithms Comparison ………………………………………….. 270
5.5.3 Multi-component Scheduler Evaluation ……………………………………. 272
5.6 Evaluation of Proposed Matchmaking Algorithm (Experiment 3) ………….276
5.6.1 Local Scheduling Policies ………………………………………………………. 276
5.6.2 Experiment Setup ………………………………………………………………….. 280
5.6.3 Performance Evaluation of the Matchmaker Algorithm ……………… 287
5.6.4 Performance Evaluation Based on Job Related Criteria ………………. 291
5.7 Evaluation of Task Scheduling With Agent and Without Agent (Experiment 4) .295
5.8 Performance Evaluation of Agents Network Traffic Load (Experiment 5) 297
5.9 Discussion of the Results in this Chapter …………………………………………….299
6.0 SUMMARY, CONCLUSION AND RECOMMENDATIONS FOR FUTURE DIRECTION ……………………………………………………………………………. 300
6.1 Summary…………………………………………………………………………………………300
6.2 Conclusion ………………………………………………………………………………………300
6.3 Recommendations for Future Direction ………………………………………………303
LIST OF JOURNAL PUBLICATIONS ……………………………………………………………………305
REFERENCES …………………………………………………………………………………………..307
APPENDICES ……………………………………………………………………………………………326
xiii
CHAPTER ONE
GENERAL INTRODUCTION
1.1 Background of the Study
Research in distributed systems, specifically, grids and clouds scheduling has grown tremendously over the last four decades. Scheduling problems are complex, and as such, several individuals and research groups from different fields have studied them for decades. Different scheduling techniques have emerged as a result, and some have been applied to tackle many different scheduling problems. Some of these scheduling techniques include the critical path methods, dynamic programming, branch and bound, and priority rules. Unfortunately, most of the scheduling problems have no known polynomial time algorithms, and are therefore classified as NP-hard in complexity theory. Based on this reason, many researchers have used different scheduling methodologies such as the artificial intelligence, expert systems, operations research, multi-agent, object-oriented and sometimes, a combination of two or more of these methods to develop scheduling infrastructures.
Generally speaking, a large number of the scheduling systems developed and in use today are basically application specific type of systems, designed specifically to solve specific problems, while only a handful are for general-purpose. Often, the general-purpose systems lack adaptability in terms of platform shift, and are usually not cost effective to maintain due to initial poor design conception and implementation. On the other hand, application specific scheduling systems usually fare somewhat better than their generic counterpart, for the simple fact that they tend to be more precise in handling problems. However, they are sometimes more difficult to manage and cannot adapt to dynamic environments. Considering these facts, it then justifies the fact that
2
there should not be any trade-off whatsoever between adaptability and performance with regards to adequate and quality design, implementation and adaptability.
In this dissertation, the design of an inter-provider distributed scheduling platform that addresses the scheduling of multi-component applications within the context of distributed computing systems environment is considered. The projected problem domain to be addressed is targeted at allocation and coordination of computational tasks scheduling problems in heterogeneous computing platforms such as Cloud computing, Grid computing, Utility computing, Client-server computing, Cluster computing and Peer-to-peer computing.
A carefully designed scheduling pattern can be used to solve some of the challenging problems identified in Section 1.3. However, the dissertation main focus is to addressing three main issues, which include, system design paradigm (a major reason for the limited success of the current scheduling systems, designed based on the concepts of classical design techniques which are either operation or data oriented), heuristic-based implementation methods (current scheduling systems or scheduling algorithms use heuristics or approximation methods to find solutions to most scheduling problems) and system adaptability (current scheduling systems are mainly used for meeting specific needs and objectives, and for solving specific domain related problems). Generally, a complete scheduling system has to provide capabilities for scaling of individual applications across variety of distributed system environments, extensibility (perform similar scheduling task in a different distributed system platform), robustness (fault-tolerance in a dynamic heterogeneous and unpredictable distributed computing environment), adaptability with respect to inter-providers’ tasks coordination and good performance given limited but available scarce resources.
3
The remainder of this chapter present some fundamental concepts and discusses in detail the research motivation, proposed research objectives and concepts, ways of achieving the set objectives and the contribution of this dissertation to the areas of distributed tasks scheduling in the domain of distributed computing systems.
1.1.1 Fundamental Concepts
The scheduling problem in distributed systems environment has been an active research topic, and therefore, many terminologies have been suggested. Unfortunately, some of the terms are neither clearly stated nor consistently used by different researchers, which frequently confuse readers. For clarity, in this dissertation, some key terminologies are re-defined (adopted from (Malathi and Sarumathi, 2010; Zhang et al., 2009; Zhu and Ni, 2013)).
Distributed system: A distributed system is a networked framework that enables the sharing, selection, and aggregation of geographically distributed autonomous and heterogeneous resources dynamically at runtime depending on their availability, capability, performance, cost, and users’ quality-of-service requirements. Examples of these include grids, clouds, client-servers, peer-to-peer, clusters and workstations.
Resource: A resource is an infrastructure that is required to execute a task, for example, a processor used for data processing, data storage device, or a network link for data transporting.
Scheduling: Scheduling is a technique with which scheduler gathers both the resource and application state information, selects appropriate resources, generates schedules, predicts the potential performance for each candidate schedule, and determines the best schedule for the applications to be executed on a distributed system, subject to some performance goals.
4
Scheduling goals: The main goals of scheduling are to optimize system and user oriented objectives such as maximizing throughput, maximizing resource utilization, minimizing execution time, and fulfilling economic constraints.
Scheduler: A scheduler is the mediate resource controller that interfaces between the users’ applications and the underlying resources. The scheduler is responsible for discovering available resources for an application, selecting the appropriate system(s), and submitting the application.
Application: An application is a collection of jobs coordinated to solve a certain problem. In other words, an application may consist of a number of jobs, either dependent or independent, which together fulfil the whole task.
Multi-component applications: may be defined as large, related collection of interacting computational tasks, schedulable across multiple execution sites and designed to achieve a specific overall result.
Job: A job is considered as a single unit of work within an application. It is typically allocated to execute on one single resource. It has input and output data, and execution requirements in order to complete its task.
Task: A task is an atomic unit of a job to be scheduled by the scheduler and assigned to a resource.
Site: Site is considered as a computational resource that can interact with schedulers directly, and accept workloads from the schedulers. A site may be a simple personal machine, a workstation, a supercomputer, or a cluster of workstations. From a scheduler‘s point of view, a site is an atomic resource unit that can be allocated to application jobs (Fig. 1.1).
5
Processing node: A processing node (also referred to as computing node or computational node), means different things under different environments. Under the context of Grid environments, a processing node is a site. However, in the context of Cluster environments, a processing node means a stand-alone computer.
Multi-component computing: Multi-component computing is the seamless application of geographically distributed non-dedicated meta-computing systems to user applications. A meta-computing platform generally consists of a collection of heterogeneous, non-dedicated computing resources, which may include multiprocessor platforms, computer Grids, cloud computing infrastructures, among others.
Hardware architecture: Computer architecture can be classified into four categories in terms of their control flows and data flows. SISD: an acronym for single instruction single data, a single instruction flow operates on a single data flow. This describes the sequential processing techniques, SIMD: stands for single instruction multiple data, a single instruction flow on multiple data flow simultaneously, MISD: multiple instruction single data, which implies multiple instruction flows operate on single data flow simultaneously, it seldom appears in real architecture and MIMD: multiple instruction flows operate on multiple data flows simultaneously. The two most commonly used architectures are the SIMD and MIMD.
Heterogeneous Computing Environments: are categories of distributed computing resources. They comprise of several Processing Elements (PEs), each with different processing capabilities or speeds (in-terms of how fast they can execute tasks), communication latencies between PEs, and the amount of memory associated with each PE. The main focus of this dissertation, however, is on heterogeneous computing environments and is explained in detail, in the next chapter.
6
Fig. 1.1: Computing Environment where Clusters of PEs from Five Sites are Connected by High-Latency Networks (Janjic, 2012)
1.2 Research Motivation
The need for distributed computing platforms is on the rise, especially with the recent advent of cloud computing and the Big Data paradigm. For example, in the work of Konwinski (2012), it was stated that the clusters of commodity servers have been widely adopted as primary computing platform for large Internet services, data-intensive scientific applications and enterprise analytics. These distributed computing environments are somewhat defined and characterised by heterogeneity and dynamism. According to Frincu (2011), the behaviour of most scheduling algorithms designed specifically for this environment is drastically and continuously being affected by the system‘s configuration. Subsequently, these scheduling algorithms become susceptible to other challenging issues such as fault-tolerance, scalability, computational and communication heterogeneity and resource sharing. These issues will be discussed in detail in chapter two. Again, sometimes the negative performances of these algorithms have resulted in poor execution of user submitted tasks and sometimes non execution of such tasks might be recorded. One noticeable consequence of these effects is that, often
7
times, wrong estimate of the scheduling algorithm is taken and it negatively influences the cost function (such as, execution cost and processing time).
The problem of scheduling distributed heterogeneous resources and computational tasks which are somewhat compute intensive is known to be extremely challenging or is said to be NP-complete and has been studied for decades (Stavrinides and Karatza, 2012; Wu et al., 2011). NP-complete problems are in NP, the set of all decision problems whose solutions can be verified in polynomial time; NP may be equivalently defined as the set of decision problems that can be solved in polynomial time on a non-deterministic Turing machine. The complexity of this problem increases especially when the scheduling process becomes dependent upon the activities of some other scheduler that lacks knowledge of the applications to be scheduled. This type of scheduling process is common within distributed scheduling environments, where for example, scheduling agents have to cooperate, coordinate and negotiate among themselves to achieve particular scheduling goals, such as, maximizing resource utilization and determining an allocation which satisfies certain restrictions.
In line with the identified scheduling challenge, the problem can be addressed, by drawing upon the potentials of object-oriented software design, and multi-agent technology to deal with such complex situations. There are many advantages in following an object-oriented design approach in the development of a scheduling system (Pinedo and Yen, 1997), also, it seems appropriate to consider multi-agent scheduling approach as good candidate for solving dynamic scheduling problems, especially in a typical distributed heterogeneous computing system environment. One good reason for this suggestion is that, agents based on some special characteristics are capable of handling complex problems in distributed systems (Jennings, 2001; Lucena and Nunes, 2012).
8
Pinedo and Yen (1997), are of the opinion that, in choosing an object-oriented design approach in the development of distributed scheduling system, the method offers some specific and unique benefits. First, the design should be modular, which makes maintenance and modification of the system relatively easy. Second, large segments of the code should be reusable. This implies that two scheduling systems that are substantially different may still share a significant amount of code. Third, the designer should think in terms of the behaviour of objects, not in low-level detail. In other words, the object-oriented design approach can speed up the design process and separate the design process from the implementation process. It is equally important to note that, by following this method, the proposed scheduling solution would have automatically addressed the major point raised in the dissertation’ problem statement regarding design and partly implementation issues mentioned in section 1.3.
The dissertation’ major goal is to develop an architectural design pattern for a broad-spectrum scheduling system that addresses some of the key scheduling problems in a typical distributed computing environment. Broad-spectrum in this case, implies a scheduling system that is adaptive and somewhat generic solution approach to problem solving and whose tasks allocation and coordination capability is able to scale beyond certain inter-provider environment. Multi-Agent Systems offer a natural extension to distributed tasks scheduling as they allow resources from multi-providers to inter-operate autonomously through negotiation (Ouelhadj et al., 2005).
In a typical distributed problem solving environment scenario, intelligent agents often tend to pursue their own individual goals when solving a problem. However, most multi-agent schedulers rely on specialized agent hierarchies (Cao et al., 2005), that could be likened to the conventional centralized schedulers which are disadvantaged by the single point of failure orientation that often times leads to failure in the overall
9
functioning of the scheduler. Besides, they require for each virtual organisation to use the existing agents and thus to redefine their scheduling policies. In contrast, a completely decentralized approach with no strict adherence to any form of hierarchies and in which agents are autonomous in their dealings, offers a more fault-tolerant scheduling environment. Likewise, in order to allow virtual organisations to maintain their autonomy, only the communication protocol should be standardized. In this way, any member of the distributed system could expose its own custom built agent, in terms of scheduling policies, to deal with task coordination and negotiations issues.
Based on the aforementioned facts regarding the different scheduling system implementation, the dissertation also recognizes several approaches that have been proposed towards creating either adaptive scheduling strategies or multi-agent system for computational tasks scheduling. However, most of these approaches are problem-defined and often focus on individual scheduling knowledge orientation such as coordination, negotiation, service-level agreement, adaptive scheduling, interactive scheduling, and cooperative scheduling. In this sense, building a functional Scheduling Platform which is fully distributed in terms of communication, storage, scheduling and negotiation solutions and also, adaptive by nature is obviously still an open research challenge. To address this challenge, a proposal is made in this dissertation, to design an adaptive inter-provider scheduling framework based on the principle of object-oriented and autonomous multi-agent software design patterns. The proposed framework environment is modelled upon a decentralized peer-to-peer network overlay. This choice is simply based on the fact that, this network model often produces efficient systems interaction and coordination (Crespo and Garcia-Molina, 2005; Solar et al., 2012).
10
1.3 Research Problem Description
The efficient scheduling of available computing resources can greatly increase the amount of resources available to users of data-intensive applications, where the time spent on file transfer dominate the computing times. In the case of compute-intensive applications, the situation is the opposite. However, it is very difficult to achieve this by using the traditional off-the-shelf scheduling software and resource provisioning systems. Many issues have to be addressed first, such as system design paradigm, heuristic-based implementation methods, scalability, application deployment, distributed scheduling, adaptability, heterogeneity, unpredictability of the resource environment, fault-tolerance, and system flexibility. This leads to the necessity of an additional software infrastructure to provide solutions to the earlier mentioned challenges in this section. The following properties must be considered when developing such an infrastructure.
a. It must allow the provisioning of heterogeneous resources efficiently, such as, processors, memory, storage, and other kinds of hardware resources, which would otherwise remain idle or underutilized.
b. It needs to allow for decentralized scheduling and execution of users’ jobs on various heterogeneous resources running on different distributed computing environments.
c. It needs to hide low level scheduling details from the user. The user does not need to know where submitted applications will be executed. Object-oriented concept is well suited for this approach.
d. It needs to provide quality of service to both users’ applications and resource providers, by reaching a compromise between their requirements.
e. It needs to be adaptable, by allowing existing applications to easily adapt to the new context.
11
f. It needs to have a low deployment cost, not requiring extensive reconfigurations on existing systems.
g. It must not rely only on heuristic methods alone for solution to problems. Autonomous agents can also be trained using non-linear techniques to perform different scheduling functions with greater precision. Even though heuristic approaches can be fast and efficient, agents are more likely to be adaptive, scalable and fault-tolerant than most approximation algorithms.
Generally, an adequate general-purpose distributed scheduling system should overcome these needs or challenges to leverage the promising potential of any of the distributed systems, providing high-performance services.
There are instances of many projects that claim to have addressed the resource or data-intensive scientific applications scheduling problem. Systems such as Globus (Foster, 2006), Netsolve, (Casanova and Dongarra, 1997), Legion (Chapin et al., 1999), the emergent cloud computing Schedulers (Nurmi et al., 2009; Buyya et al., 2009), and OpenNebula (Sotomayor et al., 2009), present resource management architectures that support resource discovery, dynamic resource status monitor, resource allocation, and job control. These architectures make it easy to create a high-level scheduler. Legion also provides a simple, default scheduler. However, they usually rely on knowing the state of every computational resource, thereby paying much attention to low-level scheduling information and neglecting individual application requirements. But Dail et al. (2000), show that schedulers with special knowledge of the individual application can easily outperform a default scheduler that lacks this knowledge.
Equipping schedulers with too much static information limits their scalability, as in the case of Condor (Tannenbaum et al., 2002), and supplying insufficient information limits
12
their scheduling capabilities, as in the case of BOINC (Anderson, 2004). These problems can be overcome with a decentralized solution that relies only on the dynamic resource information coming from the resource provider end (example: available number of processor per node, processor speed, memory, and associated bandwidth) and the individual application resource requirement (example: total memory and minimum bandwidth). However, under dynamic scenario, supplying schedulers with fine-grained application and resource description, guide the scheduler to obtain much better matches with the underlying resources, and hence more accurate performance predictions.
Other additional works that have addressed the aforementioned issues to some certain degree include (Kim et al., 2008; Basu et al., 2009a; Rahman et al., 2010, and Anderson, 2004). However, none of these projects addressed the scheduling problem from the perspective of both application and resource description granularity simultaneously.
1.4 Research Aim and Objectives
The main aim of this research is to develop a broad-spectrum scheduling system architectural design pattern, that supports the scheduling of diverse multi-component applications in a distributed computing system environments. Similarly, the research objectives are to:
i. Propose a general-purpose scheduling framework which consists of a resource model, an application model, a performance model and a scheduling policy based on the concept of object-oriented and multi-agent design patterns.
ii. Develop a Neural Network based Multi-agents intelligent resource selection framework that provides a common resource selection service for different kinds of application.
13
The design goal in this case is to provide quality of service (QoS) to both users’ applications and resource owners, by reaching a compromise between their necessities.
iii. Propose novel job forwarding and resource discovery methods, based on the concept of peer-to-peer computing paradigm. Here, the concept of routing indices document discovery technique is introduced to address the job query forwarding and resource discovery problems, in a typical multi-agent distributed scheduling system environment.
iv. Implement and evaluate the developed architectural framework using realistic application scenarios to support the applicability of the proposed scheduling structure in real-world scheduling problems.
A general-purpose scheduling framework suitable for all kinds of distributed systems environment comprises four models: application model, resource model, performance model, and scheduling policy. An application model extracts the characteristics of applications to be scheduled from the job description file. A resource model describes the characteristics of the underlying resources inside the distributed systems. A performance model is responsible for predicting the performance potential of a schedule. The prediction is usually based on the prediction of the behaviour of a specific job on a specific computational resource. Scheduling policy is responsible for deciding how applications should be executed and how resources should be utilized.
1.5 Methodology
In this dissertation, the focus is to understand the existing state-of-the-art scheduling systems in distributed computing system environments, and then, apply that understanding to design and simulate a better general-purpose scheduling system. However, throughout the course of this study, the following research processes were applied to achieve the dissertation research objectives:
14
a. Review related literature and state-of-the-art in distributed scheduling systems.
b. Study the concept of design patterns in object-oriented software development and multi-agent technology.
c. Use the insight gained from (a) and (b) to develop a general-purpose scheduling framework.
d. Study some mathematical concepts within the scope of basic set theory and apply this knowledge to model resource selection, configuration, and mapping.
e. Study communication patterns in peer-to-peer system and apply the same concept in the agent interaction for the proposed scheduling system.
f. Implement the general-purpose scheduling framework developed
g. Use a standard benchmark workload to simulate and evaluate the proposed scheduling models on a targeted distributed system framework.
1.6 Research Contribution to Knowledge
The main contribution of this research is the conceptualization and application of object-oriented design patterns and multi-agent technology to the design of a general-purpose scheduling model. Then use the general model to propose a novel, more decentralized, distributed scheduling architecture, which addresses the limitations of the existing technologies. An intelligent resource selection framework that provides a common resource selection service for different kinds of applications is also presented. In general, the contributions are summarised as follows:
a. Presentation of a framework model for a decentralized general-purpose scheduling architecture, which addresses the issues of poor system design, heuristic-based implementation and limited system scalability.
15
b. Presentation of a scheduling design pattern architecture, that shifts scheduling logic to intelligent agents.
c. Design and simulation of an intelligent resource selection framework that maximizes resource utilization and performance of distributed systems.
d. Implementation and evaluation of the proposed algorithmic framework to compare its scalability and versatility with previous researches.
1.7 Dissertation Outline
The rest of this dissertation is organized into six chapters. In Chapter two, review on related literatures which includes background information, scheduling algorithms, resource management systems, scheduling methods, object-oriented software design patterns, object-based storage system, multi-agent technology and peer-to-peer communication models are presented.
In Chapter three, a description of the proposed scheduling model in terms of objects and methods is presented. An object-oriented design pattern for the scheduling model and the translation of the scheduling components into autonomous multi-agent architectural framework is also presented. The Chapter also covers description on the communication patterns used by scheduling agent types to convey messages across different scheduling layers using a decentralized peer-to-peer network overlay infrastructure. The last section summarizes the achievements made so far.
Chapter four describes models for expressing resource selection, configuration, and mapping using multi-agents coordination process. A training pattern for the scheduling agent types using Artificial Neural Network is discussed. Subsequently, a description of the general-purpose resource selection model, performance model and mapping strategy of the cluster application used in the dissertation case study are presented.
16
In Chapter five, the implementation of the general-purpose resource selection framework and the agent-based scheduling system is achieved, by simulating each stage in the scheduling process. The research findings, results and discussions are also discussed in this chapter. Finally, the work is concluded and the future work is presented in Chapter six. This last chapter is followed by the list of references cited through the dissertation and a list of author‘s publications.
17
IF YOU CAN'T FIND YOUR TOPIC, CLICK HERE TO HIRE A WRITER»