Distributed Applications
Richard John Anthony
Reader in self-managing computer systems
University of Greenwich, UK
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA, 02451, USA
Copyright © 2016 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-0-12-800729-7
For information on all MK publications visit our website at www.mkp.com
To Maxine, Elaine, Darrell, my mother Ellen and in memory of my father Norman
Preface
This book provides a comprehensive introduction to designing and developing distributed applications. The main emphasis is on the communication aspects of multicomponent systems, and the ways in which the design of such systems is impacted by, and impacts on, the behavior of the underlying operating systems, networks, and protocols.
The backdrop for this book is the increasing dependence of business, and society in general, on distributed systems and applications. There is an accompanying increasing need for well-trained engineers who can deliver high quality solutions. This requires strong design skills and best practice implementation techniques as well as a big picture view in which the engineer appreciates the way in which applications will use the resources of the system and be impacted by the configuration and behavior of the host system as a whole.
An integrated approach is taken, which cuts across several traditional computer science disciplines including operating systems, networking, distributed systems, and programming, and places the required background and theory into application and systems contexts with a variety of worked examples. The book is multidimensional; it has a problem-solving style and achieves a balance between theoretical underpinning and practitioner focus through development of application use cases of distributed applications.
Through embedded practical activities, the reader actually participates in the content, performing experiments and running simulations as they go. During these activities, dynamic aspects of systems are presented in animated ways which convey far more information than static descriptions can, and make complex aspects of systems accessible. Most of the accompanying experiments and simulations are user-configurable to support what-if investigation and give the reader an opportunity to gain an in-depth understanding. Practical programming challenges cover a wide range of aspects of systems. Several of these involve building full working distributed applications; these are made accessible through the provision of well-documented sample source code and clear guided tasks to add functionality and build systems by extending the sample code.
THE ORIGIN AND PURPOSE OF THIS BOOK
Designing and developing distributed applications is a hybrid topic in computer science, fundamentally based on concepts and mechanisms which are drawn from several of the traditional core subject areas which include networking, operating systems, distributed systems (theoretical, rather than developmental), and software engineering. The vast majority of high quality texts currently available focus on one of these subject areas with clear traditionally established boundaries in their scope. The majority of these books are primarily theoretical in style and approach.
At the outset of writing this book, I had been teaching a practically oriented course in the area of distributed applications for many years and was very aware of the lack of a single book that comprehensively covered the subject matter of designing and developing distributed applications, with a strong practical emphasis. In effect, what I wanted was a standalone guide that would serve as a primary text for my own course and for others like it. I also wanted a book that would be accessible to my students, who are a diverse group with different levels of experience and confidence. I wanted, with a single book, to encourage those who are just starting out in software engineering, while equally satisfying the needs of more experienced learners who require more advanced challenges. My course already had a theory-meets-practice emphasis which had worked well for 13 years and was well received by students. On several occasions, when discussing the lack of availability of a suitable course textbook, students had suggested I write one myself based directly on the course.
This book fills a clearly identified gap in provision. It provides an integrative text that relates the various underlying concepts together in a self-contained way so that a reader can appreciate the big picture view across systems, while also understanding the key underpinning theory and being able to explore with supported practical activities, all from one self-contained source. As such, the book is differentiated from other mainstream texts which tend to map onto one of the traditional subject areas and also tend to have a more theoretical basis.
The book has been designed to support courses which teach distributed applications design with a theory-meets-practice emphasis. The main focus is on application development and the supporting knowledge necessary to ensure high quality outcomes in this regard, and the book has been organized so as to naturally bridge across several areas of computer science. As such, it does not attempt to develop as much breadth in those areas as a traditionally organized text (for example, one focusing just on networking, or on operating systems) would be expected to do. Rather, it provides the very important integration across these disciplines. The primary design is focused on providing accessible example-based coverage of key aspects of distributed systems and applications, with detailed discussion supported by case studies, interactive teaching and learning tools, and practical activities. A main goal was to help readers understand practical examples of socket-based applications and to start to develop their own applications as a guided parallel track to the reading of the book.
The theoretical aspects of the book and the majority of mechanistic aspects covered are transferable across languages, but there are implementation aspects which have language-dependent interpretation. Sample code is therefore provided in three popular programming languages to maximize accessibility: C++, Java, and C#.
The supplemental resources code base is extensive, including sample code for the various in-text examples, starting points, and sample solutions for the end-of-chapter programming tasks, and full source code for all three of the case studies.
The supplemental resources also include special editions of the author’s established Workbenches suite of teaching and learning tools. These are used in some of the in-text activities and can also be used on a broader basis of self-guided or tutor-guided exploration of topics, or can be used to bring the subject matter to life in lectures or laboratory settings. The concept of the Workbenches was inspired by the need to represent dynamic aspects of systems in realistic and accessible ways. Any tutor who has attempted to teach scheduling (as one example of the very many dynamic aspects covered) with a series of static diagrams will appreciate the limitations of the approach, in particular the difficulty of conveying the true richness of the behavior that can be exhibited. The Workbenches were specifically designed to overcome these limitations when teaching dynamic or complex aspects of systems. The user-configured practical experiments and user-configured simulations cover a wide range of topics in networking, distributed systems, and operating systems. The chapter texts link to these activities and provide guidance to the reader to map the practical learning to the underpinning theoretical concepts.
The style of the book, with its strong emphasis on guided practical exploration of the core theory, makes it suitable as a self-study guide as well as a course companion text.
THE INTENDED AUDIENCE
The book has been designed to have a very wide appeal. The target audience includes:
• Teachers of distributed systems who require a self-contained course text with in-built practical activities, programming exercises, and case studies which they can use as the basis of an interesting and inspiring course.
• Students studying application development and requiring a book which links together the many different facets of distributed systems, operating systems, and networking, with numerous clearly explained practical examples and a rich repository of sample code.
• Experienced programmers who are new to designing and developing distributed applications and/or socket programming, or perhaps need a quick-start resource to get a distributed application project off the ground.
• Trainee programmers learning any of C++, Java, or C# and requiring the additional challenge of writing network applications with sockets.
• Sockets-level programmers familiar with one of the languages supported by the book (C++, Java, C#) and requiring an example-based resource to facilitate cross-training to one of the other languages.
• Students studying other areas of computer science and requiring a basic grounding in distributed systems in the form of a self-study guide with a practical emphasis.
THE ORGANIZATION OF THE BOOK
The book has a core section of four chapters which look at the background concepts, technical requirements, challenges presented, as well as the techniques and supporting mechanisms necessary to build distributed applications. Four specific viewpoints have been chosen so as to divide the material into related categories which are significant from a design and operational perspective. This approach enables a structured and detailed examination of the underlying concepts and mechanisms of systems, which cuts across the boundaries of the traditional teaching subjects.
Chapter 6, which follows the core chapters, is set at the higher level of distributed systems themselves. This chapter does the important job of integrating the ideas, concepts, and mechanisms discussed in the earlier core chapters into the context of entire systems, and identifies the services needed to ensure those systems are high quality in terms of their functional and nonfunctional requirements.
All of the chapters have a practical emphasis, with in-built experiments and practical exploration activities, and there is a case study that runs through all of the core chapters, integrating and cross-linking them. However, to address the wide scope of architectures, structures, behaviors, and operating contexts of distributed applications, a final chapter provides two further, fully operational and clearly documented case studies accompanied by full code.
The Introduction chapter motivates the book and the integrative systems approach that has been taken. It provides a brief historical perspective on the rise of distributed computing and its significance in a modern context. It provides a short foundation of some key topics which are covered in depth later in the book, but of which the reader needs a basic appreciation at the outset. This includes the general characteristics of distributed systems; the main benefits of distributed systems; key challenges that must be overcome when building distributed applications; metrics for measuring the quality and performance of distributed systems; and an introduction to the main forms of transparency. This chapter also introduces the three case studies, the supplementary material available at the book’s companion Web site, the in-text practical activities, and the Workbenches suite of interactive teaching and learning tools.
The Process view chapter examines the ways in which processes are managed and how this influences communications at the low level. It deals with aspects such as process scheduling and blocking, message buffering and delivery, the use of ports and sockets, and the mechanism of process binding to a port which thus enables the operating system to manage communications at the computer level on behalf of its local processes. This chapter also deals with concepts of multiprocessing environments, threads, and operating system resources such as timers.
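As a small illustration of the port-binding idea mentioned above, the following Java sketch binds a UDP socket to a port and lets the operating system deliver a datagram to it. It is not taken from the book's sample code: the class name PortBindingSketch and its method names are hypothetical, and the use of port 0 (asking the operating system to choose any free port) and the loopback address are assumptions made only to keep the example self-contained. The second method attempts to bind another socket to the same port, which is normally refused while the first binding is in place.

```java
import java.net.BindException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not from the book's code base.
public class PortBindingSketch {

    // Sends one datagram to a locally bound port and returns the text that arrived.
    public static String sendToBoundPort(String text) throws Exception {
        // Port 0 asks the operating system to choose a free port and bind the socket to it.
        try (DatagramSocket receiver = new DatagramSocket(0);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not block forever if nothing arrives

            byte[] out = text.getBytes(StandardCharsets.UTF_8);
            // The destination is identified by (address, port), not by process;
            // the operating system maps the port to whichever process bound it.
            sender.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), receiver.getLocalPort()));

            byte[] buf = new byte[512];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // blocks until a datagram is delivered or the timeout expires
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        }
    }

    // Returns true if binding a second socket to an already-bound port is refused.
    public static boolean secondBindIsRefused() throws Exception {
        try (DatagramSocket first = new DatagramSocket(0)) {
            try (DatagramSocket second = new DatagramSocket(first.getLocalPort())) {
                return false; // the second bind unexpectedly succeeded
            } catch (BindException e) {
                return true;  // port already in use by the first socket
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("received: " + sendToBoundPort("hello"));
        System.out.println("second bind refused: " + secondBindIsRefused());
    }
}
```

The receive call also shows the blocking behavior referred to in the paragraph: the calling thread waits inside receive until the operating system has a datagram to hand over, which is why a timeout is set here.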
The Communication view chapter examines the ways networks and communication protocols operate and how the functionalities and behavior of these impact on the design and behavior of applications. This viewpoint is concerned with topics which include communication mechanisms and the different modes of communication, e.g., unicast, multicast, and broadcast, and the way such choices can impact on the behavior, performance, and scalability of applications. The functionality and features of the TCP and UDP transport-layer protocols are described, and compared in terms of performance, latency, and overheads. Low-level details of communication are examined from a developer viewpoint, including the role and operation of the socket API primitives. The remote procedure call and remote method invocation higher-level communication mechanisms are also examined.
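To make the socket API primitives named above a little more concrete, here is a minimal Java sketch of a single TCP exchange over the loopback interface. It is an illustration under stated assumptions rather than the book's own code: the class name TcpPrimitivesSketch and the method echoOnce are hypothetical, the port is chosen by the operating system, and the "protocol" is simply one line of text echoed back.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not from the book's code base.
public class TcpPrimitivesSketch {

    // Sends one line over a TCP connection to a local echo server and returns the reply.
    public static String echoOnce(String line) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Constructing the ServerSocket performs the bind and listen steps (port 0 = any free port).
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> {
                // accept blocks until a client connects, then yields a connected socket.
                try (Socket conn = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine()); // receive one line, send it straight back
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            server.start();

            String reply;
            // Constructing the client Socket performs the connect step.
            try (Socket client = new Socket(loopback, listener.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                client.setSoTimeout(2000); // bound the wait for the reply
                out.println(line);         // send
                reply = in.readLine();     // receive
            }
            server.join(2000);
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("echo: " + echoOnce("ping"));
    }
}
```

Compared with a UDP exchange, the extra accept and connect steps reflect TCP's connection-oriented nature: a connection is established before any application data flows, which is one source of the latency and overhead differences the paragraph mentions.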
The Resource view chapter examines the nature of the resources of computer systems and how they are used in facilitating communication within distributed applications. Physical resources of interest are processing capacity, network communication bandwidth, and memory. The discussion focuses on the need to be efficient with the use of these finite resources which directly impact on the performance and scalability of applications and the system itself. Memory is also discussed in the context of buffers for the assembly, sending, and receiving of messages.
The Architecture view chapter examines the structures of distributed systems and applications. The main focus is on the various models for dividing the logic of an application into several functional components and the ways in which these components interconnect and interact. The chapter also considers the ways in which the components of systems are mapped onto the underlying resources of the system and the additional functional requirements that arise from such mapping, for example, the need for flexible run-time configuration. The various architectural models are discussed in terms of their impact on key nonfunctional quality measures such as scalability, robustness, efficiency, and transparency.
The Distributed Systems chapter follows the four core viewpoint chapters. Distributed systems form a backdrop for these chapters as they each deal with a specific set of supporting theoretical aspects, concepts, and mechanisms. This chapter is set at a higher level and focuses on the distributed systems themselves, their key features and functional requirements, and the specific challenges associated with their design and development. The distributed systems chapter thereby puts the content of the core chapters into the wider systems perspective and discusses issues that arise from the distribution itself, as well as techniques to address these issues. The provision of transparency is key to achieving quality in distributed applications. For this reason, transparency is a theme that runs through all the chapters, in relation to the various topics covered, and is also a main focal aspect of the discussion of the case studies. To further reinforce the importance of transparency, it is covered in depth as a subject in its own right in this chapter. Ten important forms of transparency are defined and explored in terms of their significance and the way in which they impact on systems and applications. Techniques to facilitate the
The Workbench-based activities are all suitable for standalone piecemeal use. For example, you may choose to only use a subset of the activities of the Operating Systems Workbench (or one of the others) to bring to life some specific aspects of the course content that students struggle with, or simply to break up a long sequence of presentation slides with a live experiment or simulation.
As a reference text with practical examples. The book covers a wide range of topics within the subject areas of distributed systems, networking, programming, and operating systems. The book is distinguished from mainstream alternatives by its extensive range of practical examples and source code resources. It therefore serves as a reference guide with a twist. For a significant proportion of the topics, you will find related guided practical activities or programming challenges with solutions or contextualization in the form of one or more of the use cases.
THE SUPPORT MATERIALS
The book is supplied with supplementary materials provided via a companion Web site. The URL is http://booksite.elsevier.com/9780128007297.
The materials are organized on the Web site in a way which maps onto the various chapters in the book. There are several types of resources which include:
Sample program code. Sample code is provided for the in-text activities and examples where relevant, the use-case applications, and sample solutions for the programming exercises. The sample code can be used in several ways:
• In most cases, complete application source code is provided. This enables readers to study the entire application logic and to relate it to the explanation in the text. There are cases where the text provides brief code snippets to explain a key point; the reader can then examine the full application code to put the snippet into perspective with the application logic. Much of the sample code is provided in three languages: C++, Java, C#.
• The sample application code can be used as a starting point on which to develop solutions to the end-of-chapter resources; guidance as to which is the most appropriate resource to use is given in such cases.
• There are also some specific sample solutions to the end-of-chapter programming exercises, in cases where the solution is not already exemplified elsewhere in the resources.
Executable files. Many of the applications are also provided in executable form. This enables readers to run applications and study their behavior without having to first compile them. This is particularly important when following the in-text activities and examples to minimize interruption when switching between reading and the practical work.
The Workbenches teaching and learning tools. The book is accompanied by a suite of sophisticated interactive teaching and learning applications (called Workbenches) that the author has developed over a period of 13 years to promote student-centred and flexible approaches to teaching and learning, and to enable students to work remotely, interactively, and at their own pace. The Networking Workbench, Operating Systems Workbench, and Distributed Systems Workbench provide combinations of configurable simulations, emulations, and implementations to facilitate experimentation with many of the underlying concepts of systems. The Workbenches have been tried and tested with many cohorts of students and are used as complementary support to several courses by several lecturers at