Distributed Computing Pearls. Gadi Taubenfeld
system functions correctly at any given time rapidly approaches zero. Such applications enable the system as a whole to continue to function despite the failure of a limited number of components.
Today’s economy involves manufacturing, distributing, and retailing of goods. However, it also has to do with creating and disseminating information, for example, publishing books, filmmaking, etc. Future economy is likely to be dominated by information. Information is a representation of knowledge, and can be represented in two ways: analog—a book that you can hold in your hand; or digital—a book that is stored as a file in your computer. The digital revolution is about converting analog information to digital information and use computer networks such as the Internet to move the digital information around. Such networks are required to be able to move the information in large quantities, everywhere, cheaply, securely, and as fast as possible.
1.3 COMPUTERS WITH MULTIPLE PROCESSORS
A processor is the brain of the computer. It is a component in a computer that interprets and execute computer program instructions and processes data. Throughout the history of modern computing, application developers have been able to rely on improved processor design to deliver significant performance improvements while reducing costs at the same time. That is, as processors got faster so did the applications. Unfortunately, increasing difficulty with heat and power consumption of the new smaller and faster processors along with the limits imposed by quantum physics has made this progression increasingly more difficult.
Until a few years ago mainstream computers were built with a single processor only. This situation has changed, as it became more difficult to increase the (clock) speed of uniprocessor computers. Hence, all microprocessor companies have been forced to bet their futures on multiple processors (also called multicores1) which resides inside a single computer.
Essentially, all computer manufacturers are now offering a new generation of multiprocessor computers where a single computer includes several processors all executing concurrently, and interact and collaborate with one another. Several computer manufacturers have been building, for many years now, costly high-end computers where each computer includes many processors,2 however relatively cheap multiprocessor computers are available today as mainstream computers and can be found in many homes.3 The switch from uniprocessors to multiprocessors is a milestone in the history of computing, and researchers have the rare opportunity to re-invent some of the cornerstones of computing.
This fundamental change in computing architecture requires a fundamental change in how such computers are programmed. Writing a concurrent application for a multiprocessor computer that takes advantage of having multiple processors to increase speed and get better performance is much more challenging and complex than programming a uniprocessor computer, and requires an understanding of new basic principles. Much of the future of computers with multiple processors will be told by how well programmers can take advantage of the new concurrent computing architecture.
1.4 SYNCHRONIZATION
Computation on computer networks like the Internet and computation on a single multiprocessor computer have many aspects in common. The key issue in both cases is the need to understand how separate computers on the Internet or, similarly, separate processors within a single computer, interact and synchronize with one another. Synchronization techniques are perceived as essential to design and support the working activities of groups of computers and processors.
Many of our daily interactions with other people involve synchronization. You and your spouse may have to synchronize on who will buy the groceries, empty the garbage can, take the kids to school, which one of you will be the first to take a shower (assuming you only have one shower at home), will take the car, or use the single computer you have. Assume that you have a cat and your neighbor has a dog and you and your neighbor are sharing a yard, then you and your neighbor might want to coordinate to make sure that both pets are never in the yard at the same time.
In these examples, synchronization is used to ensure that only one participant (and not both) will take a specific action at a given time. Another type of synchronization has to do with cooperation. You and your spouse might need to move a heavy table together to its new location (it is too heavy for just one person). A classical example of cooperation is for two camps of the same army to decide on the exact time for a coordinated attack on the enemy camp.
We point out that the use of the term synchronization in computer science is slightly more general than its use in standard English. The following quote from the Oxford dictionary explains this point, “The use of synchronize to mean coordinate or combine as in ‘We must synchronize our efforts’ is considered incorrect by some people and should be avoided in standard English.” In computer science, synchronization also means coordination. That is, synchronization between processors is classified as either contention or coordination.
1.5 WHY IS SYNCHRONIZATION DIFFICULT?
All the above examples for synchronization between people have similar examples for synchronization between computers. Synchronization is needed in all systems and environments where several processors can be active at the same time. Without proper synchronization, the integrity of the data may be destroyed if two computers update a common file at the same time, and as a result, deposits and withdrawals could be lost, confirmed reservations might have disappeared, etc. However, while achieving synchronization between humans is sometimes relatively easy, achieving synchronization between computers is challenging and difficult. The reason is that most computers communicate with each other in a very restricted way.
While humans can see and hear each other, computers, and computing devices, in general, can in most cases only read and write. So, one computer can write a note (or send a message) that the other computer will later read, but they cannot see each other. To understand the difficulty with this type of restricted communication, the next two chapters examine several simple two-person interactions where communication is restricted either to writing and reading of notes or to sending and receiving of messages.
1.6 ALGORITHMS AND PROGRAMS
The notion of an algorithm is a central notion in computer science. An algorithm is just the recipe upon which a problem is solved. It was originally used in the context of solving mathematical problems. Euclid, the famous Greek mathematician, invented sometime between 400 and 300 B.C., an algorithm for finding the greatest common divisor of two possible integers. For example, the greatest common divisor of 18 and 27 is 9. This algorithm is considered to be the first nontrivial mathematical algorithm ever devised.
The word algorithm is derived from the name of the Persian mathematician Mohammed al-Khowârizmî, who lived in Baghdad during the 9th century. Al-Khowârizmî laid out the basic algorithms for adding, multiplying, and dividing numbers, and for extracting square roots. On a computer, an algorithm is expressed as a computer program which specifies, in the exact syntax of some programming language, the computation one expects a computer to perform. A recipe for preparing a cake, which prescribes the activities needed for preparing the cake, is also an algorithm. Such a recipe can be expressed in many different natural languages.
A plan or a strategy for winning in a game or solving a puzzle is also an algorithm. Thus, throughout the book, we shall use the terms, an algorithm, a plan, a strategy, or a solution, interchangeably. In most chapters of the book, we explain fundamental concepts which involve concurrency and synchronization between computers, by examining situations and solving problems which relate to interactions between people where communication is restricted in various ways. Thus, we shall use the terms a plan or a solution, more often than we shall use the term an algorithm.
1.7 CONCURRENT AND DISTRIBUTED ALGORITHMS
A concurrent or distributed algorithm is the recipe upon which a problem is solved by more than just one computing element. Finding the largest number in a set of numbers by first dividing the set into two subsets, using two processors to find the maximum number in each subset, and then comparing these two numbers, is an example of