From Traditional Fault Tolerance to Blockchain. Wenbing Zhao

From Traditional Fault Tolerance to Blockchain

and M. V. Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, 2nd edition, 2006.

11. A. S. Tanenbaum and D. J. Wetherall. Computer Networks (5th Ed.). Pearson, 2010.

12. L. Tewksbury, L. Moser, and P. Melliar-Smith. Live upgrade techniques for corba applications. In New Developments in Distributed Applications and Interoperable Systems, volume 70 of IFIP International Federation for Information Processing, pages 257–271. Springer US, 2002.

2
Logging and Checkpointing

Checkpointing and logging are the most essential techniques to achieve dependability in distributed systems [7]. By themselves, they provide a form of fault tolerance that is relatively easy to implement and incurs low runtime overhead. Although some information could be lost (if only checkpointing is used) when a fault occurs and the recovery time after a fault is typically larger than that of more sophisticated fault tolerance approaches, it may be sufficient for many applications. Furthermore, they are used in all levels of dependability mechanisms.

A checkpoint of a distributed system refers to a copy of the system state [7]. If the checkpoint is available after the system fails, it can be used to recover the system to the state when the checkpoint was taken. Checkpointing refers to the action of taking a copy of the system state (periodically) and saving the checkpoint to a stable storage that can survive the faults tolerated.

To recover the system to the point right before it fails, other recovery information must be logged in addition to periodical checkpointing. Typically all incoming messages to the system are logged. Other nondeterministic events may have to be logged as well to ensure proper recovery.

Checkpointing and logging provide a form of rollback recovery [7] because they can recover the system to a state prior to the failure. In contrast, there exist other approaches that accomplish roll-forward recovery, that is, a failed process can be recovered to the current state by incorporating process redundancy into the system. However, roll-forward recovery protocols typically incur significantly higher runtime overhead and demand more physical resources.

2.1 System Model

In this section, we define the system model used in the checkpointing and logging algorithms introduced in this chapter. The algorithms are executed in a distributed system that consists of N number of processes. Processes within the system interact with each other by sending and receiving messages. These processes may also interact with the outside world by message exchanges. The input message to the distributed system from the outside world is often a request message sent by the user of the system. The output message from the system is the corresponding response message. An example distributed system consisting of 4 processes is shown in Figure 2.1.

Schematic illustration of an example distributed system.

Figure 2.1 An example distributed system.

2.1.1 Fault Model

In such a distributed system, a failure could occur at a process. However, it is assumed that when a process fails, it simply stops execution and loses all its volatile state (i.e., the fail-stop model [18] is used). In addition, it is assumed that any two processes can establish a reliable connection (such as a TCP connection) for communication. Even though the network may lose messages, the reliable channel can effectively mask such losses. Naturally, the reliable connection ensures the first-in first-out (FIFO) property between the two endpoints of the reliable connection. This assumption also implies that the network does not partition, i.e., it does not prevent two or more processes in the system from interacting with each other for extended period of time.

2.1.2 Process State and Global State

The state of an individual process is defined by its entire address space in an operating system. A generic checkpointing library (such as Condor [23]) normally saves the entire address space as a checkpoint of the process. Of course, not everything in the address space is interesting based on the application semantics. As such, the checkpoint of a process can be potentially made much smaller by exploiting application semantics.

The state of a distributed system is usually referred to as the global state of the system [5]. It is not a simple aggregation of the states of the processes in the distributed system because the processes exchange messages with each other, which means that a process may causally depend on some other processes. Such dependency must be preserved in a global state. Assume that each process in the distributed system takes checkpoints periodically, this implies that we may not be able to use the latest set of checkpoints for proper recovery should the processes fails, unless the checkpointing at different processes are coordinated [5]. To see why, considering the three scenarios illustrated in Figure 2.2 where the global state is constructed by using the three checkpoints, C₀, C₁, C₂, taken at processes P₀, P₁, and P₂, respectively.

Figure 2.2(a) shows a scenario in which the checkpoints taken by different processes are incompatible, and hence cannot be used to recover the system upon a failure. Let’s see why. In this scenario, P₀ sends a message m₀ to P₁, and P₁ subsequently sends a message m₁ to P₂. Therefore, the state of P₂ potentially depends on the state of P₁ after it has received m₁, and the state of P₁ may depend on that of P₀ once it receives m₀. The checkpoint C₀ is taken before P₀ sends the message m₀ to P₁, whereas the checkpoint C₁ is taken after P₁ has received m₀. The checkpoints are not compatible because C₁ reflects the receiving of m₀ while C₀ does not reflect the sending of m₀, that is, the dependency is broken. Similarly, C₂ reflects the receiving of m₁ while C₁ does not reflect the sending of m₁.

Schematic illustration of consistent and inconsistent global state examples.

Figure 2.2 Consistent and inconsistent global state examples.

EXAMPLE 2.1

To understand the problem better, consider the following example. Assume that P₀ and P₁ represent two bank accounts, A and B respectively. The purpose of m₀ is to deposite $100 to account B after P₀ has debited account A. P₀ takes a checkpoint C₀ before the debit operation, and P₁ takes a checkpoint C₁ after it has received and processed the deposit request (i.e., m₀), as illustrated in Figure 2.2(a). If P₀ crashes after sending the deposit request (m₀), and P₁ crashes after taking the checkpoint C₁, upon recovery, P₁’s state would reflect a deposit of $100 (from account A) while P₀’s state would not reflect the corresponding debit operation. Consequently, $100 would appear

Скачать книгу

From Traditional Fault Tolerance to Blockchain. Wenbing Zhao