From Traditional Fault Tolerance to Blockchain. Wenbing Zhao
prior to the failure by replaying the logged nondeterministic events.
2 It limits the size of the log. By taking a checkpoint periodically, the logged events prior to the checkpoint can be garbage collected.
Logging protocols can be classified into three types [7]:
◾ Pessimistic logging. A message received is synchronously logged prior to its execution.
◾ Optimistic logging. To reduce the latency overhead, the nondeterministic events are first stored in volatile memory and logged asynchronously to stable storage. Consequently, the failure of a process might result in permanent loss of some messages, which would force a rollback to a state earlier than the state when the process fails.
◾ Causal logging. The nondeterministic events (and their determinant, such as delivery order of messages received at a process) that have not yet logged to stable storage are piggybacked with each message sent. With the piggy-backed information, a process can have access all the nondeterministic events that may have causal effects on its state, thereby enabling a consistent recovery of the system upon a failure.
In both optimistic logging [21, 19, 20] and causal logging protocols [1], the dependency of the processes has to be tracked and sufficient dependency information has to be piggybacked with each message sent. This not only increases the complexity of the logging mechanisms, but most importantly, makes the failure recovery more sophisticated and expensive because the recovering process has to find a way to examine its logs and determines if it is missing any messages and often causes cascading recovery operations at other processes.
On the other hand, pessimistic logging protocols are much simpler in their design and implementation and failure recovery can be made much faster [11] (specific advantages will be elaborated in section 2.3.1 below). Therefore, our discussion will focus on the pessimistic logging techniques and there will be no further elaboration on optimistic and causal logging.
2.3.1 Pessimistic Logging
The most straightforward implementation of pessimistic logging is to synchronously log every incoming message to stable storage before it is executed at a process. Each process can checkpoint its state periodically at its own pace without the need to coordinate with other processes in the distributed system. Upon recovery from a failure, a process restores its state using the last checkpoint and replays all logged incoming messages to recover itself to the state right before it fails.
EXAMPLE 2.5
Consider the example shown in Figure 2.11. Process P1 crashes after sending message m8. Process P2 crashes after sending message m9. Upon recovery, P1 restores its state using the checkpoint C1,0. Because it will be in the state interval initiated with the receiving of message m0, messages m2, m4, and m5 will be deterministically regenerated. This should not be a problem because the receiving processes should have mechanism to detect duplicates. Subsequently, the logged message m6 is replayed, which triggers a new state interval in which m8 would be deterministically regenerated (and discarded by P0. Similar, upon recovery, P2 restores its state using the checkpoint C2,0. The restored state is in the state interval initiated by the receiving of m1, and message m3 will be deterministically regenerated and sent to P3. Again, P3 would detect that it is a duplicate and discard it. Furthermore, the logged messages m4 and m7 is replayed, causing the sending of messages m6 and m9, which will be ignored by P1 and P3.
Figure 2.11 An example for pessimistic logging.
Pessimistic logging can cope with concurrent failing and recovery of two or more processes, as illustrated in the example shown in Figure 2.11. Messages received while a process is recovering (i.e., while it is restoring its state using the latest checkpoint and by replaying all the logged messages), can be buffered and examined when the process completes its recovery. It is possible that while a process is engaging in a recovery, another process fails and recovers itself concurrently, as the above example shows. In this case, P1 would receive a duplicate message (m6) regenerated by another recovering process P2 and temporarily buffers it. P1 then would discard it as soon as it is done recovery. Similarly, P2 would receive the duplicate message m4 regenerated by P1, which will be discarded after the recovery is completed.
2.3.1.1 Benefits of Pessimistic Logging.
It is apparent that pessimistic logging has a number of very desirable characteristics:
◾ Processes do not need to track their dependencies. The relative ordering of the incoming messages to each process is naturally reflected in the log (i.e., during recovery, the messages in the log will be replayed in the order in which they are logged). Hence, the pessimistic logging mechanism is straightforward to implement and less error prone.
◾ Output commit is free with pessimistic logging. This is a great fit for distributed applications that interact with their users frequently.
◾ There is no need to carry out coordinated global checkpointing because by replaying the logged messages, a process can always bring itself to be consistent with other processes in the system. This further reduces the complexity of adding rollback recovery support to applications. Furthermore, a process can decide when it is the best time to take a local checkpoint, for example, when its message log is too big.
◾ Recovery can be done completely locally to the failed processes. The only impact to other processes is the possibility of receiving duplicate messages and discard them. Hence, the recovery is simpler and in general faster than optimistic and causal logging. The localization of failure recovery also means that pessimistic logging supports concurrent failure recovery of multiple processes.
2.3.1.2 Discussion.
There are three issues that warrant additional elaboration: reconnection, message duplicate detection, and atomic message receiving and logging.
Reconnection. A process must be able to cope with temporary connection failures and be ready to accept reconnections from other processes. This is an essential requirement for recoverable distributed system. This calls for a design in which the application logic is independent from the transport level events. This can be achieved by using a event-based [8] or document-based distributed computing architecture such as Web services [15], in conjunction with appropriate exception handling.
Message duplicate detection. As mentioned above, a process must be capable of detecting duplicate messages because it may receive such messages replayed by another process during recovery. Even though transport-level protocols such as TCP have build-in mechanism to detect and discard duplicate messages, such mechanism is irrelevant because it works only within the established connection. During failure recovery, the recovering process will inevitably re-establish the connections to other processes, hence, such mechanism cannot be depend on. Furthermore, not all application-level protocols have duplicate detection support (they often depend on the underlying transport-level protocol to do so).