From Traditional Fault Tolerance to Blockchain. Wenbing Zhao
recovering process may receive two types of retransmitted regular messages: (1) those with a valid rsn value, and (2) those without. Because the rsn counter is part of the state checkpointed, the recovering process knows which message is to be executed next. During the recovery, the process executes the retransmitted regular messages with valid rsn values according to the ascending rsn order. This ensures that these messages are replayed in exactly the same order as they were received prior to the failure. During the replay, the process may send regular messages to other processes. Such messages are logged at the recovering process as usual and they are likely to be duplicate. This is not a concern because of the duplicate detection mechanism in place and the duplicate message handling mechanism described above.
After replaying these messages, the process is recovered to a state that is visible to, and consistent with, other processes prior to the failure. For regular messages without rsn values, the recovering process can replay them in an arbitrary order because the process must not have sent any regular message since the receipt of such messages prior to its failure.
2.3.2.4 Limitations and Correctness.
The sender-based message logging protocol described above ensures proper recovery of a distributed system as long as a single failure occurs at a time. That is, after a process fails, no other processes fail until the failed process is fully recovered. Note that the protocol cannot cope with two or more concurrent failures. If two or more failures occur concurrently, the determinant for some regular messages (i.e., the rsn values) might be lost, which would lead to orphan processes and the cascading rollback (i.e., the domino effect).
EXAMPLE 2.7
Consider a distributed system consisting of three processes P0, P1, and P2, shown in Figure 2.19. P0 sends P1 a regular message <REGULAR,k,?,mi>. After the message is fully logged at P0, P1 sends P2 a message <REGULAR,s,?,mt>. Then, both P0 and P1 crashed. Upon recovery, although P0 can resend the regular message <REGULAR,k,?,mi> to P1, however, the receiving order information rsn is lost due the failures. Hence, it is not guaranteed that P1 could initiate the correct state interval that resulted in the sending of regular message <REGULAR,s,?,mt>. P2 would become an orphan process and be forced to rollback its state.
We prove below that the recovery mechanism introduced in section 2.3.2.3 guarantees a consistent global state of the distributed system after the recovery of a failed process. The only way the global state of a distributed system becomes inconsistent is when one process records the receipt of a (regular) message that was not sent by any other process (i.e., the message is an orphan message). We prove that any regular message that is received at a process must have been logged at the sending process. For a pair of nonfailing processes, the correctness of this statement is straightforward because the sending process always logs any message it sends. The interesting case is when a nonfailing process received a regular message that was sent by a process that fails subsequently.
Figure 2.19 Two concurrent failures could result in the loss of determinant information for regular messages.
Let’s assume a process Pi fails and another process Pj receives a regular message sent by Pi prior to the failure, we need to prove that the message must have been logged at Pi either prior to its failure or will have been logged before the end of the recovery.
If Pi checkpointed its state after sending the regular message prior to the failure, the message must have been logged in stable storage and is guaranteed to be recoverable. Otherwise, the message itself would have been lost due the failure because it was logged in volatile memory. However, we prove that the message will be regenerated during the recovery.
According to the protocol, a process cannot send any new regular message before it has received the ACK message for every regular message received. The fact that the message was sent means Pi must have received the ACK message for the regular message that triggered the state interval in which the message was sent. This in turn means that the sending process of the regular message, say Pk must have received the corresponding ORDER message sent by Pi. Hence, upon recovery, Pk will be contacted by Pi and the regular message with a valid rsn value will be retransmitted to Pi. This would ensure the recovering process Pi to reinitiate the state interval in the correct order. The regular message received by Pj will be correctly regenerated and logged at Pi during recovery. This completes our proof.
2.3.2.5 Discussion.
As we have mentioned before, unlike the receiver-based pessimistic logging, performing a local checkpointing at a process does not truncate its message log because the log contains messages sent to other processes and they might be needed for the recovery of these other processes. This is rather undesirable. Not only it means unbounded message log size, but it leads to unbounded recovery time as well.
The sender-based message logging protocol can be modified to at least partially fix the problem. However, it will be at the expense of the locality of local checkpointing. Once a process completes a local checkpoint, it broadcasts a message containing the highest rsn value for the messages that it has executed prior to the checkpoint. All messages sent by other processes to this process that were assigned a value that is smaller or equal to this rsn value can now to purged from its message log (including those in stable storage as part of a checkpoint). Alternatively, this highest rsn value can be piggybacked with each message (regular or control messages) sent to another process to enable asynchronous purging of the logged messages that are no longer needed.
REFERENCES
1. L. Alvisi and K. Marzullo. Message logging: Pessimistic, optimistic, causal, and optimal. IEEE Trans. Softw. Eng., 24(2):149–159, Feb. 1998.
2. B. K. Bhargava and S.-R. Lian. Independent checkpointing and concurrent rollback for recovery in distributed systems - an optimistic approach. In Symposium on Reliable Distributed Systems, pages 3–12, 1988.
3. B. K. Bhargava, S.-R. Lian, and P.-J. Leu. Experimental evaluation of concurrency checkpointing and rollback-recovery algorithms. In ICDE, pages 182–189. IEEE Computer Society, 1990.
4. A. Borg, W. Blau, W. Graetsch, F. Herrmann, and W. Oberle. Fault tolerance under unix. ACM Trans. Comput. Syst., 7(1):1–24, Jan. 1989.
5. K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, Feb. 1985.
6. D. Davis, A. Karmarkar, G. Pilz, S. Winkler, and U. Yalcinalp. Web Services Reliable Messaging (WSReliableMessaging) Version 1.2, OASIS Standard. http://docs.oasis-open.org/ws-rx/wsrm/200702/wsrm-1.2-spec-os.pdf, February 2009.
7. E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375–408, Sept. 2002.
8. O. Etzion and P. Niblett. Event Processing in Action. Manning Publications, 2010.
9. S. Ghemawat, H. Gobioff, and