From Traditional Fault Tolerance to Blockchain. Wenbing Zhao
2.15 A summary of the mean logging latency and mean end-to-end latency under various conditions.
It is somewhat surprising to see that the performance on the solid state disk is not significantly better than that on the traditional disk, especially for small messages. For large messages, the solid state disk does make the logging operations more predictable in its latency, that is, the standard deviation [12] is much smaller than that on the traditional disk, as can be seen in Figure 2.15.
Figure 2.16 Probability density function of the end-to-end latency.
The end-to-end latency results shown in Figure 2.16 prove that indeed the pessimistic logging contributes very moderate (often less than 10%) overhead to the performance of the system as observed by the client. For messages of up to 100KB, the end-to-end latency with and without pessimistic logging falls within 10ms. For small messages, the end-to-end latency can go down as low as about 100μs. In all circumstances, the end-to-end latency is significantly larger than the logging latency. For the message size of 100KB, the oneway transfer latency over the network is estimated to be around 2600μs (half of the end-to-end latency without logging). This implies that the network manages to offer slightly under 40MB per second transfer rate.
2.3.2 Sender-Based Message Logging
For distributed applications that do not wish to log messages synchronously in stable storage, the sender-based message logging protocol [13] can be used to achieve limited degree of robustness against process failures. The basic idea of the sender-based message logging protocol is to log the message at the sending side in volatile memory. Should the receiving process fail, it could obtain the messages logged at the sending processes for recovery. To avoid restarting from the initial state after a failure, a process can periodically checkpoint its local state and write the message log in stable storage (as part of the checkpoint) asynchronously.
Unlike the receiver-based message logging protocol introduced in section 2.3.1, where the relative ordering of the messages received can be implicitly logged, such ordering information (i.e., the determinant for the messages) must be explicitly supplied by the receiver of a message to the sender. Furthermore, after sending the ordering information, the receiver needs to wait for an explicit acknowledgment for the ordering message. Prior to receiving of the acknowledgment, the receiver must not send any message to other processes (however, it can execute the message received immediately without delay, similar to the optimization for pessimistic logging discussed in section 2.3.1.2. This restriction is put in place to prevent the formation of orphan messages and orphan processes [7], which would force the orphan processes to roll back their state during the recovery of another process.
An orphan message is one that was sent by a process prior to a failure, but cannot be guaranteed to be regenerated upon the recovery of the process [7]. An orphan process is a process that receives an orphan message. If a process sends out a message and subsequently fails before the determinants of the messages it has received are properly logged, the message sent becomes an orphan message.
2.3.2.1 Data Structures
In the sender-based message logging protocol, each process must maintain the following data structures:
◾ A counter, seq_counter, used to assign a sequence number (using the current value of the counter) to each outgoing (application) message. The counter is initialized to 0 and incremented by one for each message sent. The sequence number is needed for duplicate detection (at the receiving process).
◾ A table used to carry out duplicate detection on incoming messages. The table consists of a collection of entries, one for each process with which the current one communicates. Each entry has the form <process_id,max_seq>, where max_seq is the maximum sequence number that the current process has received from a process with an identifier of process_id. A message is deemed as a duplicate if it carries a sequence number lower or equal to max_seq for the corresponding process.
◾ Another counter, rsn_counter, used to record the receiving/execution order of an incoming message. The counter is initialized to 0 and incremented by one for each message received. The receiving order of a message is represented by the current value of the counter and it is sent back to the sending process of the message for logging.
◾ A message log (in volatile memory) for messages sent by the process. In addition to the message sent, the following meta data is also recorded for each message:- Destination process id, receiver_id;- Sending sequence number, seq;- Receiving sequence number, rsn.The destination process id, the sending sequence number, and the message will be logged prior to the sending of the message. However, the receiving order number will be logged after the process receives such information later.
◾ A history list for the messages received since the last checkpoint. Each entry in the list has the following information regarding each message received:- Sending process id, sender_id;- Sending sequence number, seq;- Receiving sequence number, rsn (assigned by the current process).The history list is used to find the receiving order number for a duplicate message received. Upon receiving a duplicate message, the process should supply the corresponding (original) receiving order number so that the sender of the message can log such ordering information properly.
All the data structures described above except the history list must be checkpointed together with the process state. The two counters, one for assigning the message sequence number and the other for assigning the message receiving order, are needed so that the process can continue doing so upon recovery using the checkpoint. The table for duplicate detection is needed for a similar reason. However, the saving of the message log as part of the checkpoint might appear to be counter-intuitive because a major benefit of doing checkpointing is to truncate the message log (i.e., garbage collect logged messages) for (receiver-based) pessimistic logging as described in section 2.3.1. For sender-based message logging, unfortunately this side benefit is no longer applicable. The message log is needed for the receiving processes to recover from a failure, and hence, cannot be garbage collected upon a checkpointing operation. Additional mechanism, which will be introduced towards the end of this section, is necessary to ensure that the message log does not grow indefinitely.
The reason why the history list can be garbage collected upon a checkpointing operation is because the receiving sequence number information in the list (i.e., the receiving/execution order of the messages leading to the checkpoint) will no longer be needed for failure recovery. When a process receives a duplicate message and it cannot find the corresponding receiving sequence number in the history list because it has recently checkpointed its state, it may inform the sender that the message can now be purged from its message log – it is no longer needed for failure recovery due to the recent checkpoint.
In addition to the above data structures, the protocol uses the following types of messages:
◾ REGULAR message type. It is used for sending regular messages generated by the application process, and it has the form <REGULAR,seq,rsn,m>, where m refers to the message content. Obviously, at the time of sending of a message, its receiving sequence number, rsn, would not be known to the sending process, in which case, it assumes a special constant value (such as -1) indicating the unknown status. When a logged message is replayed to a recovering process, the sending process might have already learned the rsn value, in which case, a concrete rsn value is supplied.
◾