Blockchain Data Analytics For Dummies. Michael G. Solomon
FIGURE 3-6: Contents of an Ethereum transaction.
Ethereum transactions contain the following fields:
Nonce: Each Ethereum account keeps track of the number of transactions it sends. This field holds the transaction's sequence number, based on the account's counter. The network uses the transaction nonce to ensure that each account's transactions are executed in the proper order.
Signature: The digital signature of the account owner, proving the identity of the account requesting this transaction.
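Gas price: The price you're willing to pay per unit of gas to execute this transaction.
Gas limit: The maximum amount of gas (units of computational work) you're willing to let this transaction consume. The most you can pay for the transaction is the gas limit multiplied by the gas price.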
To: The address of the recipient of this transaction. For transfers, the To address is the account that will receive the transfer. For calling functions, the To address is the address of the smart contract.
Value: The total amount of ether you want to send to the recipient.
Data: The actual data submitted as the transaction body. Each type of transaction may have different data, based on its functionality. For calling functions, the data might contain parameters.
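To make the fields above concrete, here's a minimal sketch that models them as a Python dataclass and computes the worst-case fee (gas limit times gas price). It's a simplified model, not a real Ethereum transaction encoding: the class and field names are hypothetical, the signature is a placeholder string, and it follows the legacy gas-price fields described above (newer Ethereum transaction types use different fee fields). Amounts are in wei, the smallest unit of ether.

```python
from dataclasses import dataclass

WEI_PER_ETHER = 10**18
WEI_PER_GWEI = 10**9


@dataclass(frozen=True)
class Transaction:
    """Simplified, hypothetical model of the transaction fields listed above."""
    nonce: int         # sender's transaction counter for this transaction
    signature: str     # placeholder for the sender's digital signature
    gas_price: int     # price per unit of gas, in wei
    gas_limit: int     # maximum units of gas this transaction may consume
    to: str            # recipient account or smart contract address
    value: int         # amount of ether to send, in wei
    data: bytes = b""  # transaction body, e.g. function call input

    def max_fee_wei(self) -> int:
        """Upper bound on the fee: the full gas limit charged at the offered price."""
        return self.gas_limit * self.gas_price


# A plain ether transfer; 21,000 gas is the standard cost of a simple transfer.
tx = Transaction(
    nonce=7,
    signature="hypothetical-signature",
    gas_price=20 * WEI_PER_GWEI,
    gas_limit=21_000,
    to="0x" + "ab" * 20,       # hypothetical recipient address
    value=WEI_PER_ETHER // 2,  # 0.5 ether
)
print(tx.max_fee_wei() / WEI_PER_ETHER)  # 0.00042
```

If the transaction uses less gas than its limit, only the gas actually used is charged, so the computed value is a ceiling rather than the final fee.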
As users submit transaction requests to nodes, the nodes create transactions and submit them to the transaction pool. Miners then pick transactions from the pool and build new blocks. After an Ethereum mining node constructs a block, it starts the mining process. The first miner to complete the mining process adds the block to the blockchain and broadcasts the new block to the rest of the network.
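The following sketch shows one simplified way a block builder might pick transactions from the pool. It assumes a greedy rule (take the highest gas price among transactions that can run next) and treats each transaction's gas limit as its worst-case gas use; real clients apply more rules than this. The names `PendingTx` and `build_block` are hypothetical. The key idea it illustrates is the nonce: a sender's transaction is eligible only when its nonce is the next one expected for that account.

```python
import heapq
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingTx:
    sender: str
    nonce: int
    gas_price: int  # wei per unit of gas
    gas_limit: int  # maximum units of gas the transaction may use


def build_block(pool, account_nonces, block_gas_limit):
    """Greedy sketch: repeatedly take the best-paying transaction that can run next."""
    by_sender = defaultdict(list)
    for tx in pool:
        by_sender[tx.sender].append(tx)
    queues = {s: deque(sorted(txs, key=lambda t: t.nonce)) for s, txs in by_sender.items()}
    next_nonce = dict(account_nonces)
    heap = []  # entries: (-gas_price, sender, tx); at most one entry per sender

    def push_next(sender):
        queue = queues[sender]
        expected = next_nonce.get(sender, 0)
        while queue and queue[0].nonce < expected:
            queue.popleft()  # nonce already used: can never be included
        if queue and queue[0].nonce == expected:
            tx = queue.popleft()
            heapq.heappush(heap, (-tx.gas_price, sender, tx))
        # A gap (nonce > expected) leaves this sender's transactions waiting.

    for sender in queues:
        push_next(sender)

    block, gas_used = [], 0
    while heap:
        _, sender, tx = heapq.heappop(heap)
        if gas_used + tx.gas_limit > block_gas_limit:
            continue  # does not fit; this sender's later nonces must wait too
        block.append(tx)
        gas_used += tx.gas_limit
        next_nonce[sender] = tx.nonce + 1
        push_next(sender)
    return block


pool = [
    PendingTx("alice", 0, gas_price=30, gas_limit=21_000),
    PendingTx("alice", 1, gas_price=50, gas_limit=21_000),
    PendingTx("bob", 5, gas_price=40, gas_limit=21_000),
    PendingTx("carol", 2, gas_price=90, gas_limit=21_000),  # carol expects nonce 1: gap
]
nonces = {"alice": 0, "bob": 5, "carol": 1}
block = build_block(pool, nonces, block_gas_limit=63_000)
print([(t.sender, t.nonce) for t in block])  # [('bob', 5), ('alice', 0), ('alice', 1)]
```

Notice that alice's nonce-1 transaction offers the highest price among included transactions but still lands after her nonce-0 transaction, and carol's transaction is skipped entirely because her nonce 1 is missing from the pool.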
You can look at the public Ethereum blockchain at any time by going to Etherscan at https://etherscan.io/. Etherscan lets you see blockchain statistics as well as block and transaction details.
Decoding block data
Etherscan presents blockchain data in a readable format. But in doing so, it hides some important details. Blockchain data isn't always stored in a format that is easily readable, at least to most people. For many reasons beyond the scope of this book, blockchain implementations store some data as hash values or in encoded form rather than in a raw, readable format. Storing data this way makes common querying and analytics operations more difficult than they are with traditional databases.
Each type of blockchain data has nuances in the way its data is formatted and stored. For example, a transaction’s input data value in its raw format, shown in Figure 3-7, isn't very helpful. You can see this by clicking or tapping View Input As ⇒ Original.
FIGURE 3-7: Original format of input data.
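To see why the raw input isn't very helpful, here's a minimal sketch that splits call data into its parts. It assumes the standard Solidity ABI encoding, in which the first 4 bytes identify the called function (the function selector) and each static argument fills one 32-byte word. The sample data is a hypothetical ERC-20 `transfer(address,uint256)` call (selector `0xa9059cbb`), not the `cancelOrder()` transaction shown in the figures, and `split_input_data` is a hypothetical helper name.

```python
def split_input_data(input_hex: str):
    """Split raw call data into a 4-byte function selector and 32-byte argument words.

    Assumes standard ABI encoding, where each static argument fills one 32-byte word.
    """
    raw = bytes.fromhex(input_hex.removeprefix("0x"))
    if len(raw) < 4:
        return None, []  # empty data (e.g. a plain ether transfer): no function call
    selector, body = raw[:4], raw[4:]
    words = [body[i:i + 32] for i in range(0, len(body), 32)]
    return "0x" + selector.hex(), ["0x" + w.hex() for w in words]


# Hypothetical call data for an ERC-20 transfer(address,uint256) call.
input_hex = (
    "0xa9059cbb"                          # function selector
    + "00" * 12 + "ab" * 20               # address argument, left-padded to 32 bytes
    + (1000).to_bytes(32, "big").hex()    # uint256 argument: 1000
)
selector, words = split_input_data(input_hex)
print(selector)  # 0xa9059cbb
for word in words:
    print(word)
```

Even after splitting, the words are just padded hexadecimal values; without knowing the function's parameter names and types, you can't tell that the first word is a recipient address and the second is an amount.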
Etherscan can decode the input data for you. Click or tap the Decode Input Data button and Etherscan will try to translate the input data into easy-to-read input parameters for the called function. Figure 3-8 shows successfully decoded data for the cancelOrder() function. (In Figure 3-4, you saw that this transaction calls the cancelOrder() smart contract function.)
FIGURE 3-8: Decoded data for the cancelOrder() function.
You don't get this level of detail in every transaction. This transaction called a function in a registered smart contract. Registering a smart contract means that the developer submitted the application binary interface (ABI) for the contract, along with the compiled bytecode. An ABI is a definition of a smart contract’s state data, events, and functions, including each function’s input and return parameters. Etherscan uses the ABI, if it is available, to provide more descriptive information. If the ABI is not available, Etherscan can display only the raw input data.
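The following sketch imitates what a tool like Etherscan does with an ABI: look up the function by its selector, then use the ABI's parameter names and types to decode the arguments, falling back to the raw data when no ABI is known. It's a simplified illustration under stated assumptions, not Etherscan's implementation. The ABI entry uses the standard JSON ABI format for an ERC-20 `transfer` function, and the selector lookup table is hypothetical: real tools derive each selector from the first four bytes of the Keccak-256 hash of the function signature, which Python's standard library can't compute (`hashlib.sha3_256` is NIST SHA-3, not Keccak-256), so the known value is hard-coded. Only a few static types are handled.

```python
# Minimal ABI entry in the standard JSON ABI format (ERC-20 transfer, as an example).
TRANSFER_ABI = {
    "type": "function",
    "name": "transfer",
    "inputs": [
        {"name": "to", "type": "address"},
        {"name": "value", "type": "uint256"},
    ],
}

# Hypothetical lookup table from function selector to ABI entry (selector hard-coded).
KNOWN_FUNCTIONS = {"0xa9059cbb": TRANSFER_ABI}


def decode_word(word: bytes, abi_type: str):
    """Decode one 32-byte ABI word for a few static types."""
    if abi_type == "address":
        return "0x" + word[12:].hex()  # addresses are the last 20 bytes
    if abi_type.startswith("uint"):
        return int.from_bytes(word, "big")
    if abi_type == "bool":
        return word[-1] == 1
    raise NotImplementedError(f"this sketch handles only simple static types, not {abi_type}")


def decode_input(input_hex: str, known=KNOWN_FUNCTIONS):
    raw = bytes.fromhex(input_hex.removeprefix("0x"))
    entry = known.get("0x" + raw[:4].hex())
    if entry is None:
        return {"raw": input_hex}  # no ABI available: only the raw input can be shown
    body = raw[4:]
    args = {}
    for i, param in enumerate(entry["inputs"]):
        word = body[32 * i: 32 * (i + 1)]
        args[param["name"]] = decode_word(word, param["type"])
    return {"function": entry["name"], "args": args}


input_hex = "0xa9059cbb" + "00" * 12 + "ab" * 20 + (1000).to_bytes(32, "big").hex()
print(decode_input(input_hex))
print(decode_input("0x12345678"))  # unknown selector: falls back to raw data
```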
If you explore the Etherscan page, you’ll notice the Event Logs, State Changes, and Comments tabs. I don’t cover those here, but I do revisit them in Chapter 6. Transaction data isn’t the only data you’ll encounter in a blockchain application. Smart contract developers commonly use events to log notable actions in a smart contract. Data from these events are often of interest in the data analysis process. You’ll see this type of data again.
Categorizing Common Data in a Blockchain
You’ve already seen most of the types of data you’ll use when carrying out blockchain analytics. You’ve seen block header data, basic transaction data, and details contained in some transactions. You may have investigated the Etherscan user interface to view some event data, and even the effect a transaction has on the blockchain state. In this section, you learn more about the main categories of blockchain app data: transactions, events, and state.
Serializing transaction data
The core of blockchain data is contained in the transaction. A blockchain transaction records the transfer of some value from one account to another account. Additional information may be in the transaction, such as input data that records smart contract parameters, but not every transaction includes additional data.
Each transaction is also tied to a timestamp, recorded in the header of the block that contains it, showing the date and time the transaction was mined. Using these timestamps, you can create a chronological list of transactions and see how value changed ownership at specific points in time and how value moved among accounts. This movement is serial. The serial nature of data storage can yield interesting information but can also be an obstacle to analyzing the data.
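Here's a minimal sketch of building that chronological list. It assumes you've already pulled transaction records into memory; the `TxRecord` fields and the sample ledger are hypothetical. Because every transaction in a block shares the block's timestamp, the sort breaks ties by block number and then by the transaction's position within its block.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TxRecord:
    block_number: int
    tx_index: int    # position of the transaction within its block
    timestamp: int   # Unix time of the block that contains the transaction
    sender: str
    recipient: str
    value: int       # wei


def chronological(txs):
    """Order transactions by time, breaking ties by block and position in block."""
    return sorted(txs, key=lambda t: (t.timestamp, t.block_number, t.tx_index))


def movements(txs):
    """Describe how value moved among accounts, oldest first."""
    for t in chronological(txs):
        when = datetime.fromtimestamp(t.timestamp, tz=timezone.utc).isoformat()
        yield f"{when}: {t.sender} -> {t.recipient} ({t.value} wei)"


# Hypothetical ledger, deliberately out of order.
ledger = [
    TxRecord(101, 0, 1_700_000_012, "bob", "carol", 3),
    TxRecord(100, 1, 1_700_000_000, "alice", "bob", 5),
    TxRecord(100, 0, 1_700_000_000, "alice", "dave", 2),
]
for line in movements(ledger):
    print(line)
```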
Unlike traditional data storage systems such as relational databases, a blockchain often requires you to calculate final tallies or balances from the history of transactions. A traditional database can store the current balance of an account, while you may have to trace all blockchain transactions for an account to arrive at its final balance. The data is available, but it may take more work to get to it.
Blockchain gives you the flexibility of tracing transactions by account but doesn’t always make it easy to query a single value. For example, suppose you want to know the balance of a specific account on a specific date. Finding the current account balance is easy, but finding the balance as of a specific date (and time) requires serializing the transactions for that account and calculating account increases and decreases up to the date and time in question.
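The balance-as-of-a-date calculation can be sketched as a replay of transactions in time order, stopping at the cutoff. This sketch makes simplifying assumptions: every transaction succeeded, the sender pays a gas fee recorded in a hypothetical `fee` field, and there are no other balance changes (such as internal contract transfers or rewards). The `Transfer` class, `balance_as_of` function, and the `genesis` funding account in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    timestamp: int  # Unix time of the containing block
    sender: str
    recipient: str
    value: int      # wei transferred
    fee: int = 0    # wei paid by the sender for gas


def balance_as_of(account, transfers, cutoff, opening_balance=0):
    """Replay transfers in time order up to and including `cutoff`."""
    balance = opening_balance
    for t in sorted(transfers, key=lambda t: t.timestamp):
        if t.timestamp > cutoff:
            break  # later activity doesn't affect the historical balance
        if t.sender == account:
            balance -= t.value + t.fee  # decrease: value sent plus gas fee
        if t.recipient == account:
            balance += t.value          # increase: value received
    return balance


history = [
    Transfer(1_000, "genesis", "alice", 100),
    Transfer(2_000, "alice", "bob", 30, fee=1),
    Transfer(3_000, "bob", "alice", 10, fee=1),
]
print(balance_as_of("alice", history, cutoff=2_500))
print(balance_as_of("alice", history, cutoff=3_000))
```

The cost grows with the number of transactions you must replay, which is exactly the extra work the text describes compared with reading a stored balance.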
If you're comfortable with databases and applications that access database data, searching transactions doesn’t sound like such a bad thing to do. However, remember that a blockchain is not a database. The data in a blockchain is not stored in a manner that makes general-purpose queries easy and fast. You can get the information you want, but you have to think differently about the effort it takes to get that data.
The serialized transaction storage of blockchain data does provide the flexibility to trace and retrieve activity data in several ways. Here are a few types of queries you can satisfy by tracing blockchain transactions:
Find all transactions in which a specific account sent funds.
Find all transactions that resulted in a specific account receiving funds.