How Arbitrum data availability works

What is the general view of Arbitrum data flow?

Arbitrum currently supports two primary data availability mechanisms:

Rollup Mode: In this mode, all transaction data is posted to the parent chain (for example, Ethereum mainnet), either as calldata in the transactions the Sequencer submits or as blobs carried by those transactions. Because the data is on-chain, it is readily available for anyone to download and verify.

AnyTrust Mode: In AnyTrust mode, transaction data is first submitted to a group of nodes known as the Data Availability Servers (DAS), which store and distribute it. Instead of including the entire dataset on-chain, only a cryptographic proof (the Data Availability Certificate, or DACert) is submitted to the parent chain. This significantly reduces the amount of data stored on-chain and, with it, the cost.

Because of these data availability mechanisms, Arbitrum Nitro nodes synchronize their data differently from Ethereum nodes and other layer-one network nodes. Go-Ethereum nodes use a sophisticated P2P network to synchronize with the Ethereum blockchain: they discover other nodes, exchange data, and participate in the consensus mechanism. Arbitrum nodes, by contrast, do not primarily rely on a peer-to-peer mechanism to sync their state.

Here's how Arbitrum data flow works:

Batching and Submission:
1. The Sequencer queues transactions and batches them together.
2. These batches get submitted to the parent chain:
  1. In AnyTrust mode, the Sequencer sends the batch to the Data Availability Server (DAS), which generates and returns a Data Availability Certificate (DACert). The Sequencer then submits the DACert to the parent chain.
  2. In Rollup mode, the Sequencer submits the batch of transactions directly to the sequencer inbox contract on the parent chain, as either calldata or blobs.
Node Synchronization:
1. Upon joining the network, a full node reads the batches that have already been posted:
  1. In Rollup mode, it reads the data directly from the parent chain calldata or blobs (depending on how the Sequencer posted the data).
  2. In AnyTrust mode, it reads the DACert from the parent chain and uses it to retrieve the data from the DAS and verify its availability.
2. The node continues to follow this process to catch up with the latest chain height.
3. Once caught up, the node receives updates on new Sequencer-queued messages directly from the Sequencer feed. (The section on syncing from the sequencer feed covers this in detail.)
Catching Up:
1. If a node falls behind the chain, it reverts to the Node Synchronization process described above to resynchronize with the latest state.

In essence, Arbitrum nodes prioritize data retrieval from the parent chain and rely on the Sequencer for real-time updates, deviating from the traditional P2P synchronization approach used by Ethereum nodes.
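The choice between the two data sources described above can be sketched as a small decision function. This is an illustrative sketch only: the names `SyncSource`, `choose_sync_source`, and the `max_lag` threshold are hypothetical and are not taken from the Nitro codebase, which tracks sync progress in its own way.

```python
from enum import Enum


class SyncSource(Enum):
    PARENT_CHAIN = "parent_chain"      # read batches posted to the parent chain
    SEQUENCER_FEED = "sequencer_feed"  # follow real-time messages from the feed


def choose_sync_source(local_height: int, chain_height: int, max_lag: int = 0) -> SyncSource:
    """Pick where a node should read from next.

    local_height: latest message the node has processed (hypothetical counter).
    chain_height: latest message known to be posted to the parent chain.
    max_lag: how far behind a node may be and still count as caught up
             (hypothetical threshold, for illustration).
    """
    if chain_height - local_height > max_lag:
        # Behind the chain: resynchronize from parent chain data.
        return SyncSource.PARENT_CHAIN
    # Caught up: switch to real-time updates.
    return SyncSource.SEQUENCER_FEED


print(choose_sync_source(local_height=100, chain_height=250))
print(choose_sync_source(local_height=250, chain_height=250))
```

A node that falls behind again simply gets `PARENT_CHAIN` back from the same function, which mirrors the Catching Up step.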

How full nodes decode the data from the parent chain:

Arbitrum full nodes decode data received from the parent chain (for example, Ethereum) to update their local state. This process involves querying events, parsing and serializing their data, retrieving and decoding the payload, and processing the resulting messages.

Event Querying:
1. Full nodes subscribe to the SequencerBatchDelivered event emitted by the sequencer inbox contract on the parent chain. This event signals the arrival of a new batch of transactions.
Event Parsing:
1. Upon receiving the SequencerBatchDelivered event, the node parses the event data into a SequencerInboxBatch struct. This struct typically includes:
  1. BlockHash: The hash of the parent chain block containing the batch.
  2. ParentChainBlockNumber: The block number of the parent chain block.
  3. SequenceNumber: The sequence number of the batch.
  4. TimeBounds: Time constraints for the batch.
  5. AfterDelayedAcc: Accumulator hash after processing delayed messages.
  6. AfterDelayedCount: Count of delayed messages.
  7. rawLog: The raw event log data.
Data Serialization:
1. The node serializes the SequencerInboxBatch struct into a byte array.
2. The serialized data adheres to a specific format:
  1. TimeBounds.MinTimestamp (8 bytes)
  2. TimeBounds.MaxTimestamp (8 bytes)
  3. TimeBounds.MinBlockNumber (8 bytes)
  4. TimeBounds.MaxBlockNumber (8 bytes)
  5. AfterDelayedCount (8 bytes)
  6. payload (variable length)
    1. The payload field further contains the following:
      1. Type: Indicates the type of payload (e.g., DAS, blob message).
      2. Content: The actual data associated with the payload type (e.g., DACert, BlobHashes, brotli compressed data).
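The layout above (five 8-byte fields followed by a variable-length payload) can be sketched with Python's `struct` module. This is a minimal sketch under stated assumptions: the fields are treated as unsigned big-endian integers, and the names `TimeBounds`, `SerializedBatch`, `serialize`, and `parse` are illustrative rather than Nitro's own API.

```python
import struct
from dataclasses import dataclass

# Assumption: five unsigned 64-bit integers, big-endian, with no padding.
HEADER_FORMAT = ">5Q"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 5 * 8 = 40 bytes


@dataclass
class TimeBounds:
    min_timestamp: int
    max_timestamp: int
    min_block_number: int
    max_block_number: int


@dataclass
class SerializedBatch:
    time_bounds: TimeBounds
    after_delayed_count: int
    payload: bytes  # first byte is the payload type, the rest is the content


def serialize(batch: SerializedBatch) -> bytes:
    """Write the 40-byte header in the order listed above, then the payload."""
    tb = batch.time_bounds
    header = struct.pack(
        HEADER_FORMAT,
        tb.min_timestamp,
        tb.max_timestamp,
        tb.min_block_number,
        tb.max_block_number,
        batch.after_delayed_count,
    )
    return header + batch.payload


def parse(data: bytes) -> SerializedBatch:
    """Inverse of serialize: split the header fields from the payload."""
    if len(data) < HEADER_SIZE:
        raise ValueError("serialized batch is shorter than the 40-byte header")
    fields = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    return SerializedBatch(TimeBounds(*fields[:4]), fields[4], data[HEADER_SIZE:])
```

Keeping the header fixed-width means a reader can locate the payload at a constant offset without parsing anything else first.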
Data Decoding and Retrieval:
1. Based on the payload type:
  1. DAS Type: The node queries the Data Availability Server (DAS) to retrieve the raw data.
  2. Blob Message Type: The node decodes the blob message to obtain the raw data.
  3. Brotli Message Type: No extra steps are needed here; continue to the next step.
2. Data Decompression: If the raw data is Brotli-compressed, the node decompresses it. Note that the raw data obtained through the DAS and blob message paths above may also be Brotli-compressed.
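The dispatch on payload type can be sketched as follows. The header-byte values (`0x80`, `0x10`, `0x00`) are assumptions modeled on constants in the Nitro source and should be checked against it before use. The function name `resolve_payload` and its three callables are hypothetical: the DAS query, blob retrieval, and Brotli decompression are injected because none of them is part of the Python standard library.

```python
from typing import Callable

# Assumed header-byte values; verify against the Nitro source.
DAS_FLAG = 0x80          # content is a DACert
BLOB_HASHES_FLAG = 0x10  # content is a list of blob hashes
BROTLI_HEADER = 0x00     # content is Brotli-compressed data


def resolve_payload(
    payload: bytes,
    fetch_from_das: Callable[[bytes], bytes],
    fetch_blobs: Callable[[bytes], bytes],
    brotli_decompress: Callable[[bytes], bytes],
) -> bytes:
    """Turn a batch payload into raw message data."""
    if not payload:
        raise ValueError("empty payload")
    kind, content = payload[0], payload[1:]
    if kind & DAS_FLAG:
        raw = fetch_from_das(content)   # DAS type: query the DAS
    elif kind & BLOB_HASHES_FLAG:
        raw = fetch_blobs(content)      # blob type: decode the blobs
    else:
        raw = payload                   # Brotli type: nothing to fetch
    # Data from any of the three paths may still be Brotli-compressed.
    if raw and raw[0] == BROTLI_HEADER:
        return brotli_decompress(raw[1:])
    return raw


# Stand-ins for the real network and Brotli calls, for illustration only.
fake_das = {b"cert": b"\x00compressed"}
result = resolve_payload(
    bytes([DAS_FLAG]) + b"cert",
    fetch_from_das=fake_das.__getitem__,
    fetch_blobs=lambda hashes: b"",
    brotli_decompress=lambda data: b"decompressed:" + data,
)
print(result)
```

Injecting the three callables keeps the dispatch logic testable without a DAS endpoint, a beacon node, or a Brotli binding.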

Message Processing:

After decoding and decompressing the data, the node obtains a series of messages.

Message Types:

BatchSegmentKindL2Message	Contains the raw data for a series of transactions. Usually, this is a single block.
BatchSegmentKindL2MessageBrotli	Same as BatchSegmentKindL2Message, but the data is Brotli-compressed.
BatchSegmentKindDelayedMessages	Contains a new delayed message read from the parent chain's delayed inbox.
BatchSegmentKindAdvanceTimestamp	Tells the State Transition Function (STF) to advance its timestamp.
BatchSegmentKindAdvanceL1BlockNumber	Tells the STF to advance its parent chain block number.
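How a node might walk through these segment kinds can be sketched as a simple dispatch loop. The sketch rests on several assumptions: the numeric values follow the order in which the kinds are listed above, the two "advance" segments carry a delta that is added to the current value, and `ReplayState` and `apply_segments` are hypothetical names, not part of Nitro.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, Iterable, List, Tuple, Union


class BatchSegmentKind(IntEnum):
    # Assumed numbering, following the order of the list above.
    L2_MESSAGE = 0
    L2_MESSAGE_BROTLI = 1
    DELAYED_MESSAGES = 2
    ADVANCE_TIMESTAMP = 3
    ADVANCE_L1_BLOCK_NUMBER = 4


@dataclass
class ReplayState:
    timestamp: int = 0
    l1_block_number: int = 0
    delayed_messages_read: int = 0
    l2_messages: List[bytes] = field(default_factory=list)


Segment = Tuple[BatchSegmentKind, Union[bytes, int]]


def apply_segments(
    segments: Iterable[Segment],
    brotli_decompress: Callable[[bytes], bytes],
) -> ReplayState:
    """Fold a sequence of batch segments into the inputs the STF consumes."""
    state = ReplayState()
    for kind, value in segments:
        if kind == BatchSegmentKind.L2_MESSAGE:
            state.l2_messages.append(value)
        elif kind == BatchSegmentKind.L2_MESSAGE_BROTLI:
            state.l2_messages.append(brotli_decompress(value))
        elif kind == BatchSegmentKind.DELAYED_MESSAGES:
            state.delayed_messages_read += 1
        elif kind == BatchSegmentKind.ADVANCE_TIMESTAMP:
            state.timestamp += value          # assumption: value is a delta
        elif kind == BatchSegmentKind.ADVANCE_L1_BLOCK_NUMBER:
            state.l1_block_number += value    # assumption: value is a delta
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return state
```

The Brotli decompressor is passed in as a callable because Brotli is not in the Python standard library; any binding with a bytes-to-bytes decompress function would fit.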

State Transition: Finally, the State Transition Function (STF) processes these messages, executing them according to the protocol rules and updating the Arbitrum node's local state.

Conclusion:

This process ensures that Arbitrum nodes can trustlessly sync an accurate view of the chain without relying on other full nodes on the network.

How full nodes sync the data from the sequencer feed:

Once Arbitrum full nodes have caught up with the chain, they switch from initial synchronization to a real-time update mode. This switch involves receiving data from the sequencer feed, which continuously broadcasts updates about newly queued transactions.

Data Acquisition:
1. Full nodes maintain a connection to the sequencer feed or to a private feed. To run a private feed, refer to How to run a feed relay.
2. The sequencer feed transmits data packets containing information about the latest queued transactions.
Data Decoding:
1. Full nodes decode the received data packets using the methods described in How to read the sequencer feed.
Message Processing:
1. After successful decoding, the full nodes obtain the same kind of messages as in the Message Processing step of the previous section.
2. The nodes send these messages to the State Transition Function (STF) for execution, exactly as in that step.
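The decoding step can be sketched for a feed frame delivered as JSON. The field names used here (`messages`, `sequenceNumber`, `message`, `l2Msg`, `delayedMessagesRead`) and the base64 encoding of the message body are assumptions about the feed format; How to read the sequencer feed remains the authoritative description. The function name `decode_feed_frame` is hypothetical.

```python
import base64
import json
from typing import List, Tuple


def decode_feed_frame(frame: str) -> List[Tuple[int, int, bytes]]:
    """Return (sequence_number, delayed_messages_read, l2_message_bytes)
    for each message in a frame, ordered by sequence number."""
    decoded = []
    for entry in json.loads(frame).get("messages", []):
        inner = entry["message"]
        l2_msg = base64.b64decode(inner["message"]["l2Msg"])
        decoded.append((entry["sequenceNumber"], inner["delayedMessagesRead"], l2_msg))
    return sorted(decoded, key=lambda item: item[0])


# A hand-built frame for illustration; real frames carry more fields.
sample = json.dumps({
    "version": 1,
    "messages": [{
        "sequenceNumber": 7,
        "message": {
            "message": {"l2Msg": base64.b64encode(b"\x04tx-bytes").decode()},
            "delayedMessagesRead": 3,
        },
    }],
})
print(decode_feed_frame(sample))
```

Ordering by sequence number lets a node notice a gap between the last message it processed and the first one in a frame, in which case it can fall back to reading from the parent chain.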
