Minimmit: Fast Finality with Even Faster Blocks

Achieving low-latency consensus in geographically distributed systems remains a key challenge for blockchain and distributed database applications. To this end, there has been significant recent interest in State-Machine-Replication (SMR) protocols that achieve 2-round finality under the assumption that $5f+1\leq n$, where $n$ is the number of processors and $f$ bounds the number of processors that may exhibit Byzantine faults. In these protocols, instructions are organised into views, each led by a different designated leader, and 2-round finality means that a leader’s proposal can be finalised after just a single round of voting, meaning two rounds overall (one round for the proposal and one for voting).

In this paper, we introduce Minimmit, a Byzantine-fault-tolerant SMR protocol with lower latency than previous 2-round finality approaches.  Our key insight is that view progression and transaction finality can operate on different quorum thresholds without compromising safety or liveness. Experiments simulating a globally distributed network of 50 processors, uniformly assigned across ten virtual regions, show that the approach leads to a 17% reduction in transaction latency compared to the state-of-the-art.

Joint work with Brendan Kobayashi Chou and Patrick O’Grady.

Here is the pdf.

The Pipes Model for Latency and Throughput Analysis


Abstract: Protocols for State-Machine-Replication (sometimes called `blockchain’ protocols)  generally make use of rotating leaders to drive consensus. In typical protocols (henceforth called `single-sender’ protocols), the leader is a single processor responsible for making and disseminating proposals to others. Since the leader acts as a bottleneck, apparently limiting throughput, a recent line of research has investigated the use of `multi-sender’ protocols in which many processors distribute proposals in parallel. Examples include DAG-based protocols such as DAG-Rider, Bullshark, Sailfish, Cordial Miners, Mysticeti, and variants such as Autobahn. However, existing models do not allow for a formal analysis to determine whether these protocols can actually handle higher throughputs than single-sender protocols such as PBFT, Tendermint, and HotStuff.

In this paper, we describe a very simple model that allows for such an analysis. For any given protocol, the model allows one to calculate latency as a function of  network bandwidth, network delays, the number of processors $n$, and the incoming transaction rate. Each protocol has a latency bottleneck: an incoming transaction rate at which latency becomes unbounded over the protocol execution, i.e., a maximum throughput that the protocol can handle without unbounded latency.  

With the aim of building to an analysis for state-of-the-art State-Machine-Replication (SMR) protocols, we begin by considering protocols for simpler primitives, such as Best-effort Broadcast and Reliable Broadcast. For Best-effort Broadcast, we establish a tight lower bound on latency for single-sender and multi-sender protocols when blocks are distributed without the use of techniques such as erasure coding. Perhaps unsurprisingly, a key difference between the single-sender and multi-sender approaches in this case is a factor $n$ in the point at which the latency bottleneck appears. However, for other primitives such as Reliable Broadcast,  our results may be more surprising: the factor $n$ difference now disappears, and maximum throughput for the two approaches differs by a constant factor, while multi-sender approaches will generally have latency that grows more quickly with $n$. For state-of-the-art SMR protocols, the picture that emerges is one with seemingly inherent trade-offs. If one compares single-sender protocols that use pipelining and erasure coding, such as DispersedSimplex, with DAG-based protocols such as Sailfish or Bullshark, the former are seen to have lower latency for a wide range of throughputs, while the benefit of the latter protocols is that they have  a latency bottleneck which is higher by a constant factor. 

Joint work with Kartik Nayak and Nibesh Strestha. Experiments to be added soon!!

Here is the pdf.

Frosty for partial synchrony

Snowman is the consensus protocol used by blockchains on Avalanche. Recent work of ours has shown both how to augment Snowman with a `liveness’ module called `Frosty’ that protects against liveness attacks, and also how to modify Snowman so as to be consistent in partial synchrony. Since Frosty assumes (a strong form of) synchrony, the aim of this note is to show how to modify Frosty to deal with the partially synchronous version of Snowman.

Joint work with Stephen Buttolph and Kevin Sekniqi. Here is the pdf.

From Permissioned to Proof-of-Stake Consensus

This paper presents the first generic compiler that transforms any permissioned consensus protocol into a proof-of-stake permissionless consensus protocol. For each of the following properties, if the initial permissioned protocol satisfies that property in the partially synchronous setting, the consequent proof-of-stake protocol also satisfies that property in the partially synchronous and quasi-permissionless setting (with the same fault-tolerance): consistency; liveness; optimistic responsiveness; every composable log-specific property; and message complexity of a given order. Moreover, our transformation ensures that the output protocol satisfies accountability (identifying culprits in the event of a consistency violation), whether or not the original permissioned protocol satisfied it.

Joint work with Jovan Komatovic, Joachim Neu, Tim Roughgarden, Ertem Nusret Tas

Here is the pdf (AFT 2025).

Grassroots Consensus

Grassroots platforms aim to offer an egalitarian alternative to global platforms — centralized/autocratic and decentralized/plutocratic alike. Within the grassroots architecture, consensus is needed to realize platforms that employ digital social contracts, which are like smart contracts except that they are among people not accounts and are executed by these people’s smartphones not by high-performance servers controlled by parties outside to the contract. Key envisioned grassroots platforms include sovereign democratic digital communities and federations, community banks and their grassroots cryptocurrencies, and digital cooperatives.
The grassroots architecture can benefit from a consensus protocol that is (i) quiescent, (ii) efficient during low- and high-throughput, (iii) responsive, (iv) blocklace-based, (v) UDP-ready, and (vi) grassroots. The Grassroots Consensus protocol addresses all these requirements while having competitive performance in both low- and high-throughput scenarios and being one of the most concise and elegant consensus protocols for partial synchrony. It achieves that by building on two cutting-edge consensus protocols — the quiescent high-performance Morpheus and the blocklace-based Cordial Miners, improving the latter’s dissemination protocol and making it UDP-ready, and extending the protocol with a constitution and a constitutional amendment component, making it grassroots.

Joint work with Idit Keidar and Ehud Shapiro. Here is the pdf.

Beyond Optimal Fault-Tolerance

One of the most basic properties of a consensus protocol is its fault-tolerance—the maximum fraction of faulty participants that the protocol can tolerate without losing fundamental guarantees such as safety and liveness. Because of its importance, the optimal fault-tolerance achievable by any protocol has been characterized in a wide range of settings. For example, for state machine replication (SMR) protocols operating in the partially synchronous setting, it is possible to simultaneously guarantee consistency against $\alpha$-bounded adversaries (i.e., adversaries that control less than an $\alpha$ fraction of the participants) and liveness against $\beta$-bounded adversaries if and only if $\alpha + 2\beta \leq 1$.

This paper characterizes to what extent “better-than-optimal” fault-tolerance guarantees are possible for SMR protocols when the standard consistency requirement is relaxed to allow a bounded number $r$ of consistency violations, each potentially leading to the rollback of recently finalized transactions. We prove that bounding rollback  is impossible without additional timing assumptions and investigate protocols that tolerate and recover from consistency violations whenever message delays around the time of an attack are bounded by a parameter $\Delta^*$ (which may be arbitrarily larger than the parameter $\Delta$ that bounds post-GST message delays in the partially synchronous model). Here, a protocol’s fault-tolerance can be a non-constant function of $r$, and we prove, for each $r$, matching upper and lower bounds on the optimal “recoverable fault-tolerance” achievable by any SMR protocol. For example, for protocols that guarantee liveness against 1/3-bounded adversaries in the partially synchronous setting, a 5/9-bounded adversary can always cause one consistency violation but not two, and a 2/3-bounded adversary can always cause two consistency violations but not three.  Our positive results are achieved through a generic “recovery procedure”  that can be grafted on to any accountable SMR protocol and restores consistency following a violation while rolling back only transactions that were finalized in the previous $2\Delta^*$ timesteps.

This is joint work with Tim Roughgarden.

Beyond Optimal Fault-Tolerance, pdf, (AFT 2025).

Accountable liveness

Safety and liveness are the two classical security properties of consensus protocols. Recent works have strengthened safety with accountability: should any safety violation occur, a sizable fraction of adversary nodes can be proven to be protocol violators. This paper studies to what extent analogous accountability guarantees are achievable for liveness. To reveal the full complexity of this question, we introduce an interpolation between the classical synchronous and partially-synchronous models that we call the x-partially-synchronous network model in which, intuitively, at most an x fraction of the time steps in any sufficiently long interval are asynchronous (and, as with a partially-synchronous network, all time steps are synchronous following the passage of an unknown “global stablization time”). We prove a precise characterization of the parameter regime in which accountable liveness is achievable: if and only if x<1/2 and f<n/2, where n denotes the number of nodes and f the number of nodes controlled by an adversary. We further refine the problem statement and our analysis by parameterizing by the number of violating nodes identified following a liveness violation, and provide evidence that the guarantees achieved by our protocol are near-optimal (as a function of x and f). Our results provide rigorous foundations for liveness-accountability heuristics such as the “inactivity leaks” employed in Ethereum.

Joint work with Joachim Neu, Tim Roughgarden, Luca Zanolini.

Accountable liveness, pdf, (CCS 2025).

Recover from Excessive Faults in Partially-Synchronous BFT SMR

Byzantine fault-tolerant (BFT) state machine replication (SMR) protocols form the basis of modern blockchains as they maintain a consistent state across all blockchain nodes while tolerating a bounded number of Byzantine faults. We analyze BFT SMR in the excessive fault setting where the actual number of Byzantine faults surpasses a protocol’s tolerance. We start by devising the very first repair algorithm for linearly chained and quorum-based partially synchronous SMR to recover from faulty states caused by excessive faults. Such a procedure can be realized using any commission fault detection module — an algorithm that identifies the faulty replicas without falsely locating any correct replica. We achieve this with a slightly weaker liveness guarantee, as the original security notion is impossible to satisfy given excessive faults. We implement recoverable HotStuff in Rust. The throughput resumes to the normal level (without excessive faults) after recovery routines terminate for 7 replicas and is slightly reduced by ≤4.3% for 30 replicas. On average, it increases the latency by 12.87% for 7 replicas and 8.85% for 30 replicas. Aside from adopting existing detection modules, we also establish the sufficient condition for a general BFT SMR protocol to allow for complete and sound fault detection when up to (n−2) Byzantine replicas (out of n total replicas) attack safety. We start by providing the first closed-box fault detection algorithm for any SMR protocol without any extra rounds of communication. We then describe open-box instantiations of our fault detection routines in Tendermint and Hotstuff, further reducing the overhead, both asymptotically and concretely.

Joint work with Tiantian Gong, Gustavo Franco Camilo, Kartik Nayak, and Aniket Kate.

Here is the pdf (USENIX Security 2025).

Morpheus Consensus: Excelling on trails and autobahns

Recent research in consensus has often focussed on protocols for State-Machine-Replication (SMR) that can handle high throughputs. Such state-of-the-art protocols (generally DAG-based) induce undue overhead when the needed throughput is low, or else exhibit unnecessarily-poor latency and communication complexity during periods of low throughput.

Here we present Morpheus Consensus, which naturally morphs from a quiescent low-throughput leaderless blockchain protocol to a high-throughput leader-based DAG protocol and back, excelling in latency and complexity in both settings. During high-throughout, Morpheus pars with state-of-the-art DAG-based protocols, including Autobahn.  During low-throughput, Morpheus exhibits competitive complexity and lower latency than standard protocols such as PBFT and Tendermint, which in turn do not perform well during high-throughput. 

The key idea of Morpheus is that as long as blocks do not conflict (due to Byzantine behaviour, network delays, or high-throughput simultaneous production) it produces a forkless blockchain, promptly finalizing each block upon arrival.  It assigns a leader only if one is needed to resolve conflicts, in a manner and with performance not unlike Autobahn.

Morpheus, pdf. This is joint work with Ehud Shapiro (OPODIS 25)

Snowman for partial synchrony

Snowman is the consensus protocol run by blockchains on Avalanche. Recent work of ours established a rigorous proof of probabilistic consistency for Snowman in the synchronous setting, under the simplifying assumption that correct processes execute sampling rounds in `lockstep’. In this paper, we describe a modification of the protocol that ensures consistency in the partially synchronous setting, and when correct processes carry out successive sampling rounds at their own speed, with the time between sampling rounds determined by local message delays.

Joint work with Aaron Buchwald, Stephen Buttolph, and Kevin Sekniqi.

Snowman for partial synchrony, pdf.