Coinbase reviews the May outage incident: AWS cascading failure exposes architectural risks

By: rootdata|2026/06/02 04:45:00

Coinbase has released a retrospective report on the large-scale service outage that occurred on May 7, 2026.

The outage lasted approximately 8 hours, and full recovery took about 12 hours. During this time, trading, deposits, withdrawals, and most other core services were unavailable or severely degraded. Coinbase attributed the outage to the simultaneous failure of multiple cooling units in a data center in one availability zone (use1-az4) of the AWS us-east-1 region. The failure triggered thermal protection shutdowns of server cabinets, which took EC2 instances and EBS volumes offline and affected multiple internet services.

During the recovery process, the Coinbase matching engine lost quorum because its cluster was deployed in a single AWS data center and most of its nodes went down. Restoring it required emergency code changes and the construction of a new node group, after which markets were reopened gradually.
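The quorum loss described above can be illustrated with a minimal sketch. The report as summarized here does not name the consensus protocol or the number of nodes, so the majority rule (more than half of the nodes must be healthy) and the node counts below are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Sketch only: assumes a majority-based quorum, which the article does not confirm.

def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, healthy_nodes: int) -> bool:
    """True if the healthy nodes still form a majority of the cluster."""
    return healthy_nodes >= quorum_size(total_nodes)


def survives_zone_loss(nodes_per_zone: dict[str, int], failed_zone: str) -> bool:
    """True if the cluster keeps quorum after every node in failed_zone goes down."""
    total = sum(nodes_per_zone.values())
    healthy = total - nodes_per_zone.get(failed_zone, 0)
    return has_quorum(total, healthy)


# Hypothetical layouts: all nodes in one zone versus nodes spread over three zones.
single_zone = {"use1-az4": 5}
spread = {"use1-az1": 2, "use1-az2": 2, "use1-az4": 1}

print(survives_zone_loss(single_zone, "use1-az4"))
print(survives_zone_loss(spread, "use1-az4"))
```

Under these assumptions, a cluster placed entirely in the failed zone cannot keep a majority, while a cluster spread across zones can lose one zone and continue.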

Additionally, AWS's managed Kafka service (MSK) suffered a control plane failure that prevented partition leaders from being re-elected automatically. This further blocked the systems handling quotes, fees, and some settlement and data flows, widening the overall impact.

The system gradually returned to normal after Coinbase and the AWS engineering team migrated partitions manually. Coinbase stated that the incident exposed shortcomings in its automatic failover across availability zones and in its disaster recovery for managed middleware. The company plans to upgrade its cross-region hot backup architecture, strengthen regular failure drills, and migrate its Kafka deployment from two availability zones to three, while also working with AWS on root cause fixes and improvements.
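One way to see why moving Kafka from two availability zones to three can help is to count the replicas that survive a zone failure. The sketch below is an illustration, not Coinbase's configuration: the replication factor of 3, the `min.insync.replicas` value of 2, and the zone names are assumptions. It models replica counts only, assuming surviving replicas are in sync and leader election works, so it does not capture the control plane failure described in the report.

```python
# Sketch only: replica placement and min.insync.replicas values are assumed.

def writable_after_zone_loss(
    replicas_per_zone: dict[str, int],
    failed_zone: str,
    min_insync_replicas: int,
) -> bool:
    """True if enough replicas remain outside failed_zone to accept acks=all writes."""
    surviving = sum(
        count for zone, count in replicas_per_zone.items() if zone != failed_zone
    )
    return surviving >= min_insync_replicas


def tolerates_any_zone_loss(
    replicas_per_zone: dict[str, int], min_insync_replicas: int
) -> bool:
    """True if the partition stays writable no matter which single zone fails."""
    return all(
        writable_after_zone_loss(replicas_per_zone, zone, min_insync_replicas)
        for zone in replicas_per_zone
    )


# Three replicas over two zones: one zone must hold two of them.
two_zones = {"az-a": 2, "az-b": 1}
# Three replicas over three zones: one replica per zone.
three_zones = {"az-a": 1, "az-b": 1, "az-c": 1}

print(tolerates_any_zone_loss(two_zones, min_insync_replicas=2))
print(tolerates_any_zone_loss(three_zones, min_insync_replicas=2))
```

With two zones, losing the zone that holds two replicas leaves a single replica, which is below the assumed minimum. With three zones, any single zone failure still leaves two replicas.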
