Mint Explainer | Millions affected: Why cloud crashes cloud AI’s horizon

Technology India Finance October 21, 2025

As companies move massive AI workloads to the cloud, reliance on a handful of large cloud providers called hyperscalers (AWS, Google Cloud, and Microsoft Azure) can make the cloud a single point of failure for critical AI infrastructure. Mint explains why this is a risk that software-as-a-service (SaaS) and AI-native companies can’t afford to ignore.

What is the AWS outage all about?

On Monday, Amazon’s cloud computing arm, AWS, suffered a major global outage that hit thousands of online platforms, from social media and gaming to streaming and financial apps, not only in North America but also in the UK, Australia, and India.

AWS attributed the downtime to a domain name system (DNS) issue. DNS translates website names (like livemint.com) into the internet protocol (IP) addresses that computers use to reach them, so when it fails, devices like computers and smartphones cannot locate websites even though the sites themselves are still running.
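To make the name-to-address step concrete, here is a minimal Python sketch of a DNS lookup using the standard library. The function name `resolve` and its `lookup` parameter are illustrative assumptions, not anything AWS has described; the point is only that a failed lookup leaves the client with nowhere to connect, whatever the state of the server.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, unique IP addresses for hostname, or [] if the lookup fails.

    `lookup` defaults to the operating system's resolver; it is a parameter
    only so the failure case can be simulated without a real outage.
    """
    try:
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The website may be running normally; the client simply cannot find it.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve("livemint.com")` on a working connection returns one or more IP addresses; during a DNS failure it returns an empty list even though the site's servers are up.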

E-commerce delivery expert ParcelHero estimates retailers across the UK, Europe and the US would have lost around $1 billion because of the global outage. While AWS says it has resolved the issue, such outages raise a bigger concern as AI companies shift more training (when an AI model learns) and inference (when it uses that learning to, say, identify a cat in a new image) workloads to the cloud.

Were there similar big outages in the past?

Major cloud outages have repeatedly disrupted the internet. In February 2017, an AWS outage caused by an internal human error disrupted Slack and Quora, while Google Cloud experienced big outages in June 2019 and November 2021 that also affected Gmail and YouTube.

Azure experienced similar outages in 2021. The biggest disruption to Microsoft systems, however, came in July 2024, when a faulty CrowdStrike Falcon Sensor update crashed 8.5 million Windows devices worldwide, impacting aviation, banking, and government systems.

How dependent are companies and governments on the cloud today?

Just three cloud service providers (AWS, Microsoft and Google) together serve more than 60% of the world’s cloud infrastructure needs. In April-June 2025, enterprise spending on cloud infrastructure services rose to almost $99 billion worldwide, up over $20 billion from the second quarter of 2024, as per data from Synergy Research Group. The revenue includes infrastructure as a service (IaaS), platform as a service (PaaS) and hosted private cloud services.

With generative AI (GenAI) being the major driver of this growth, cloud providers have seen their quarterly revenues jump by $36 billion since the beginning of 2023. Amazon remained dominant in the April-June quarter with a 30% market share, followed by Microsoft (20%) and Google Cloud (13%), according to Synergy Research. Smaller cloud providers include CoreWeave, Oracle, Databricks and Huawei.

But how could AWS alone crash half the internet?

AWS may hold just 30% of the cloud infrastructure market but many globally popular apps, including social media platforms, gaming services, streaming sites, and financial apps (like Alexa, Snapchat, Venmo, Reddit, Coinbase, WhatsApp, Signal, Zoom, and Perplexity), rely on its services.

Hence, when a key region or service fails, millions of users are affected regardless of AWS’s overall market share. Some airlines, including Delta Air Lines and United Airlines, also encountered disruptions, as per Downdetector. The impact is amplified by network effects of the kind Metcalfe’s law describes: services often depend on AWS application programming interfaces (APIs), databases, authentication, or DNS, meaning that even apps hosted elsewhere can break if they call AWS components.

Additionally, companies tend to consolidate workloads in a few regions or providers for efficiency and cost savings, creating single points of failure and making it seem like “half the internet” is offline.
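The cascade described above can be sketched as a walk over a dependency map. The Python example below is illustrative only: the service names and the `DEPENDS_ON` map are hypothetical, not a record of what actually failed. It shows how one failed component takes down everything that calls it, directly or through intermediaries, including an app whose own servers sit with another provider.

```python
from collections import deque

# Hypothetical map: each service lists the components it calls directly.
DEPENDS_ON = {
    "shop-app": ["payments-api", "other-cloud-vm"],  # hosted outside AWS
    "payments-api": ["aws-auth"],
    "aws-auth": ["aws-dns"],
    "chat-app": ["aws-database"],
    "aws-database": ["aws-dns"],
    "blog": ["other-cloud-vm"],
}


def affected_by(failed, depends_on):
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the map so we can ask "who calls this component?"
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed component.
    hit, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in hit:
                hit.add(service)
                queue.append(service)
    return hit
```

In this toy map, `affected_by("aws-dns", DEPENDS_ON)` includes `shop-app`, which runs on another provider but calls a payments service that relies on AWS authentication; only `blog`, with no path to AWS, stays up.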

How reliant is AI on the cloud?

Cloud AI integrates AI with cloud computing, allowing organizations to seamlessly align their day-to-day operational activities with AI tools, algorithms, and cloud services. The global cloud AI market size, which was valued at $78.36 billion in 2024, is forecast to rise from $102.09 billion in 2025 to $589.22 billion by 2032, as per Fortune Business Insights (www.fortunebusinessinsights.com/cloud-ai-market-108878).
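For a sense of the pace this forecast implies, the compound annual growth rate can be worked out from the two endpoint figures quoted above. The short Python sketch below uses the standard CAGR formula; the function name `cagr` is an illustrative choice, and the result is derived here rather than quoted from the research firm.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures in $ billion, as quoted from Fortune Business Insights.
rate = cagr(102.09, 589.22, 2032 - 2025)
print(f"Implied growth: {rate:.1%} a year")
```

The figures imply growth of roughly 28% a year, compounded over seven years.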

The reason is that every major AI breakthrough relies on scaling cloud computing. AI foundation models, including OpenAI’s GPT-5, Meta’s Llama, Google’s Gemini and Anthropic’s Claude, require cloud infrastructure for their massive computational, storage, and networking needs. AI workloads are the primary driver of infrastructure demand growth, pushing cloud providers toward specialized AI chips, containers, and services.

This is enabling the next platform shift, involving the combination of ubiquitous cloud access and embedded AI capabilities to create entirely new software categories and business models.

How to address redundancy?

Cloud providers do have comprehensive disaster-recovery frameworks, but many businesses choose “availability within region” rather than a full multi-region or multi-cloud architecture because of the cost and complexity involved. To reduce downtime risks in the AI era, companies must adopt a multi-layered strategy.

Relying on a single cloud provider or region creates single points of failure, so spreading workloads across multiple regions or even multiple cloud vendors can ensure continuity if one service goes down.

Critical systems—databases, APIs, and authentication services—should have active failover and redundancy, with regular testing to confirm that backups work under real-world conditions. Applications should be decoupled from any one service to prevent cascading failures, and designed to offer partial functionality rather than complete shutdown during outages. Continuous monitoring and chaos testing help identify vulnerabilities before they become critical.
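A minimal Python sketch of two of these ideas, failover and partial functionality, is shown here. Everything in it is a hypothetical illustration: the handler functions stand in for real regional endpoints, and a production system would add timeouts, health checks and retries.

```python
def call_with_failover(endpoints, request, fallback):
    """Try each (name, handler) pair in order; degrade gracefully if all fail."""
    for name, handler in endpoints:
        try:
            result = handler(request)
        except ConnectionError:
            # This region or provider is down; move on to the next one.
            continue
        return {"served_by": name, "degraded": False, "result": result}
    # Every endpoint failed: offer partial functionality instead of an error page.
    return {"served_by": None, "degraded": True, "result": fallback(request)}


# Hypothetical stand-ins for real services.
def primary_region(request):
    raise ConnectionError("primary region unreachable")


def secondary_region(request):
    return f"answer for {request!r}"


def cached_answer(request):
    return "service is limited right now; showing cached results"
```

With `[("primary", primary_region), ("secondary", secondary_region)]` as the endpoint list, the request is served by the secondary region; if only the failing primary is listed, the caller gets the cached, degraded response rather than a complete shutdown.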

However, moving workloads across vendors or regions comes with significant costs, including higher cloud bills, integration complexity, and potential data transfer fees. Companies must weigh these expenses against the risk of downtime, especially as Gen Z users and enterprises alike demand fast, uninterrupted AI services.
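One simple way to frame that trade-off is to compare the expected yearly cost of downtime with the extra yearly cost of redundancy. The Python sketch below is a back-of-the-envelope model, and every number passed to it (availability levels, hourly loss, extra spend) is a hypothetical input a company would have to estimate for itself.

```python
HOURS_PER_YEAR = 24 * 365


def expected_downtime_cost(availability, cost_per_hour):
    """Expected yearly loss given an availability fraction (e.g. 0.999)."""
    return (1 - availability) * HOURS_PER_YEAR * cost_per_hour


def redundancy_pays_off(single_avail, multi_avail, cost_per_hour, extra_annual_cost):
    """True if the downtime avoided is worth more than the added cloud spend."""
    saved = (expected_downtime_cost(single_avail, cost_per_hour)
             - expected_downtime_cost(multi_avail, cost_per_hour))
    return saved > extra_annual_cost
```

Under these assumed inputs, a business losing $10,000 per hour of downtime would come out ahead by spending an extra $50,000 a year to move from 99.9% to 99.99% availability, while one losing $1,000 per hour would not.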
