- Major websites faced outages due to a Cloudflare technical failure on Tuesday
- Cloudflare provides CDN, DDoS protection, DNS, and security services globally
- A latent bug triggered by a routine configuration change caused the outage
Cloudflare Outage: Millions of people were affected on Tuesday when a technical failure at Cloudflare, a critical internet infrastructure provider, disrupted major websites. OpenAI, Elon Musk's X (formerly Twitter) and Spotify were among the affected services, and some were inaccessible for about three hours. The outage resembled those that hit Amazon (AWS) and Microsoft cloud services last month, which also disrupted some online services.
What is Cloudflare?
Cloudflare is a global network of servers that provides security, performance and reliability services to millions of websites and online applications. It acts as a middle layer between users and website servers.
According to its website, people use Cloudflare services to increase the security and performance of their websites and services. "Cloudflare is one of the world's largest networks. Today, businesses, non-profits, bloggers, and anyone with an Internet presence boast faster, more secure websites and apps thanks to Cloudflare," the company says.
What are the services Cloudflare provides?
The incident, which began around 12:00 UTC, affected websites that rely on Cloudflare's CDN, DDoS protection and DNS services. The company's main offerings include:
- Content Delivery Network (CDN): Caches website data on servers around the world, reducing load times.
- DDoS Protection: Shields websites from malicious traffic.
- DNS Services: Directs users to the correct IP addresses.
- Security and Firewalls: Filters out malicious requests.
Why did the outage happen? Has it been resolved?
Yes. Cloudflare has resolved the issue and said it is monitoring for errors. The US-based online services provider initially attributed the outage to a "latent bug". Cloudflare's Chief Technology Officer (CTO), Dane Knecht, said the bug was in a service supporting bot mitigation and was triggered by a routine configuration change, which caused a broad degradation of Cloudflare's network and services.
"Sharing an update on the recovery of our services. We were able to resolve the impact to traffic flowing through our network at approximately 14:30 UTC, which was our first priority, but the incident required some additional work to fully restore our control plane (our dashboard and the APIs our customers use to configure Cloudflare)," Knecht said.
In the latest update, Matthew Prince, co-founder and CEO of Cloudflare, said the issue was not caused, directly or indirectly, by a cyber attack or malicious activity.
He explained that the issue was triggered by a change to the permissions of one of the company's database systems, which caused the database to output multiple entries into a "feature file" used by the Bot Management system. "That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network."
"After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file. Core traffic was largely flowing as normal by 14:30. We worked over the next few hours to mitigate increased load on various parts of our network as traffic rushed back online. As of 17:06 all systems at Cloudflare were functioning as normal."














