On July 19, 2024, an estimated 8.5 million Windows computers used by large enterprises around the world showed the Blue Screen of Death (BSOD). Airports, hospitals, banks, government services, and other institutions came to a grinding halt. The CrowdStrike outage, caused by a faulty software update, is a stark reminder of how much depends on robust product testing and quality assurance, which should be a no-brainer for product companies. Four days later, the specifics of the update that caused the crash are still being debated on the internet, but the incident compels us to examine how rigorous testing practices can serve as powerful prophylactic measures against such failures. This isn’t just about preventing bugs; it’s about safeguarding reputations and maintaining the trust that underpins our digital interactions. This article examines the crucial role of product testing, using the CrowdStrike incident as a lens on potential vulnerabilities and proactive safeguards.

The High Stakes of Software Quality Assurance: Lessons from the CrowdStrike Incident

Because CrowdStrike Falcon protects cloud, identity, and endpoint workloads from a kernel module, it inherently runs with high privileges, so bad code in it can crash the entire OS.

Enterprises across the globe can expect to spend, on average, tens of thousands of dollars in overtime, not to mention lost productivity. It is perfectly normal for them to have physical servers spread across the country, each of which requires an on-site visit from a sysadmin to fix manually.

Dissecting the Anatomy of Software Weaknesses

Software vulnerabilities, often likened to chinks in a digital fortress’s armor, frequently arise from oversights or errors during the development process. These frailties can manifest in a myriad of forms, ranging from insecure coding practices to inadequate validation of user input.

Building a “Culture of Quality” for Product Companies

If the past few days have taught product companies anything, it is that there is no cutting corners on Quality Assurance. Developing a “Culture of Quality” and allowing QA teams to advise business leaders with recommendations and risk assessments is non-negotiable. At the risk of making the engineering team (and leadership) squirm, get written sign-off accepting the risk whenever a mitigation is skipped. Oh, and definitely never release fresh updates on a Friday.

A comprehensive, multifaceted product testing regime is paramount in ensuring software integrity.

1. Deployment Testing

This specialized form of testing pushes updates to a controlled, limited number of systems running different server versions to confirm that nothing fails before deploying to all systems. Ramp up in phases, from 1% to 5% to 10% and beyond, especially when deploying kernel-level changes. Ensure the release process combines automated testing, manual testing, and regression testing to confirm that new updates do not interfere with existing functionality.
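
To make the phased ramp concrete, here is a minimal Python sketch of a staged rollout that halts as soon as a stage looks unhealthy. Everything in it is an assumption for illustration: the `phased_rollout` function, the `deploy` and `is_healthy` callbacks, and the stages beyond 10% are hypothetical and do not describe any vendor's actual release tooling.

```python
from typing import Callable, Sequence

# Hypothetical rollout stages as fractions of the fleet. The first three
# mirror the 1%, 5%, 10% ramp; the later ones are assumptions.
STAGES = (0.01, 0.05, 0.10, 0.25, 0.50, 1.00)


def phased_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = STAGES,
    max_failure_rate: float = 0.0,
) -> list:
    """Deploy to a growing share of hosts, halting at the first bad stage.

    Returns the list of updated hosts. Raises RuntimeError if the share of
    unhealthy hosts in a stage exceeds max_failure_rate, leaving every
    remaining host untouched.
    """
    updated = []
    for fraction in stages:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, int(len(hosts) * fraction)) if hosts else 0
        batch = list(hosts[len(updated):target])
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Check only the hosts touched in this stage before widening the blast radius.
        failures = [host for host in batch if not is_healthy(host)]
        if batch and len(failures) / len(batch) > max_failure_rate:
            raise RuntimeError(
                f"Halting rollout at {fraction:.0%}: "
                f"{len(failures)} of {len(batch)} hosts unhealthy"
            )
    return updated
```

In a real pipeline the health check would typically wait for a soak period and look at crash reports or heartbeats before moving on; the point of the sketch is only that a failure in an early stage stops the update from reaching the rest of the fleet.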

2. Regression Testing

As software undergoes modifications and updates, regression testing emerges as an indispensable practice to ensure that new code changes do not inadvertently introduce new vulnerabilities or disrupt existing functionalities. This iterative testing process helps maintain the stability and security of software applications over time.
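
As an illustration, a regression suite pins down both the behavior that must keep working and the bugs that must stay fixed. The sketch below uses Python's standard `unittest` module; the `parse_rule` function and the earlier bug it refers to are hypothetical, invented purely for this example.

```python
import unittest


def parse_rule(line: str, expected_fields: int = 3) -> list:
    """Hypothetical parser: split a comma-separated rule into its fields.

    Raises ValueError on a wrong field count instead of letting a later
    out-of-range index access crash the caller.
    """
    fields = [part.strip() for part in line.split(",")]
    if len(fields) != expected_fields:
        raise ValueError(f"expected {expected_fields} fields, got {len(fields)}")
    return fields


class ParseRuleRegressionTests(unittest.TestCase):
    def test_existing_behaviour_is_preserved(self):
        # Well-formed input must keep parsing exactly as before.
        self.assertEqual(parse_rule("block, tcp, 443"), ["block", "tcp", "443"])

    def test_short_input_is_rejected_cleanly(self):
        # Pins a (hypothetical) earlier bug: short input must fail loudly here
        # rather than crash code further down the line.
        with self.assertRaises(ValueError):
            parse_rule("block, tcp")
```

Saved as a test module, this can be run with `python -m unittest` on every change, so a fix that quietly reintroduces an old failure is caught before release.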

3. Testing in Production-Like Environments

Using staging environments that replicate real-world production setups as closely as possible is crucial for uncovering vulnerabilities that might not manifest in controlled development settings. Realistic data sets, network configurations, and user behaviors make testing more effective. If your pre-prod environment is not built like production, push to fix that before going live.
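
One lightweight way to check whether pre-prod is "built like production" is to compare the two environments' settings and flag any drift. The following Python sketch assumes each environment can be described as a flat dictionary; the `config_drift` function and all keys and values shown are hypothetical.

```python
def config_drift(production: dict, staging: dict) -> dict:
    """Return keys whose values differ, mapped to (production, staging) pairs.

    A key missing on one side is reported with None for that side, so this
    sketch cannot tell "missing" apart from an explicit None value.
    """
    drift = {}
    for key in sorted(production.keys() | staging.keys()):
        prod_value = production.get(key)
        stage_value = staging.get(key)
        if prod_value != stage_value:
            drift[key] = (prod_value, stage_value)
    return drift


# Hypothetical environment descriptions, for illustration only.
production = {"os": "Windows Server 2022", "agent_channel": "stable", "kernel_driver": True}
staging = {"os": "Windows Server 2019", "agent_channel": "stable"}

for key, (prod_value, stage_value) in config_drift(production, staging).items():
    print(f"{key}: production={prod_value!r} staging={stage_value!r}")
```

An empty result does not prove the environments behave identically, but a non-empty one is a concrete list of differences to resolve before going live.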

Conclusion  

The CrowdStrike outage serves as a clarion call for SaaS companies to create a culture that values and prioritizes Quality Assurance and adequate software testing before releasing products or upgrades. It has also shone a light on our increasing dependency on IT systems in today’s connected world.