
They say software will eat the world. Here are some software bugs that took a stab at it


nir


Well, you know what we mean. Variable quality comes with increasing quantity

Finding bugs in code

 

Analysis "On the afternoon of Tuesday, September 25, our engineering team discovered a security issue affecting almost 50 million accounts," said Facebook's Guy Rosen in a security update in September.

 

The issue was serious. What was stolen was not passwords but access tokens: opaque strings that identify a user and grant access to an API, a software interface used by apps to access Facebook on behalf of users.

 

Having the tokens enabled attackers to log in as other users. Worse still, it was also possible for the miscreants to log in on other sites using Facebook authentication, though it is not known if this happened, nor who was behind the attacks.

 

The cause of this security disaster was a programming blunder, or rather, a "combination of three bugs" according to Pedro Canahuati, vice president of engineering. The first was a bug in what was meant to be a read-only API, which allowed a video to be posted. The second was a bug in the video upload API, which generated an access token with broad privileges (the permissions of the Facebook mobile app). The third, and perhaps most serious, was a bug that generated the access token not for the current user, but for another user whose profile was being viewed.
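The third bug is easier to picture in code. What follows is a minimal sketch in C, not Facebook's code: the structures, field names and scope flags are all hypothetical, invented for illustration. It contrasts a token issuer that binds the token to the profile being viewed with one that binds it to the authenticated user and grants read access only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical permission flags, invented for this sketch. */
enum { SCOPE_READ = 1u << 0, SCOPE_WRITE = 1u << 1 };

/* Hypothetical token: the user it identifies and what it may do. */
struct access_token {
    unsigned long user_id;
    unsigned int  scopes;
};

/* Hypothetical request context. */
struct request {
    unsigned long session_user_id; /* the user who actually logged in   */
    unsigned long viewed_user_id;  /* the profile currently on display  */
};

/* Buggy variant: the token identifies the viewed user and is broad. */
struct access_token issue_token_buggy(const struct request *req)
{
    assert(req != NULL);
    struct access_token t = { req->viewed_user_id, SCOPE_READ | SCOPE_WRITE };
    return t;
}

/* Checked variant: the token identifies the authenticated user, and
 * anything beyond read access is stripped from the requested scopes. */
struct access_token issue_token_checked(const struct request *req,
                                        unsigned int requested_scopes)
{
    assert(req != NULL);
    struct access_token t = { req->session_user_id,
                              requested_scopes & SCOPE_READ };
    return t;
}
```

With a session user of 1001 viewing profile 2002, the buggy variant hands back a token for 2002, while the checked variant returns a read-only token for 1001 regardless of what was requested.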

 

The incident demonstrates that even the best-resourced software projects have bugs, and that bugs that do not immediately impact functionality are hard to spot. Testing is integral to software development and deployment today, but no test can exercise every possible combination of input, or account for every possible failure.

 

This case also shows how the impact of software bugs today can be magnified by the huge usage of certain pieces of code. In this instance, we're talking about fifty million accounts, with potential access to an unknown number of third-party sites.

 

Another recent example: Microsoft released the October 2018 feature update for Windows 10 with insufficient testing, and it included a bug that in certain cases deleted user data. It was deployed to thousands of users before being pulled. The reputational, support and remediation costs are considerable.

 

At the, well, heart of the Heartbleed bug discovered in 2014 was the following single line of code in the OpenSSL encryption library:

memcpy(bp, pl, payload);

The code does not check that the actual size of the data matches the amount being copied. The attacker can lie about the size and receive a copy of other data that happens to be in memory, perhaps including usernames and passwords. The bug impacted the Apache and Nginx web servers, used by two thirds of the world's active internet sites, as well as countless other network applications, via OpenSSL.
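The missing check can be sketched in a few lines of C. This is an illustration of the idea, not OpenSSL's code or its actual patch: the function name and parameters are hypothetical, with `out` standing in for `bp`, `payload` for `pl`, and `claimed_len` for the length field the sender supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a heartbeat payload only if the length the
 * peer claims is backed by bytes that were actually received.
 *
 * claimed_len  length stated in the request (attacker-controlled)
 * actual_len   number of payload bytes really present in the record
 * out_cap      capacity of the destination buffer
 *
 * Returns the number of bytes copied, or 0 if the request is rejected.
 */
size_t copy_heartbeat_payload(unsigned char *out, size_t out_cap,
                              const unsigned char *payload,
                              size_t claimed_len, size_t actual_len)
{
    assert(out != NULL && payload != NULL);

    if (claimed_len > actual_len) {
        return 0; /* the peer lied about the size: copy nothing */
    }
    if (claimed_len > out_cap) {
        return 0; /* would overflow the destination buffer */
    }
    memcpy(out, payload, claimed_len);
    return claimed_len;
}
```

A request that sends four bytes but claims 60 is rejected outright, so nothing beyond the received data is ever read.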

One line forward, two lines back

Venture capitalist Marc Andreessen made headlines in 2011 when he reckoned software was eating the world. That would be fine were it not for the fact that, two decades before, coder and author Steve McConnell reckoned on an industry average of 15 to 50 bugs per 1,000 lines of code, while the Standish Report in 1995 said one third of software projects got cancelled, wasting $81bn per year. McConnell and Standish were ghosts at the party of software delivery, further cementing the practice's reputation for poor quality. Software projects were expected to run late and contain bugs by default. It was a surprise if something came in on time, on budget and worked as intended.

 

That was the 1990s. Software tools, development practices and cultures have changed a lot since then.

 

Prominent examples of positive changes include test-driven development and the concept of code coverage (the amount of code executed during testing) that emerged from Extreme Programming methodology in the nineties. Static code analysis is another, with tools that enforce coding standards as well as catching obvious bugs. Coding standards themselves have evolved, incorporating knowledge on what makes code more reliable, such as keeping code units short.

 

Safer programming languages have also helped. High-level languages with automatic memory management and no direct use of pointers, such as Java, first released in 1996, have made it easier for developers to avoid some errors. More recently, software lifecycle management has become simple to adopt for teams of any size, thanks to cloud-hosted tools such as GitHub.

 

Are developers keeping their code clean as a result?

 

As software has become more pervasive, the answer should be "yes". Software is no longer just the preserve of the workplace (Windows PCs and servers) or written for obscure industrial control systems, and bugs now risk taking down more than simply the company's HR server.

 

A glance down the OWASP Top 10 list of web application security risks for 2017 is not especially encouraging. Top of the list is injection attacks, including SQL injection, where carefully crafted user input can execute unauthorized commands. The problem has been well known for years, but still causes trouble.
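To see why crafted input is dangerous, consider a minimal sketch in C that builds a query by pasting user input into SQL text. The table and column names are hypothetical. The usual defence is a parameterized query, which needs a database library and is not shown here; the allow-list check at the end is only a narrow illustration that stays within standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Vulnerable: the user's text becomes part of the SQL statement itself.
 * Returns 0 on success, -1 if the query did not fit in the buffer. */
int build_query_unsafe(char *out, size_t cap, const char *username)
{
    assert(out != NULL && username != NULL);
    int n = snprintf(out, cap,
                     "SELECT * FROM users WHERE name = '%s'", username);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Illustrative allow-list: accept only non-empty [A-Za-z0-9_] strings.
 * Returns 1 if the input is acceptable, 0 otherwise. */
int is_plain_identifier(const char *s)
{
    if (s == NULL || *s == '\0') {
        return 0;
    }
    for (; *s != '\0'; ++s) {
        if (!isalnum((unsigned char)*s) && *s != '_') {
            return 0;
        }
    }
    return 1;
}
```

Passing `alice` produces the intended query. Passing `x' OR '1'='1` produces `SELECT * FROM users WHERE name = 'x' OR '1'='1'`, whose condition is always true, so the input has rewritten the logic of the statement rather than merely supplying a value.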

 

Another area on the same list is "using components with known vulnerabilities". Software developers have always depended on libraries; most of the code in any project is written by other people. Libraries and components are not immune from bugs though, as Heartbleed demonstrated. When a bug in a library or component is discovered, it will be fixed in the library itself.

 

Patching all the applications that used the buggy version though is more than challenging. Many impacted applications will no longer be actively developed. Others may reside in firmware for network appliances that will never be patched. The Internet of Things becomes the Internet of Bugs, unless automatic patching exists, along with scrupulous patch management by Thing vendors.

 

Bugs can be expensive, and bugs can kill. Carnegie Mellon University Professor Phil Koopman specialises in embedded software quality including safety-critical areas such as self-driving cars and other automotive software. In a recent post about potentially deadly automotive software defects he lists more than 50 reports of disturbing defects such as unintended acceleration, cruise control which will not disengage, and power steering preventing the driver from controlling the vehicle.

 

Step one: Reduce code complexity

Koopman makes the point that improving software quality is largely a matter of observing best practice.

 

Those practices include reducing code complexity, using static analysis tools and compiling with zero warnings, rigorous checking of real-time code scheduling, satisfactory software testing, and use of basic tools including configuration management, version control and bug tracking.

 

CAST Research Labs reported in 2017 on application software health, based on 1,850 "large, multi-layer, multi-language business applications" across 329 organisations and eight countries – more than a billion lines of code. The report is based on five health factors: robustness, security, efficiency, changeability (how difficult it is to modify the code) and transferability (how difficult it is to understand for a developer new to the code).

 

The CAST report is encouraging, in that overall mean scores were at 3.0 or above for all categories, on a scale of 0 to 4, with security best at 3.22 and transferability worst at 3.0. This decent level of quality reflects the fact that these are in general mission-critical applications in well-resourced sectors.

 

The conclusions are still worth reading. Security scores had a wide variation, so lack of secure coding practice remains a problem. In terms of team sizes, CAST reckons teams of more than 20 developers achieved poor scores and suggests optimal team size is more like 10 people or fewer.

 

Another point of interest is that the methodology behind the best scoring projects was neither Agile (emphasis on iterative development) nor Waterfall (emphasis on up-front planning), but a hybrid approach with extensive up-front analysis followed by short iterative coding sprints.

 

CAST also makes the point that software architecture that involves "multiple components spread across several layers of the application" is harder to test than code quality at the level of a class or method; but it is structural quality that accounts for most operational problems.

Slow coding class

Writing bulletproof code is slower and therefore more expensive at the time of development, which is one reason why software quality remains so variable. It is well known that the cost of fixing a defect increases the later it is found, though putting generic figures on the difference is difficult as it varies greatly.

 

In a carefully architected DevOps process for a web application, where a code change can be made, tested automatically and deployed into production rapidly, the cost of fixing a bug found late may not be too bad. At the other extreme is a case like the 1996 explosion on launch of the Ariane 5 rocket, caused by a 64-bit floating-point value being converted to a 16-bit signed integer when the number was too large to fit. The immediate cost of the bug was around $500m.
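The class of error behind the Ariane 5 failure can be illustrated with a range-checked conversion. This is a sketch in C with a hypothetical function name, not the flight software (which was written in Ada): it refuses to convert a 64-bit floating-point value that will not fit in 16 bits, rather than letting the conversion go wrong silently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a double to int16_t only when the value
 * is in range. Returns 0 and stores the result on success, or -1 and
 * leaves *out untouched when the value cannot be represented.
 */
int to_int16_checked(double value, int16_t *out)
{
    assert(out != NULL);

    /* Written as a negated range test so that NaN, for which every
     * comparison is false, is rejected as well. */
    if (!(value >= (double)INT16_MIN && value <= (double)INT16_MAX)) {
        return -1;
    }
    *out = (int16_t)value; /* in range, so the conversion is well defined */
    return 0;
}
```

A value of 1234.0 converts normally, while 40000.0 is reported as an error that the caller must handle. What the caller does at that point (clamp, fall back, or shut down safely) is a design decision this sketch leaves open.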

 

An extreme example, but what about the everyday? So-called "poor-quality" software is costing the US economy $2.26 trillion, after technical debt is removed, according to the Consortium for IT Software Quality.

Breaking that down, losses from software failures account for 37.46 per cent of that figure, with the task of finding and fixing defects accounting for 16.87 per cent. The figures are derived from, among other factors, lost business and wasted investment.

 

Identifying and fixing a bug before it makes it into production is a priority, as the cost involved in fixing it or dealing with its aftermath increases the longer it lives on.

 

Bugs that make it through to production can therefore have severe long-term costs. If the bug is in software or firmware on which other software depends, such as an API, then third-party developers may have to code around the bug. Then a compatibility problem appears, since fixing the bug may break that other software.

One key difference between today's internet-connected world, and the early days of software development, is that deploying patches is easy and generally automated. Tracking down bugs that have made it into production is also easier, thanks to techniques such as prompting users to submit crash data back to the developers, or "flight recorder" agents that capture the application state and log exactly what code was executing at the time of a crash. Obvious defects are mitigated by being found and fixed quickly.
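One common way to build a "flight recorder" is a fixed-size ring buffer that always holds the most recent events, so that a crash handler can dump what happened just before the failure. The sketch below is an assumption about how such an agent might work, not a description of any particular product; all names and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FR_SLOTS   4   /* deliberately tiny for illustration */
#define FR_MSG_LEN 48

struct flight_recorder {
    char   slots[FR_SLOTS][FR_MSG_LEN];
    size_t next;   /* slot that the next event will overwrite */
    size_t count;  /* number of valid events, at most FR_SLOTS */
};

void fr_init(struct flight_recorder *fr)
{
    assert(fr != NULL);
    memset(fr, 0, sizeof *fr);
}

/* Record an event, overwriting the oldest one once the buffer is full.
 * Events longer than FR_MSG_LEN - 1 characters are truncated. */
void fr_record(struct flight_recorder *fr, const char *event)
{
    assert(fr != NULL && event != NULL);
    snprintf(fr->slots[fr->next], FR_MSG_LEN, "%s", event);
    fr->next = (fr->next + 1) % FR_SLOTS;
    if (fr->count < FR_SLOTS) {
        fr->count++;
    }
}

/* Return the i-th retained event (0 is the oldest), or NULL if i is
 * out of range. A crash handler would walk these in order. */
const char *fr_get(const struct flight_recorder *fr, size_t i)
{
    if (fr == NULL || i >= fr->count) {
        return NULL;
    }
    size_t oldest = (fr->next + FR_SLOTS - fr->count) % FR_SLOTS;
    return fr->slots[(oldest + i) % FR_SLOTS];
}
```

The fixed size is the point of the design: recording costs a bounded amount of memory and time however long the application runs, at the price of keeping only the last few events.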

 

Decades after McConnell and despite numerous changes along the way, the frustration of software quality remains this: the knowledge and tools needed to write solid code exist, but it is human factors including finance, deadlines, mismanagement, skills shortages, and the challenge of dealing with legacy code and systems that mean code quality remains uneven.

 

Given the huge and growing importance of software, the continuing prevalence of bugs is both sobering and disturbing. Implementing systems to minimise the burden of deploying fixes helps for sure, but it is effort to improve software quality at source that will yield the biggest benefit. ®

 
