Starting on August 28, Maryland is hosting an on-line, security-minded programming contest that we call Build-it, Break-it, Fix-it. If you are a graduate or undergraduate student at a US-based university, I encourage you to sign up: you could win significant prizes! Otherwise, please encourage students you know to participate.
The contest web site, with details about the event, is https://builditbreakit.org/
This post says a bit more about the contest, but it is mainly about our motivation for running it and what it has to do with programming languages.
Building Secure Software
The news is filled with discussion of software that is supposed to be secure, but isn’t. For example:
- The goto fail bug is a coding error that ends up bypassing the SSL certificate verification stage of setting up a secure connection.
- The Heartbleed bug is a flaw in OpenSSL that allows an attacker to send a carefully crafted “heartbeat” message during an SSL connection and receive, in reply, data read illicitly from the remote process’s memory, possibly including passwords and other valuable information.
- The Ruby on Rails YAML bug is a failure to check that input received via a web request does not contain YAML-encoded code, which can be made to execute on the remote site when it is deserialized.
These bugs are due to simple mistakes. For goto fail, a basic analysis of the program (the same kind Java performs to ensure local variables are initialized before they are used) would show that steps of the certificate validation procedure are “dead code.” The Rails YAML bug is a failure to appreciate the influence afforded by untrusted input, and therefore to enforce that only certain (less influential) inputs are allowed. The Heartbleed bug is a classic failure to prevent accesses outside the bounds of an array. Such bugs are (essentially) impossible in memory-safe languages like Java or Python, yet they keep coming up in important C and C++ programs. Even for C and C++ there are tools that can analyze programs to look for these sorts of bugs, and similar tools could find the unchecked user input in the Rails YAML case.
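To make the goto fail point concrete, here is a toy version of such an analysis in Python. It runs a breadth-first search over a control-flow graph, starting from the function's entry, and reports every node that no path reaches. The graph is a hand-written, simplified model of the goto fail code path. The node names and the `unreachable` helper are our own illustration, not taken from Apple's source or from any particular tool.

```python
from collections import deque
from typing import Dict, List, Set

# Hand-written, simplified control-flow graph of the goto fail code path.
# Node names are our own labels; each node lists its possible successors.
GOTO_FAIL_CFG: Dict[str, List[str]] = {
    "entry": ["check_hash_update"],
    # if (hash update fails) goto fail;  otherwise fall through
    "check_hash_update": ["fail", "duplicate_goto"],
    # the duplicated line: an unconditional "goto fail;"
    "duplicate_goto": ["fail"],
    # the remaining verification steps: no edge leads into them
    "check_hash_final": ["fail", "verify_signature"],
    "verify_signature": ["fail"],
    # cleanup label; on the duplicate_goto path err is still 0 ("success")
    "fail": [],
}


def unreachable(cfg: Dict[str, List[str]], entry: str) -> Set[str]:
    """Return the nodes no path from `entry` can reach, i.e. dead code."""
    seen = {entry}
    work = deque([entry])
    while work:  # breadth-first search over successor edges
        node = work.popleft()
        for succ in cfg[node]:
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return set(cfg) - seen


print(sorted(unreachable(GOTO_FAIL_CFG, "entry")))
# ['check_hash_final', 'verify_signature']
```

Real tools build the graph from source code automatically, but the core check is the same search. Any verification step left unreached is a red flag.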
Do you ever wonder: Why can’t we get this right? Why do people keep building software with the same kinds of defects, especially when there are known techniques for preventing them? Programming languages enthusiasts may roll their eyes more than most, since many of the languages and analyses that find or prevent security bugs were developed by the PL community.
The need for evidence
One reason people may not be employing PL techniques to find or prevent security bugs is a lack of empirical evidence that they are worth it. I suspect few would dispute that the languages, analyses, and processes I mentioned above, and more sophisticated ones like formal verification, can make a difference. But the question is: how effective are PL techniques for addressing security issues, and at what cost? How hard are they to use? Which are most effective? Is it more effective to use Java, or just as effective to use C/C++ and perform heavy-duty analysis after development? People have hypothesized answers to these questions, but I don’t think those answers have ever been put to a scientific test.
The build-it, break-it, fix-it contest was conceived as a way to acquire useful scientific evidence, while at the same time engaging the student population and the wider community in a mentality of building security in rather than adding it after the fact.
Contest Phase 1: Build It
The contest is held in three phases, on consecutive weekends. In the first phase, which we call the build-it phase, contestants are asked to build a software system of moderate complexity, for example a web server or a file parser. Contestants can use whatever programming language or technology platform they wish, as long as it can be deployed to the testing infrastructure we will use to judge it. We judge each submission by a pair of objective tests: does it meet predefined functionality requirements, and does it run within a set time limit? Submissions that pass a core set of correctness and performance tests are accepted and awarded points commensurate with their performance: the faster the (correct) software, the more points it gets. Contestants also have the option of implementing extra features for more points.
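The exact scoring formula is not spelled out here, so the following is only a hypothetical sketch of the build-it rules just described. Acceptance depends on passing the core tests. Accepted entries then earn more points the faster they run, plus points for optional features. The function name `build_it_score` and every weight in it are invented for illustration, and the contest's real rules may differ.

```python
def build_it_score(passes_core_tests: bool, runtime_s: float,
                   fastest_runtime_s: float, optional_features: int) -> float:
    """Hypothetical build-it scoring rule; all weights are made up."""
    BASE_POINTS = 100.0        # hypothetical: awarded for being accepted at all
    PERFORMANCE_POINTS = 50.0  # hypothetical: maximum, for the fastest correct entry
    FEATURE_POINTS = 10.0      # hypothetical: per optional feature implemented
    if not passes_core_tests:
        return 0.0  # submissions failing the core tests are not accepted
    # Faster (correct) software earns more: scale by fastest / own runtime.
    speed_ratio = fastest_runtime_s / runtime_s
    return (BASE_POINTS + PERFORMANCE_POINTS * speed_ratio
            + FEATURE_POINTS * optional_features)


print(build_it_score(True, 2.0, 1.0, 1))   # 135.0
print(build_it_score(False, 0.5, 0.5, 3))  # 0.0
```

In this sketch, an entry that is half as fast as the best one still keeps its base points. That reflects the idea that correctness gates entry and speed only adjusts the reward.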
We hope that the build-it phase simulates, in the small, what software companies do: they must build a basic product, and they would like it to be feature-ful, efficient, and secure. These goals may conflict. For example, more features mean more time spent coding and less time spent on penetration testing. A type-safe language like Java may help security, but may hurt efficiency compared to a type-unsafe language like C. How does one manage these tradeoffs effectively? Different teams will answer this question differently.
Contest Phase 2: Break It
In the second phase, called the break-it phase, contestants perform vulnerability analysis on the software (including its source code) submitted during the first phase. Phase-two contestants (who may or may not overlap with phase-one contestants) get points by finding vulnerabilities (and other defects) in the phase-one submissions, and submissions with errors lose points. Break-it teams report defects using test cases that demonstrate a failure or exploit a vulnerability. Demonstrated exploits get more points.
The break-it phase fills the role of a penetration-testing red team, and this portion of the contest resembles many other security contests, like DefCon CTF, whose focus is on finding vulnerabilities. In most contests, bugs are synthetic: the contest organizers introduce them into a software package on purpose. In our contest, bugs are “real” because they were written by other contestants. Our submissions are also likely to be more diverse, since build-it teams are free to use any programming language or set of libraries they wish (within some practical limits). To win, break-it teams will have to find flaws despite this diversity. It will be interesting to see what the most effective strategies turn out to be.
Contest Phase 3: Fix It
In the final phase, called the fix-it phase, build-it teams receive the test cases that identify failures in their code. For each test case, the team can submit a fix, and the fixed software is run against the remaining test cases. This can reveal many test cases that are “morally the same,” i.e., whose failures derive from the same root cause. Points are only tallied for non-duplicate test cases. If no fix is submitted for a test case, the full points for it are deducted from the build-it team and awarded to the break-it team.
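The deduplication step just described can be sketched as a simple grouping procedure. This minimal sketch rests on an assumption: each submitted fix is modeled as the set of test-case IDs that the fixed build passes. The function name `group_failures`, the fix names, and the test IDs are all hypothetical, not the contest's actual infrastructure.

```python
from typing import Dict, List, Set


def group_failures(failing: List[str],
                   fixes: Dict[str, Set[str]]) -> List[List[str]]:
    """Group failing test cases by the submitted fix that resolves them.

    failing: IDs of break-it test cases that fail on the original build.
    fixes:   hypothetical model of each fix: fix ID -> set of test IDs
             that pass once the fix is applied.
    Returns one group per distinct defect.
    """
    remaining = list(failing)
    groups: List[List[str]] = []
    for now_passing in fixes.values():  # fixes in submission order
        # Re-run the fixed build against every still-unexplained failure.
        resolved = [t for t in remaining if t in now_passing]
        if resolved:
            groups.append(resolved)
            remaining = [t for t in remaining if t not in now_passing]
    # Failures that no fix addresses each count as a separate defect.
    groups.extend([t] for t in remaining)
    return groups


failing = ["t1", "t2", "t3", "t4"]
fixes = {"add-bounds-check": {"t1", "t3"}, "escape-input": {"t2"}}
groups = group_failures(failing, fixes)
print(groups)       # [['t1', 't3'], ['t2'], ['t4']]
print(len(groups))  # 3 distinct defects are scored, not 4
```

In this example, two fixes cover three of the four failing tests, so the builders are charged for three defects rather than four. The sketch does nothing to stop a single "fix" from papering over several unrelated bugs. In the contest, judging and automation support guard against that kind of gaming.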
This phase is important to prevent abuse during phase two, whether accidental or purposeful. In particular, the same bug can be revealed by test cases that are superficially different. We do not want to award points to break-it teams for discovering many test cases for the same bug, nor do we want to penalize build-it teams unfairly. We also need to be sure that a submitted fix does not correct several distinct bugs at once, incorrectly unifying test cases that have different root causes. Some judging and automation support will still be used during this phase to avoid gaming of the system.
At the conclusion of this phase, the organizers tally the scores and declare the best builders and the best breakers, with cash prizes awarded to each.
What can the contest teach us?
We hope that the data we gather during the contest, and its outcomes, will give us useful evidence on what works. We can analyze the results and ask: Which projects were most secure (even if they were not the best performing or most feature-ful), and why? Was it the programming method they used? Was it the programming language? Was it the experience of the team involved? Likewise, which techniques were most effective at finding defects? By having many (hopefully 50-100) independent implementations of the same program, we control one important variable (what the program does) while having enough data to determine which other variables seemed to make a difference in the outcomes.
Beyond what its outcomes can teach us, the contest itself serves as a platform to demonstrate to students the value of secure programming. Compared to the pedagogy in existing classes (and other contests), our contest offers several advantages. We have contestants write security-critical software, and the contest structure ensures they receive direct feedback on the security (and other) properties of their code. By auditing other implementations of the same specification, contestants can learn about different software development methodologies and the strengths and weaknesses of each. Additionally, contestants will be exposed to languages and platforms whose properties make it easier to create secure software.
Liftoff, August 28
We go live on August 28. You can follow the contest on Twitter and on Facebook, where we will post announcements and details on the programming topic as they firm up. You will also be able to sign up at https://builditbreakit.org. Because the contest is run purely on-line, any student at a US-based university can participate. We cannot include non-students or participants outside the US because of limitations on awarding prizes. Hope to see you, your friends, and students in August!