Software Security is a Programming Languages Issue

This is the last of three posts on the course I regularly teach, CS 330, Organization of Programming Languages. The first two posts covered programming language styles and mathematical concepts. This post covers the final quarter of the course, which focuses on software security and, related to that, the programming language Rust.

This course topic might strike you as odd: Why teach security in a programming languages course? Doesn’t it belong in, well, a security course? I believe that if we are to solve our security problems, then we must build software with security in mind right from the start. To do that, all programmers need to know something about security, not just a handful of specialists. Security vulnerabilities are both enabled and prevented by various language (mis)features, and programming (anti)patterns. As such, it makes sense to introduce these concepts in a programming (languages) course, especially one that all students must take.

This post is broken into three parts: the need for security-minded programming, how we cover this topic in 330, and our presentation of Rust. The post came to be a bit longer than I’d anticipated; apologies!

Security is a programming (languages) concern

The Status Quo: Too Much Post-hoc Security

There is a lot of interest these days in securing computer systems. This interest follows from the highly publicized roll call of serious data breaches, denial of service attacks, and system hijacks. In response, security companies are proliferating, selling computerized forms of spies, firewalls, and guard towers. There is also a regular call for more “cybersecurity professionals” to help man the digital walls.

It might be that these efforts are worth their collective cost, but call me skeptical. I believe that a disproportionate portion of our efforts focuses on adding security to a system after it has been built. Is your server vulnerable to attack? If so, no problem: Prop an intrusion detection system in front of it to identify and neuter network packets attempting to exploit the vulnerability. There’s no doubt that such an approach is appealing; too bad it doesn’t actually work. As computer security experts have been saying since at least the 60s, if you want a system to actually be secure then it must be designed and built with security in mind. Waiting until the system is deployed is too late.

Building Security In

There is a mounting body of work that supports building secure systems from the outset. For example, the Building Security In Maturity Model (BSIMM) catalogues the processes followed by a growing list of companies to build more secure systems. Companies such as Synopsys and Veracode offer code analysis products that look for security flaws. Processes such as Microsoft’s Security Development Lifecycle and books such as Gary McGraw’s Software Security: Building Security In, and Sami Saydjari’s recently released Engineering Trustworthy Systems identify a path toward better designed and built systems.

These are good efforts. Nevertheless, we need even more emphasis on the “build security in” mentality so we can rely far less on necessary, but imperfect, post-hoc stuff. For this shift to happen, we need better education.

Security in a Programming Class

Choosing performance over security

Programming courses typically focus on how to use particular languages to solve problems efficiently. Functionality is obviously paramount, with performance an important secondary concern.

But in today’s climate, shouldn’t security be at the same level of importance as performance? If you argue that security is not important for every application, I would say the same is true of performance; the rise of slow, easy-to-use scripting languages is a testament to that. But sometimes performance is very important, or becomes so later, and the same is true of security. Indeed, many security bugs arise because code originally written for a benign setting ends up in a security-sensitive one. As such, I believe educators should regularly talk about how to make code more secure, just as we regularly talk about how to make it more efficient.

To do this requires a change in mindset. A reasonable approach, when focusing on correctness and efficiency, is to aim for code that works under expected conditions. But expected use is not good enough for security: Code must be secure under all operating conditions.

Normal users are not going to feed weirdly formatted files to PDF viewers. But adversaries will. As such, students need to understand how a bug in a program can be turned into a security vulnerability, and how to stop that from happening. Our two CS 330 lectures on security alternate between illustrating a kind of security vulnerability, identifying the conditions that make that vulnerability possible, and developing a defense that eliminates those conditions. For the latter we focus on language properties (e.g., type safety) and programming patterns (e.g., validating input).

Security Bugs

In our first lecture, we start by introducing the high-level idea of a buffer overflow vulnerability, in which an input is larger than the buffer designed to hold it. We hint at how to exploit it by smashing the stack. A key feature of this attack is that while the program intends for an input to be treated as data, the attacker is able to trick the program into treating it as code that does something harmful. We also look at command injection, and see how it similarly manifests when an attacker tricks the program into treating data as code.
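
The contrast is easy to demonstrate. Here is a small sketch (mine, not from the lecture) of the overflow scenario in a bounds-checked language: the oversized input cannot silently overrun the buffer, because the language forces the length check that C omits.

```rust
fn main() {
    let mut buffer = [0u8; 8]; // fixed-size buffer, 8 bytes
    let input = vec![1u8; 16]; // "attacker-supplied" input, twice as large

    // In C, an unchecked memcpy(buffer, input, 16) would silently overrun
    // the stack. Here, every access is bounds-checked:
    assert_eq!(buffer.get(12), None); // out-of-bounds access is caught

    // The validation step, made explicit: copy only as much as fits.
    let n = input.len().min(buffer.len());
    buffer[..n].copy_from_slice(&input[..n]);
    assert_eq!(buffer, [1u8; 8]);
}
```

An unchecked index like `buffer[12]` would panic rather than corrupt memory, so even a missing check cannot turn data into code.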

SQL injection: malicious code from benign parts

Our second lecture covers vulnerabilities and attacks specific to web applications, including SQL injection, cross-site request forgery (CSRF), and cross-site scripting (XSS). Once again, these vulnerabilities all have the attribute that untrusted data provided by an attacker can be cleverly crafted to trick a vulnerable application into treating that data as code. This code can be used to hijack the program, steal secrets, or corrupt important information.

Coding Defenses

It turns out the defense against many of these vulnerabilities is the same, at a high level: validate any untrusted input before using it, to make sure it’s benign. We should make sure an input is not larger than the buffer allocated to hold it, so the buffer is not overrun. In any language other than C or C++, this check happens automatically (and is generally needed to ensure type safety).

For the other four attacks, the vulnerable application uses the attacker’s input when piecing together another program. For example, an application might expect user inputs to correspond to a username and password, splicing these inputs into a template SQL program with which it queries a database. But the inputs could contain SQL commands that cause the query to do something different than intended. The same is true when constructing shell commands (command injection) or JavaScript and HTML programs (cross-site scripting). The defense is also the same, at a high level: potentially dangerous content in user inputs must either be removed or be made inert by construction (e.g., through the use of prepared statements).
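
To make the validation pattern concrete, here is a sketch in Rust (for uniformity with the rest of this post, even though the course project is in Ruby); the `valid_username` helper is hypothetical, mine rather than the course’s:

```rust
/// Allowlist validation: accept a username only if every character is
/// known-safe. (Hypothetical helper, for illustration only.)
fn valid_username(input: &str) -> bool {
    !input.is_empty()
        && input.len() <= 32
        && input.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    assert!(valid_username("alice_42"));
    // A classic SQL-injection payload fails the check outright...
    assert!(!valid_username("alice' OR '1'='1"));
    // ...as does an XSS attempt.
    assert!(!valid_username("<script>alert(1)</script>"));
}
```

The complementary “inert by construction” defense is a prepared statement, where `?` placeholders guarantee the database never parses user data as SQL, regardless of its content.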

None of this stuff is new, of course. Most security courses talk about these topics. What is unusual is that we are talking about them in a “normal” programming languages course.

Our security project reflects the defensive-minded orientation of the material. While security courses tend to focus on vulnerability exploitation, CS 330 focuses on fixing the bugs that make an application vulnerable. We do this by giving the students a web application, written in Ruby, with several vulnerabilities in it. Students must fix the vulnerabilities without breaking the core functionality. We test the fixes automatically by having our auto-grading system test functionality and exploitability. Several hidden tests exploit the initially present vulnerabilities. The students must modify the application so these cases pass (meaning the vulnerability has been removed and/or can no longer be exploited) without causing any of the functionality-based test cases to fail.

Low-level Control, Safely

The most dangerous kind of vulnerability allows an attacker to gain arbitrary code execution (ACE): Through exploitation, the attacker is able to execute code of their choice on the target system. Memory management errors in type-unsafe languages (C and C++) comprise a large class of ACE vulnerabilities. Use-after-free errors, double frees, and buffer overflows are all examples; buffer overflows remain the single largest category of vulnerability today, according to MITRE’s Common Weakness Enumeration (CWE) database.

Programs written in type-safe languages, such as Java or Ruby,[1] are immune to these sorts of memory errors. Writing applications in these languages would thus eliminate a large category of vulnerabilities straightaway.[2] The problem is that type-safe languages rely on abstract data representations and garbage collection (GC), which make programming easier but remove low-level control and add overhead that is sometimes hard to bear. C and C++ are essentially the only game in town[3] for operating systems, device drivers, and embedded devices (e.g., IoT), which cannot tolerate the overhead and/or lack of control. And we see that these systems are regularly and increasingly under attack. What are we to do?

Rust: Type safety without GC

In 2010, the Mozilla Corporation (which brings you Firefox) officially began an ambitious project to develop a safe language suitable for writing high-performance programs. The result is Rust.[4] Rust’s type safety ensures (with various caveats) that a program is free of memory errors and free of data races. And Rust achieves type safety without garbage collection, which is not true of any other mainstream language.
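
To give a flavor of what “no GC” means in practice, here is a small sketch (mine, not from the Rust documentation): a value’s memory, or any resource it holds, is reclaimed deterministically at the end of its owner’s scope, which we can observe by logging from a `Drop` implementation.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Resource {
    // Runs exactly when the value's owner goes out of scope --
    // no GC pause, no finalizer of uncertain timing.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    let outer = Resource { name: "outer", log: Rc::clone(&log) };
    {
        let _inner = Resource { name: "inner", log: Rc::clone(&log) };
    } // `_inner` is freed here, at the end of its lexical scope
    assert_eq!(*log.borrow(), ["inner"]);
    drop(outer); // ownership also permits explicit early release
    assert_eq!(*log.borrow(), ["inner", "outer"]);
}
```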

In CS 330, we introduce Rust and its basic constructs, showing how Rust is arguably closer to a functional programming language than it is to C/C++. (Rust’s use of curly braces and semi-colons might make it seem familiar to C/C++ programmers, but there’s a whole lot more that’s different than is the same!)

We spend much of our time talking about Rust’s use of ownership and lifetimes. Ownership (aka linear typing) is used to carefully track pointer aliasing, so that memory modified via one alias cannot mistakenly corrupt an invariant assumed by another. Lifetimes track the scope in which pointed-to memory is live, so that it is freed automatically, but no sooner than is safe. These features support managing memory without GC. They also support sophisticated programming patterns via smart pointers and traits (a construct I was unfamiliar with, but now really like). We provide a simple programming project to familiarize students with the basic and advanced features of Rust.
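
As a minimal sketch of both features (adapted from the classic `longest` example in the Rust documentation; the surrounding assertions are mine):

```rust
// The lifetime parameter 'a says the returned reference is valid only
// as long as both inputs are -- the compiler checks this at every call.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let s = String::from("hello"); // `s` owns its heap allocation
    let t = s;                     // ownership moves to `t`...
    // println!("{}", s);          // ...so this line would not compile

    assert_eq!(longest(&t, "hi"), "hello");

    // Aliasing control: while `r` mutably borrows `v`, no other alias
    // may touch it, so an update through one alias cannot corrupt an
    // invariant assumed by another.
    let mut v = vec![1, 2, 3];
    let r = &mut v;
    r.push(4);
    assert_eq!(v, [1, 2, 3, 4]);
}
```

Crucially, both the move error and any lifetime violation are rejected at compile time, so there is no run-time cost for these checks.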

Assessment

I enjoyed learning Rust in preparation for teaching it. I had been wanting to learn it since my interview with Aaron Turon some years back. The Rust documentation is first-rate, so that really helped.

I also enjoyed seeing connections to my own prior research on the Cyclone programming language. (I recently reflected on Cyclone, and briefly connected it to Rust, in a talk at the ISSISP’18 summer school.) Rust’s ownership relates to Cyclone’s unique/affine pointers, and Rust’s lifetimes relate to Cyclone’s regions. Rust’s smart pointers match patterns we also implemented in Cyclone, e.g., for reference counted pointers. Rust has taken these ideas much further, e.g., a really cool integration with traits handles tricky aspects of polymorphism. The Rust compiler’s error messages are also really impressive!

A big challenge in Cyclone was finding a way to program with unique pointers without tearing your hair out. My impression is that Rust programmers face the same challenge (as long as they don’t resort to frequent use of unsafe blocks). Nevertheless, Rust is a much-loved programming language, so the language designers are clearly doing something right! Oftentimes facility is a matter of comfort, and comfort is a matter of education and experience. As such, I think Rust fits into the philosophy of CS 330, which aims to introduce new language concepts that are interesting in and of themselves, and may yet have expanded future relevance.

Conclusions

We must build software with security in mind from the start. Educating all future programmers about security is an important step toward increasing the security mindset. In CS 330 we illustrate common vulnerability classes and how they can be defended against by the language (e.g., by using those languages, like Rust, that are type safe) and programming patterns (e.g., by validating untrusted input). By doing so, we are hopefully making our students more fully cognizant of the task that awaits them in their future software development jobs. We might also interest them to learn more about security in a subsequent security class.

In writing this post, I realize we could do more to illustrate how type abstraction can help with security. For example, abstract types can be used to increase assurance that input data is properly validated, as explained by Google’s Christoph Kern in his 2017 SecDev Keynote. This fact is also a consequence of semantic type safety, as argued well by Derek Dreyer in his POPL’18 Keynote. Good stuff to do for Spring ’19!

Notes:

  1. Ruby is dynamically typed, but arguably type-safe.
  2. This is not strictly true, as parts of these languages’ implementations are written in C/C++, and programs in type-safe languages can call out to C/C++ via a foreign function interface. Even so, eliminating C/C++ as a normal source programming language would dramatically reduce the attack surface.
  3. And even if it is not needed, C/C++ is still used quite a bit anyway. Old habits, and legacy code, die hard.
  4. Rust development began in 2006, but was not officially supported by Mozilla until later.


Filed under Education, Software Security, Types

21 Responses to Software Security is a Programming Languages Issue

  1. Pingback: Software Security Is a Programming Languages Issue - zentrade.online

  2. Mike – you really need to look at SPARK (the Ada subset) – it’s been type-safe and GC-free since day 1… OK… you might not consider Ada to be “mainstream”, but please don’t confuse popularity with fitness-for-purpose… 🙂 Rod Chapman, previously of SPARK team…

  3. I did know about SPARK but hadn’t thought about it in this context. IIUC, SPARK has severe limits on dynamic memory allocation (and pointer def/use), so it’s not a great fit for some of the application areas I mentioned that Rust is targeting, e.g., web browsers. That said, Rust has restrictions too, so perhaps it’s not as far away as I first surmised. And in any case I agree with you that it’s an effort worth mentioning to students. Thanks!

    • Steve Schwarm

      The latest version of Ada addresses many of these issues. I just do not understand why people dislike the language. It fixes a bunch of issues beyond the ones mentioned so far: using if/end if, for example; array indexes do not have to start at zero; range checks on integers; tasks are a built-in type.

      It is really enlightening to go read the Ada requirements document. A lot of thought went into its design.

    • I programmed for decades, and was self-taught. Languages all have security issues, many of which are not the programmer’s fault but lie in the developers’ domain of libraries and shared functions. Moreover, if the system is constructed correctly, the permissions of the code are those of the invoking user, so good partitioning would go a long way toward isolating the code. Sandboxing works too, to mitigate programmer errors. This is not to excuse the programmer, but rather to point out that security is a multilevel problem.

      Worse, when I was in college, and I graduated only a decade ago, most of the people teaching had less security knowledge than I did, and that wasn’t much either. I do work at getting better, but it is a big subject all on its own, and in my field of electronic test, the programmer was dealing with high-speed, parallel architectures, defined specifications that required research on each new product, and constant upgrading of test techniques, in addition to changes in programming languages.

      In my 21 years as a test applications developer, the languages were, in timeline order: Basic, C, Fortran, Pascal, C & C++, Basic, Visual Basic for Applications, Excel macros, assembly language, RPG2, RS1, and various others on a project-by-project basis. My preferred language is C because most of what I develop deals with direct machine drive, and it puts me closer to the machine with relatively simple access to assembly as needed.

      To do all those languages, in addition to crossing 12 different varieties of test platforms and over 200 different bench instruments, teaching classes on the machines, developing over 100 custom projects and 4 special tools, and taking college classes, exactly when should I have been studying the total attack surface provided by new languages, platforms, development tools, and so on?

      • I don’t know the answer to your question. The point of this post is to say that instructors of college courses *should* know about security and be teaching it while teaching programming. Just as we talk about how to achieve efficiency, not just correctness, we should also talk about how to achieve security.

        I disagree with the perspective that language doesn’t matter. At the least, C and C++ add a level of risk. Without type safety, none of your abstractions are safe: a memory error in any part of the program can compromise even the best-coded parts of the program.

        Indeed, we observed this situation empirically in our build-it, break-it, fix-it programming contest, which often involved participants with significant programming experience. Those who elected to write their code in C were 8.5x more likely to have a security bug found compared to those who used a statically type-safe language (like Java or Go). See our paper for more.

          • Examining the paper, I didn’t see any adjustment for the size of the programs. In reality, the C and C++ solutions seemed to have more code. This could be one reason for greater error rates, as more code also implies more programmers as well as more opportunities for failure.

          Also, C and C++ programmers who produce more code are likely less experienced, at least in my limited experience.

          But all that aside, yes, memory issues are among the big problems, and so are the memory management tools in most OS implementations. Using malloc and free is one of the big issues, and the segmentation of memory by the low-level operations of the OS, together with the OS’s inability to properly triage memory usage, is another issue that makes these errors more difficult to manage.

          One of the additional issues is the choice of problem space. Web applications have a huge attack surface, and you specified these intentionally to provide that as the litmus test. C and C++ are in fact the wrong languages for this type of development, and I have worked mostly on technical software which would be unduly burdened by some of the overhead in the strongly typed and fenced memory of the other languages. Speed and efficiency were paramount, and yes security was never mentioned in the development process. I believe it should have been, but you know what they say about hindsight vs foresight.

          But I will say your paper was well thought out and the results enlightening.

  4. What does any of this have to do with configuration errors? If I misread your post I apologise, but none of this protects against a poor root or admin password. Perhaps we should use a different programming language for authentication, but I don’t see how that would help. Security is more than just buffer overflows and SQL injection. If you post your private key online by mistake, how does using Rust over C help? If your VPN is misconfigured and intruders can access an internal database, no particular programming language will help.

    • Totally agree. My point is that software security is a PL/programming issue, so all programmers need to be aware of it, not that security is only a programming issue and nothing else. The argument you make also goes in reverse: You can protect your keys and configure your VPN properly and set a good password but if you are vulnerable to remote code injection attacks due to poor programming, none of that matters.

  5. Tim Kertis

    The programmers aren’t the problem. The problem is with the languages themselves: languages such as Java, C, and C++ were not designed for writing secure code. This shortcoming can be remedied by using a Secure Coding Framework. Google SCF and Kertis to find out more.

  6. In the same vein: http://langsec.org/

    I also like to think about what could’ve been and could be if something similar to the SmallTalk family was mainstream over the C family.

    To quote Dan Ingalls: “An operating system is a collection of things that don’t fit into a language. There shouldn’t be one”

    The philosophy of a program being the serialization of a running process instead of what is used to create one.

    The closest mainstream things we have today are DBMSs, I think.

    • Someone on Twitter also pointed to langsec. That’s an interesting movement, which observes that the parser is a place where validation ought to happen. Someone else on Twitter observed that types also capture invariants, and that type checking (partly at run-time) can help establish those invariants and ensure they are preserved.

      As for the impact of SmallTalk: It’s a very different programming model, and I’m wondering how/whether it changes the security question. I.e., you still have to worry about communications and untrusted inputs, and possible leaks, etc. Perhaps your thinking is that all programs would naturally be doing this rather than thinking the OS is taking care of it?

  7. I interpret Language As an Operating System as:

    1. Representing OS features as language abstractions
    2. Implementation of those features in the language itself

    If it’s a suitably portable and expressive language, you’ve now gained
    a single domain of discourse to be modeled and enforced.
    Every program is instead a lightweight Virtual Machine in a sense.

    To quote from [2] section 3.2

    “””
    Conventional operating systems support multiple programs through a process abstraction
    that gives each program its own control flow, I/O environment, and resource controls.
    A process is distinguished primarily by its address space, where separate address spaces
    serve both as a protection barrier between programs and as a mechanism for defining a
    program’s environment.
    […]
    In MrEd, separate address spaces are unnecessary for protection between programs, due to
    the safety properties of the programming language.
    […]
    Instead of providing an all-encompassing process abstraction, MrEd provides specific mechanisms
    for creating threads of control, dealing with graphical I/O, and managing resources.
    “””

    When it comes to communication between machines, I’d like to see if there can be some unification of
    the above ideas with COAST described in [5] by Roy Fielding and others. Definitely a door I’d like to
    see pushed against.

    Some Examples of Languages As Operating System:

    [1] SqueakNOS:
    [2] The MrEd VM:
    [3] The Lively Kernel
    [4] The Lively Web

    Other references:

    [5]

  8. vasko

    when type safety comes to mind the first thing i think of is Pascal. it had great impact on all high level and more modern languages than C. array out of bound exception, nested functions that are now becoming more popular with FP languages, the pascal type string that is copied by almost every OOP language… must be the most underrated language. of course, there was place for only one in the same position. it was either C or Pascal. i love them both and *whoever* creates another new/safe/better language, just to start with, he is no match for Niklaus Wirth, Dennis Ritchie and not to forget John McCarthy (LISP).

    • It also amazes me that _all_ the Pascal-family languages (Pascal, Oberon, Eiffel, Delphi, Ada etc. etc) allow user-defined scalar types that embody the problem domain, not the machine domain … e.g in Ada
      type Number_Of_Working_Engines is range 0 .. 4;
      plus named type equivalence.

      All of this means my static analysis tool can immediately reject any attempt to assign -1 or +5 (or any expression that _might_ evaluate to something outside of that range…) onto an object of this type.

      That all the C-family languages (even Rust) omit this is a remarkable oversight by those language designers. I teach SPARK and Ada to C programmers a fair bit, and it takes _ages_ to convince them that there’s more to life than int or int32_t… it seems to be a real blind spot…
      – Rod

  9. System security is a purely political issue. The various states are interested in infiltrating private computer systems. In the modern conception of police states, the state has no restrictions on violating privacy. All privacy rights exist only on paper. It wouldn’t be any problem to construct secure IT if the various states really wanted it. As an individual or company it is impossible to construct security; you depend on too many external components that stupid (or bad) people have built.

    Since Trump, things have gone better. As the left media mainstream hates Trump, the power of infiltration that the US has is more and more seen as a threat. The security holes are more and more being closed to limit Trump’s power. If all states hold together, security can be restored and we can come back to a civilized world with real privacy.

    • I definitely agree that security is a matter of incentives. Companies would actively adopt more security-enhancing development practices if there was a stronger disincentive for inaction. I observe that there are some engineering standards that must be followed, e.g., for civil engineers building bridges, but no similarly-enforced standards for software engineers building software. If there were, the situation might be better.

      In any case, getting to that place relies on training more people, and determining best practices. I’m not so convinced that we have the one true answer and companies are just not following it. We know a lot, but we need more research and education development. We in computer science and PL can be part of the solution.

  10. Pingback: Software Security is a Programming Languages Issue | MySafePick

  11. Pingback: Teaching Programming Languages (part 2) - The PL Enthusiast
