Software Security is a Programming Languages Issue

This is the last of three posts on the course I regularly teach, CS 330, Organization of Programming Languages. The first two posts covered programming language styles and mathematical concepts. This post covers the last quarter of the course, which focuses on software security and, related to that, the programming language Rust.

This course topic might strike you as odd: Why teach security in a programming languages course? Doesn’t it belong in, well, a security course? I believe that if we are to solve our security problems, then we must build software with security in mind right from the start. To do that, all programmers need to know something about security, not just a handful of specialists. Security vulnerabilities are both enabled and prevented by various language (mis)features and programming (anti)patterns. As such, it makes sense to introduce these concepts in a programming (languages) course, especially one that all students must take.

This post is broken into three parts: the need for security-minded programming, how we cover this topic in 330, and our presentation of Rust. The post came to be a bit longer than I’d anticipated; apologies!

Security is a programming (languages) concern

The Status Quo: Too Much Post-hoc Security

There is a lot of interest these days in securing computer systems. This interest follows from the highly publicized roll call of serious data breaches, denial of service attacks, and system hijacks. In response, security companies are proliferating, selling computerized forms of spies, firewalls, and guard towers. There is also a regular call for more “cybersecurity professionals” to help man the digital walls.

It might be that these efforts are worth their collective cost, but call me skeptical. I believe that a disproportionate portion of our efforts focuses on adding security to a system after it has been built. Is your server vulnerable to attack? If so, no problem: Prop an intrusion detection system in front of it to identify and neuter network packets attempting to exploit the vulnerability. There’s no doubt that such an approach is appealing; too bad it doesn’t actually work. As computer security experts have been saying since at least the 60s, if you want a system to actually be secure then it must be designed and built with security in mind. Waiting until the system is deployed is too late.

Building Security In

There is a mounting body of work that supports building secure systems from the outset. For example, the Building Security In Maturity Model (BSIMM) catalogues the processes followed by a growing list of companies to build more secure systems. Companies such as Synopsys and Veracode offer code analysis products that look for security flaws. Processes such as Microsoft’s Security Development Lifecycle and books such as Gary McGraw’s Software Security: Building Security In, and Sami Saydjari’s recently released Engineering Trustworthy Systems identify a path toward better designed and built systems.

These are good efforts. Nevertheless, we need even more emphasis on the “build security in” mentality so we can rely far less on necessary, but imperfect, post-hoc stuff. For this shift to happen, we need better education.

Security in a Programming Class

[Image: choosing performance over security]

Programming courses typically focus on how to use particular languages to solve problems efficiently. Functionality is obviously paramount, with performance an important secondary concern.

But in today’s climate shouldn’t security be at the same level of importance as performance? If you argue that security is not important for every application, I would say the same is true of performance. Indeed, the rise of slow, easy-to-use scripting languages is a testament to that. But sometimes performance is very important, or becomes so later, and the same is true of security. Indeed, many security bugs arise because code originally written for a benign setting ends up in a security-sensitive one. As such, I believe educators should regularly talk about how to make code more secure, just as we regularly talk about how to make it more efficient.

To do this requires a change in mindset. A reasonable approach, when focusing on correctness and efficiency, is to aim for code that works under expected conditions. But expected use is not good enough for security: Code must be secure under all operating conditions.

Normal users are not going to input weirdly formatted files to PDF viewers. But adversaries will. As such, students need to understand how a bug in a program can be turned into a security vulnerability, and how to stop it from happening. Our two lectures in CS 330 on security shift between illustrating a kind of security vulnerability, identifying the conditions that make that vulnerability possible, and developing a defense that eliminates those conditions. For the latter we focus on language properties (e.g., type safety) and programming patterns (e.g., validating input).

Security Bugs

In our first lecture, we start by introducing the high-level idea of a buffer overflow vulnerability, in which an input is larger than the buffer designed to hold it. We hint at how to exploit it by smashing the stack. A key feature of this attack is that while the program intends for an input to be treated as data, the attacker is able to trick the program to treat it as code which does something harmful. We also look at command injection, and see how it similarly manifests when an attacker tricks the program to treat data as code.
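To make the data-versus-code distinction concrete, here is a minimal sketch (not from the course materials) of what type safety buys you at the buffer boundary: in Rust, an out-of-bounds access is checked, so attacker-controlled input cannot silently overwrite adjacent memory such as a saved return address.

```rust
// Minimal sketch: checked buffer access in a type-safe language.
// In unchecked C, buf[9] would read or write past the buffer,
// potentially smashing the stack; here the access is checked.
fn main() {
    let buf = [0u8; 8];

    // `get` makes the bounds check explicit: out of bounds yields None.
    assert_eq!(buf.get(9), None);
    assert_eq!(buf.get(3), Some(&0u8));

    // Plain indexing (buf[9]) is also checked: it panics at runtime,
    // terminating the program rather than corrupting memory.
}
```

The key point is that the worst case becomes a clean failure (a `None` or a panic), not attacker-controlled code execution.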

[Image: SQL injection analogy (Scrabble tiles): malicious code assembled from benign parts]

Our second lecture covers vulnerabilities and attacks specific to web applications, including SQL injection, Cross-site Request Forgery (CSRF), and Cross-site scripting (XSS). Once again, these vulnerabilities all have the attribute that untrusted data provided by an attacker can be cleverly crafted to trick a vulnerable application to treat that data as code. This code can be used to hijack the program, steal secrets, or corrupt important information.

Coding Defenses

It turns out the defense against many of these vulnerabilities is the same, at a high level: validate any untrusted input before using it, to make sure it’s benign. We should make sure an input is not larger than the buffer allocated to hold it, so the buffer is not overrun. In any language other than C or C++, this check happens automatically (and is generally needed to ensure type safety).
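As a sketch of the validate-before-use pattern (the helper name is hypothetical, not from the course project), the length check can be made explicit, rejecting over-long input rather than overflowing the destination:

```rust
// Hypothetical helper illustrating validate-before-use: input larger
// than the destination buffer is rejected, never copied.
fn copy_into_buffer(input: &[u8]) -> Option<[u8; 8]> {
    let mut buf = [0u8; 8];
    if input.len() > buf.len() {
        return None; // validation failed: input too large
    }
    buf[..input.len()].copy_from_slice(input);
    Some(buf)
}

fn main() {
    assert!(copy_into_buffer(b"hello").is_some());
    assert!(copy_into_buffer(b"far too long!").is_none());
}
```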

For the other four attacks, the vulnerable application uses the attacker input when piecing together another program. For example, an application might expect user inputs to correspond to a username and password, splicing these inputs into a template SQL program with which it queries a database. But the inputs could contain SQL commands that cause the query to do something different than intended. The same is true when constructing shell commands (command injection), or JavaScript and HTML programs (cross-site scripting). The defense is also the same, at a high level: user inputs need to either have potentially dangerous content removed or made inert by construction (e.g., through the use of prepared statements).
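A small sketch (function names are hypothetical) can show how splicing untrusted input into a query template lets the input change the query's structure, and how neutralizing the input keeps it inert. Real applications should use prepared statements; the quote-doubling below is only to illustrate "making input inert by construction":

```rust
// Naive splicing: the input can terminate the string literal and
// inject its own SQL, changing the structure of the query.
fn naive_query(user: &str) -> String {
    format!("SELECT * FROM users WHERE name = '{}'", user)
}

// Neutralized: doubling single quotes keeps the entire input inside
// one string literal. (Illustrative only; prefer prepared statements.)
fn escaped_query(user: &str) -> String {
    let safe = user.replace('\'', "''");
    format!("SELECT * FROM users WHERE name = '{}'", safe)
}

fn main() {
    let attack = "x' OR '1'='1";
    // Data has become code: the WHERE clause now matches every row.
    assert_eq!(naive_query(attack),
               "SELECT * FROM users WHERE name = 'x' OR '1'='1'");
    // The same input, kept inert as a single (strange) username string.
    assert_eq!(escaped_query(attack),
               "SELECT * FROM users WHERE name = 'x'' OR ''1''=''1'");
}
```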

None of this stuff is new, of course. Most security courses talk about these topics. What is unusual is that we are talking about them in a “normal” programming languages course.

Our security project reflects the defensive-minded orientation of the material. While security courses tend to focus on vulnerability exploitation, CS 330 focuses on fixing the bugs that make an application vulnerable. We do this by giving the students a web application, written in Ruby, with several vulnerabilities in it. Students must fix the vulnerabilities without breaking the core functionality. We test the fixes automatically by having our auto-grading system test functionality and exploitability. Several hidden tests exploit the initially present vulnerabilities. The students must modify the application so these cases pass (meaning the vulnerability has been removed and/or can no longer be exploited) without causing any of the functionality-based test cases to fail.

Low-level Control, Safely

The most dangerous kind of vulnerability allows an attacker to gain arbitrary code execution (ACE): Through exploitation, the attacker is able to execute code of their choice on the target system. Memory management errors in type-unsafe languages (C and C++) comprise a large class of ACE vulnerabilities. Use-after-free errors, double-frees, and buffer overflows are all examples. The latter is still the single largest category of vulnerability today, according to MITRE’s Common Weakness Enumeration (CWE) database.

Programs written in type-safe languages, such as Java or Ruby,[1] are immune to these sorts of memory errors. Writing applications in these languages would thus eliminate a large category of vulnerabilities straightaway.[2] The problem is that type-safe languages rely on abstract data representations and garbage collection (GC), which make programming easier but remove low-level control and add overhead that is sometimes hard to bear. C and C++ are essentially the only game in town[3] for operating systems, device drivers, and embedded devices (e.g., IoT), which cannot tolerate the overhead and/or lack of control. And we see that these systems are regularly and increasingly under attack. What are we to do?

Rust: Type safety without GC

In 2010, the Mozilla Corporation (which brings you Firefox) officially began an ambitious project to develop a safe language suitable for writing high-performance programs. The result is Rust.[4] Rust’s type safety ensures (with various caveats) that a program is free of memory errors and free of data races, and it achieves this without garbage collection, which is not true of any other mainstream language.
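A brief sketch of what "no GC" means in practice: every value has a single owner, and its memory (and any associated resource) is released deterministically when the owner goes out of scope. The drop-counting here is my own illustration, not from the post.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many values have been freed, so the timing is observable.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Owned;
impl Drop for Owned {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst); // runs at a known point
    }
}

fn consume(_v: Owned) {
    // `_v` is freed when this function returns: no garbage collector.
}

fn main() {
    let v = Owned;
    assert_eq!(DROPS.load(Ordering::SeqCst), 0); // still owned by `v`
    consume(v); // ownership of `v` moves into `consume`
    // println!("{}", DROPS.load(Ordering::SeqCst)); is fine, but any
    // use of `v` here would be rejected at compile time: use after move.
    assert_eq!(DROPS.load(Ordering::SeqCst), 1); // freed, deterministically
}
```

Contrast with a GC'd language, where the point of reclamation is unpredictable; here it is fixed by the scope structure of the program.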

[Image: Rust language logo]

In CS 330, we introduce Rust and its basic constructs, showing how Rust is arguably closer to a functional programming language than it is to C/C++. (Rust’s use of curly braces and semi-colons might make it seem familiar to C/C++ programmers, but there’s a whole lot more that’s different than is the same!)

We spend much of our time talking about Rust’s use of ownership and lifetimes. Ownership (aka linear typing) is used to carefully track pointer aliasing, so that memory modified via one alias cannot mistakenly corrupt an invariant assumed by another. Lifetimes track the scope in which pointed-to memory is live, so that it is freed automatically, but no sooner than is safe. These features support managing memory without GC. They also support sophisticated programming patterns via smart pointers and traits (a construct I was unfamiliar with, but now really like). We provide a simple programming project to familiarize students with the basic and advanced features of Rust.
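The features named above can be sketched in a few lines (my own toy example, not the course project): a trait, a reference-counted smart pointer, and the aliasing discipline that ownership enforces, namely many shared borrows or one mutable borrow, never both at once.

```rust
use std::rc::Rc;

// A trait: an interface that types opt into.
trait Describe {
    fn describe(&self) -> String;
}

struct Point { x: i32, y: i32 }

impl Describe for Point {
    fn describe(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() {
    // Rc is a smart pointer: shared ownership via reference counting.
    let p = Rc::new(Point { x: 1, y: 2 });
    let alias = Rc::clone(&p);
    assert_eq!(Rc::strong_count(&p), 2);
    assert_eq!(alias.describe(), "(1, 2)");

    // Borrowing rules: a shared borrow blocks mutation for its lifetime,
    // so one alias cannot invalidate memory another alias relies on.
    let mut v = vec![1, 2, 3];
    let first = &v[0];  // shared borrow of an element
    // v.push(4);       // rejected here: mutation while borrowed
    assert_eq!(*first, 1);
    v.push(4);          // fine once the shared borrow has ended
    assert_eq!(v.len(), 4);
}
```

The commented-out `v.push(4)` is exactly the kind of error (mutating a vector while a pointer into it is live) that becomes memory corruption in C++ and a compile-time error in Rust.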

Assessment

I enjoyed learning Rust in preparation for teaching it. I had been wanting to learn it since my interview with Aaron Turon some years back. The Rust documentation is first-rate, so that really helped.

I also enjoyed seeing connections to my own prior research on the Cyclone programming language. (I recently reflected on Cyclone, and briefly connected it to Rust, in a talk at the ISSISP’18 summer school.) Rust’s ownership relates to Cyclone’s unique/affine pointers, and Rust’s lifetimes relate to Cyclone’s regions. Rust’s smart pointers match patterns we also implemented in Cyclone, e.g., for reference counted pointers. Rust has taken these ideas much further, e.g., a really cool integration with traits handles tricky aspects of polymorphism. The Rust compiler’s error messages are also really impressive!

A big challenge in Cyclone was finding a way to program with unique pointers without tearing your hair out. My impression is that Rust programmers face the same challenge (as long as you don’t resort to frequent use of unsafe blocks). Nevertheless, Rust is a much-loved programming language, so the language designers are clearly doing something right! Oftentimes facility is a matter of comfort, and comfort is a matter of education and experience. As such, I think Rust fits into the philosophy of CS 330, which aims to introduce new language concepts that are interesting in and of themselves, and may yet have expanded future relevance.

Conclusions

We must build software with security in mind from the start. Educating all future programmers about security is an important step toward increasing the security mindset. In CS 330 we illustrate common vulnerability classes and how they can be defended against by the language (e.g., by using those languages, like Rust, that are type safe) and programming patterns (e.g., by validating untrusted input). By doing so, we are hopefully making our students more fully cognizant of the task that awaits them in their future software development jobs. We might also interest them to learn more about security in a subsequent security class.

In writing this post, I realize we could do more to illustrate how type abstraction can help with security. For example, abstract types can be used to increase assurance that input data is properly validated, as explained by Google’s Christoph Kern in his 2017 SecDev Keynote. This fact is also a consequence of semantic type safety, as argued well by Derek Dreyer in his POPL’18 Keynote. Good stuff to do for Spring’19!

Notes:

  1. Ruby is dynamically typed, but arguably type-safe.
  2. This is not strictly true as parts of these languages’ implementations are written in C/C++, and programs in type-safe languages can call out to C/C++ via a foreign function interface. Even so, eliminating C/C++ as a normal source programming language would dramatically reduce the attack surface.
  3. And even if it is not needed, C/C++ is still used quite a bit anyway. Old habits, and legacy code, die hard.
  4. Rust development began in 2006, but was not officially supported by Mozilla until later.

Filed under Education, Software Security, Types

27 Responses to Software Security is a Programming Languages Issue

  2. Mike – you really need to look at SPARK (the Ada subset) – it’s been type-safe and GC-free since day 1… OK… you might not consider Ada to be “mainstream”, but please don’t confuse popularity with fitness-for-purpose… 🙂 Rod Chapman, previously of SPARK team…

  3. I did know about SPARK but hadn’t thought about it in this context. IIUC, SPARK has severe limits on dynamic memory allocation (and pointer def/use), so it’s not a great fit for some of the application areas I mentioned that Rust is targeting, e.g., web browsers. That said, Rust has restrictions too, so perhaps it’s not as far away as I first surmised. And in any case I agree with you that it’s an effort worth mentioning to students. Thanks!

    • Steve Schwarm

      The latest version of Ada addresses many of these issues. I just do not understand why people dislike the language. It fixes a bunch of issues beyond the ones mentioned so far: using if/end if, for example. Array indexes do not have to start at zero. Range checks on integers. Tasks are a built-in type.

      It is really enlightening to go read the Ada requirements document. A lot of thought went into its design.

    • I programmed for decades, and was self-taught. Languages all have security issues, many of which are not the programmer’s fault, but in the developer’s domain of libraries and shared functions. Moreover, if the system is constructed correctly, the permissions of the code are that of the invoking user, so good partitioning would go a long way to isolate the code. Sandboxing works too, to mitigate programmer errors. This is not to excuse the programmer, but rather to point out that security is a multilevel problem.

      Worse, when in college, and I graduated only a decade ago, most of the people teaching had less security knowledge than I did, and that wasn’t much either. I do work at getting better, but it is a big subject all on its own, and in my field of electronic test, the programmer was dealing with high speed, parallel architectures, defined specifications that required research on each new product, and constant upgrading of test techniques, in addition to changes in programming languages.

      In my 21 years as a test applications developer, the languages were, in timeline order, Basic, C, Fortran, Pascal, C & C++, Basic, Visual Basic for applications, Excel macros, Assembly Language, RPG2, RS1, and some various others on a project by project basis. My preferred language is C because most of what I develop deals with direct machine drive and it puts me closer to the machine with relatively simple access to assembly as needed.

      To do all those languages, in addition to crossing 12 different varieties of test platforms and over 200 different bench instruments, teaching classes on the machines, developing over 100 custom projects, and 4 special tools, and taking college classes, exactly when should I have been studying the total attack surface provided by new languages, platforms, development tools and so on?

      • I don’t know the answer to your question. The point of this post is to say that instructors of college courses *should* know about security and be teaching it while teaching programming. Just as we talk about how to achieve efficiency, not just correctness, we should also talk about how to achieve security.

        I disagree with the perspective that language doesn’t matter. At the least, C and C++ add a level of risk. Without type safety, none of your abstractions are safe — a memory error in any part of the program can compromise any of the best-coded other parts of the program.

        Indeed, we observed this situation empirically in our build-it, break-it, fix-it programming contest, which often involved participants with significant programming experience. Those who elected to write their code in C were 8.5X more likely to have a security bug found compared to those who used a statically type-safe language (like Java or Go). See our paper for more.

          • Examining the paper, I didn’t see any adjustment for the size of the programs. In reality the C and C++ solutions seemed to have more code. This could be one reason for greater errors, as more code also implies more programmers as well as more opportunities for failure.

          Also C and C++ programmers that produce more code are likely less experienced at least in my limited experience?

          But all that aside, yes, memory issues are one of the big issues, and so are the issues of the memory management tools in most OS implementations. Using malloc and free are one of the big issues, and the segmentation of memory by the low level operations of the OS and the inability of the OS to properly triage the memory usage is another issue that makes these errors more difficult to manage.

          One of the additional issues is the choice of problem space. Web applications have a huge attack surface, and you specified these intentionally to provide that as the litmus test. C and C++ are in fact the wrong languages for this type of development, and I have worked mostly on technical software which would be unduly burdened by some of the overhead in the strongly typed and fenced memory of the other languages. Speed and efficiency were paramount, and yes security was never mentioned in the development process. I believe it should have been, but you know what they say about hindsight vs foresight.

          But I will say your paper was well thought out and the results enlightening.

  4. What does any of this have to do with configuration errors? If I misread your post I apologise, but none of this protects against a poor root or admin password. Perhaps we should use a different programming language for authentication, but I don’t see how that would help. Security is more than just buffer overflows and SQL injection. If you post your private key online by mistake, how does using Rust over C help that? If your VPN is misconfigured and intruders can access an internal database, no specific programming language would help.

    • Totally agree. My point is that software security is a PL/programming issue, so all programmers need to be aware of it, not that security is only a programming issue and nothing else. The argument you make also goes in reverse: You can protect your keys and configure your VPN properly and set a good password but if you are vulnerable to remote code injection attacks due to poor programming, none of that matters.

  5. Tim Kertis

    The programmers aren’t the problem. The problem is with the languages themselves. These languages such as Java, C, and C++ were not designed for writing secure code. This shortcoming can be remedied by using a Secure Coding Framework. Google SCF and Kertis to find out more.

  6. In the same vein: http://langsec.org/

    I also like to think about what could’ve been and could be if something similar to the SmallTalk family was mainstream over the C family.

    To quote Dan Ingalls: “An operating system is a collection of things that don’t fit into a language. There shouldn’t be one”

    The philosophy of a program being the serialization of a running process instead of what is used to create one.

    The closest mainstream thing we have today is DBMSs, I think.

    • Someone on Twitter also pointed to langsec. That’s an interesting movement that observes that the parser is a place where validation ought to happen. Someone else on Twitter observed that types also capture invariants, and that type checking (in part at run-time) can help establish those invariants and ensure they are preserved.

      As for the impact of SmallTalk: It’s a very different programming model, and I’m wondering how/whether it changes the security question. I.e., you still have to worry about communications and untrusted inputs, and possible leaks, etc. Perhaps your thinking is that all programs would naturally be doing this rather than thinking the OS is taking care of it?

  7. I interpret Language As an Operating System as:

    1. Representing OS features as language abstractions
    2. Implementation of those features in the language itself

    If it’s a suitably portable and expressive language, you’ve now gained
    a single domain of discourse to be modeled and enforced.
    Every program is instead a lightweight Virtual Machine in a sense.

    To quote from [2] section 3.2

    “””
    Conventional operating systems support multiple programs through a process abstraction
    that gives each program its own control flow, I/O environment, and resource controls.
    A process is distinguished primarily by its address space, where separate address spaces
    serve both as a protection barrier between programs and as a mechanism for defining a
    program’s environment.
    […]
    In MrEd, separate address spaces are unnecessary for protection between programs, due to
    the safety properties of the programming language.
    […]
    Instead of providing an all-encompassing process abstraction, MrEd provides specific mechanisms
    for creating threads of control, dealing with graphical I/O, and managing resources.
    “””

    When it comes to communication between machines, I’d like to see if there can be some unification of
    the above ideas with COAST described in [5] by Roy Fielding and others. Definitely a door I’d like to
    see pushed against.

    Some Examples of Languages As Operating System:

    [1] SqueakNOS:
    [2] The MrEd VM:
    [3] The Lively Kernel
    [4] The Lively Web

    Other references:

    [5]

  8. vasko

    when type safety comes to mind the first thing i think of is Pascal. it had great impact on all high level and more modern languages than C. array out of bound exception, nested functions that are now becoming more popular with FP languages, the pascal type string that is copied by almost every OOP language… must be the most underrated language. of course, there was place for only one in the same position. it was either C or Pascal. i love them both and *whoever* creates another new/safe/better language, just to start with, he is no match for Niklaus Wirth, Dennis Ritchie and not to forget John McCarthy (LISP).

    • It also amazes me that _all_ the Pascal-family languages (Pascal, Oberon, Eiffel, Delphi, Ada etc. etc) allow user-defined scalar types that embody the problem domain, not the machine domain … e.g in Ada
      type Number_Of_Working_Engines is range 0 .. 4;
      plus named type equivalence.

      All of this means my static analysis tool can immediately reject any attempt to assign -1 or +5 (or any expression that _might_ evaluate to something outside of that range…) onto an object of this type.

      That all the C-languages (even Rust) omit this is a remarkable oversight by those language designers. I teach SPARK and Ada to C programmers a fair bit, and it takes _ages_ to convince them that there’s more to life than int or int32_t… it seems to be a real blindspot…
      – Rod

  9. System security is a pure political issue. The different states were interested in infiltrating private computer systems. In the modern conception of police states the state has no restrictions on violating privacy. All privacy rights exist only on paper. It wouldn’t be any problem to construct a secure IT if the different states really wanted. As an individual or company it is impossible to construct security. You depend on too many external components that stupid (or bad) people have built.

    Since Trump things go better. As the left media mainstream hates Trump the control of the infiltration that the US have is more and more seen as a threat. The security holes are more and more closed to limit the power of Trump. If all states hold together security can be restitued and we are coming back in a civilized world with real privacy.

    • I definitely agree that security is a matter of incentives. Companies would actively adopt more security-enhancing development practices if there was a stronger disincentive for inaction. I observe that there are some engineering standards that must be followed, e.g., for civil engineers building bridges, but no similarly-enforced standards for software engineers building software. If there were, the situation might be better.

      In any case, getting to that place relies on training more people, and determining best practices. I’m not so convinced that we have the one true answer and companies are just not following it. We know a lot, but we need more research and education development. We in computer science and PL can be part of the solution.

  12. TS

    The fact that security works best if implemented from the ground up cannot be overstated. Everything added on top afterwards is just clunky and often incomplete patchwork.

    However, there are several important things to add:

    1. Safety != security:

    A safe software environment is intended to avoid crashes, random or incorrect behaviour and data corruption. This is what this article primarily is about and where programming languages including their runtime environment can help a lot if designed right.

    However, even a perfectly safe and correct program might still leak sensitive information or provide attack vectors for malicious use: Side channel analysis, brute force attempts, bypassed authentication and missed access checks, network-based intrusions, bad or missing encryption, tricking the user,… etc. are hardly the domain of the programming language itself but mostly of the runtime environment as a whole, starting from the silicon up to the way the software is used in a social and organizational context.

    2. Run-time checks aren’t a proper improvement:

    Many newer languages happily throw exceptions whenever a null reference or out of range item gets accessed or some invalid conversion was requested.
    However, while this avoids data corruption or leakage it doesn’t solve the original problem that the program is still incorrect, and it might open up new attack vectors (just think of all those stack dumps happily dumping the intestines of the system to the user).

    A proper solution would instead avoid such problems by construction, e.g. by encouraging the use of more distinct types (non-nullable, non-interchangeable, not modifiable, input/output parameter declaration,…) and algorithms which avoid out-of-bounds accesses altogether (range-based loops, ready-to-use find/sort/accumulate/filter functionality, proper ownership/lifetime management,…).

    3. C and C++ are totally different with regard to safety:

    While it is true that grandpa’s C with its pointers, C-style strings and simple type system opened up a huge dump of bugs, this doesn’t apply to C++, as it offers a pretty strict and extensive type system and has other benefits still lacking in many other languages, even ones claiming to be designed for “security” and “easy-to-use”:
    * Types are non-nullable by default, something C# has only recently adopted
    * Immutable data and parameters (unfortunately, not by default but at least it is there)
    * A powerful template system which is inherently type safe and thus can be fully checked at compile time

    Following all best practices (often known as “modern C++”, despite being possible for 20+ years already; the newer language features introduced later just added fluff for easier use), C++ allows for code which is robust, easy to read, and fast.

    4. The risk of source code modifications should not be underestimated:

    Many serious flaws do not exist from the beginning in the source base when everyone usually takes a lot of care and tests thoroughly but get introduced accidentally later on:

    * Copy-pasting code and altering it is often done with less care than the original implementation, potentially failing to modify all the important places correctly.
    Also, code duplication increases the amount of code which is less or rarely used at runtime and thus has poor testing coverage and tends to get less attention during a review.

    * Merging code, especially when done with automatic tools like it is normally the case when using a source repository system, can cause source lines to get lost, duplicated or mixed up without noticing

    This is something where the language can make a large difference. The first issue can be helped by encouraging the use of abstractions following the “don’t repeat yourself” principle, the second one by designing the language in a way that tends to break soon if lines or constructs are messed up. E.g., variable scope should be as limited as possible (JavaScript’s always-function-level scope is pretty bad here, as is the all-variables-declared-on-top rule enforced by some older languages), and the beginning and end of a block should be as explicit as possible.

    5. Importance of proper tools:

    Static types checked at compile time help a lot in terms of program correctness, but this is not limited to the compiling stage:

    Static analysis can go beyond that, detecting unreachable and redundant code, dangerous constructs, unchecked use etc. At best it works right while typing, thus showing potential errors as soon as possible, and can cover API calls and works with dynamic or interpreted languages as well.

    Proper code suggestion/completion features also help, as they encourage the use of longer, more descriptive and distinct names for objects and types without losing productivity.

    While the use of any language can be improved with such tools, the language design can make a big difference in how easy or hard such tools are to build and in how precise and complete they can get.

    5. It all ends at the API:

    Most real-world applications do not run in a cozy sandbox invented by their language designers but have to interact with a huge pile of external functionality, often following patterns that do not fit the language well and offering many chances to use them incorrectly or in a potentially dangerous way.
    Standard functionality may be covered well by the language-specific runtime library, but less common or platform-specific features will sooner or later go beyond that.

    This is also the main reason why I favor deterministic, reference-counted and block-based approaches to lifetime management over (non-deterministic) garbage collection: the former often allows releasing external resources in the same way as language-managed ones, while GC-based languages still require releasing them explicitly – thus forcing you to do by hand exactly what GC was supposed to handle automatically in the first place.

    6. Speed and Safety aren’t mutually exclusive:

    It is true that security features like encryption and access-right checking introduce additional overhead. However, memory and type safety often sit in the same boat as performance-specific features:

    A well-tuned, generic, reusable algorithm is often not only better tested but also better optimized than one rewritten by hand each time.

    More specific types with a more limited range not only help the compiler spot more errors but also help the optimizer: it knows more about the data, which extends the range of possible optimizations and gives better hints about the preferred ones.

    A prominent example is string handling in C++: both the standard string and the usual third-party solutions use Pascal-style strings with a known length and some more or less sophisticated internal memory management, making them not only safe and easier to use but also much faster than plain C-style strings, with their often clunky memory management and need for repeated length scans.

    A less obvious example is the constant-time algorithms required to guard against statistical timing analysis: while this might at first sound bad for performance, since it disallows early-exit strategies, such code is often better suited to parallelization and vectorization.

    Overall, one can summarize that language (as well as runtime and tool) design should focus on correct and robust (=safe) programming in order to provide a solid foundation to build secure software on top of it rather than focusing on security directly, which would introduce additional counter-productive overhead and reduce flexibility.

    • Great points. Thanks for putting the time into writing this out!

    • Thanks for this lengthy commentary! Overall I don’t disagree with a lot that you are saying. I want to emphasize: I’m not saying all security problems are solved by the programming language. I am saying that software security should be of concern to programmers, and so all programmers should learn about it. That’s why we talk about it in a programming class. Moreover, the language can provide a lot of support to programmers trying to code securely; it doesn’t do all the work by itself. That’s why it’s not out of place in 330. So I very much agree with your summary in point 6 about the foundation the language should aim to provide.

      Some per-point responses:

      1. I disagree that the article is “primarily about” using type-safe languages to avoid crashes. This is certainly a theme, but the entire portion of the article, and the course, on input sanitization goes beyond that. I.e., SQL injection, XSS, bypassing authorization using tricky inputs, etc. are not about crashes, but about trusting untrustworthy input. I don’t disagree that there’s more to security than these things. The point of the article is that *all* programmers should know something about security, and that’s why we teach it in a programming class. The language can also help, hence the appearance in a programming languages class.

      2. I agree that proving the absence of problems is superior to run-time checks. But run-time checks are efficacious: they are fail-stop, rather than permitting corruption/bad execution to continue in a way that can cause problems. Stopping an out-of-bounds array write on the spot stops the attack. On the flip side, proving the absence of problems can be costly, in terms of programmer time. We should aim to make it less so, but in the meantime, run-time checks are better than no checks.

      3. C++ certainly provides a lot of features that can help security. The ones you list are just some of them; smart pointers, and others are there, too. (I don’t know what you mean by the claim that the template system is type-safe; do you just mean in the sense of parametric polymorphism being type-safe?) However, my understanding is that C++ programs are still subject to a fair number of memory corruption issues (buffer overflows are *still* the single top CVE category in 2018, and use-after-free is pretty common, as are illegal casts), so despite these improvements, there are still serious problems. Relying on programmers to get it right has not worked. Type safety and type abstraction are very important for helping you reason, end to end, about the modules and abstractions in your program, and a lack of memory safety is a big hole in your ability to do that.

      4. Good points!

      5. Good point about reference counting and interacting with external objects. However, I don’t see the fundamental difference with GC. If you are using reference counting, the run-time system has to observe when the count goes to zero, and then call the right finalizer/handler to free the external object. How is that different than what happens with GC? I’m not familiar with the details here, so perhaps there’s something subtle going on.

      6. I really like your summary: “one can summarize that language (as well as runtime and tool) design should focus on correct and robust (=safe) programming in order to provide a solid foundation to build secure software on top of it rather than focusing on security directly”. Absolutely. Reiterating my point: This article has not said that the language is where all security happens. It is saying that *all programmers* need to be a big part of the software security solution, and therefore that they need to be educated in ways to code securely. The language can help, perhaps most usefully in the manner you describe.

  13. Great post. I think that it’s super important to emphasize early on in a programmer’s education the importance of avoiding “stringly typed programming”. You should be representing queries that you will be submitting to a database engine, and other similar data structures, in a composite manner from the start, e.g. as values of a recursive datatype or an abstract type, not as a string.

    Of course, to work with composite encodings ergonomically, you need notational support. Some languages have SQL query literals, but there are many different SQL variants and other query languages in the world, so addressing this problem at the language level really calls for an extensibility mechanism for literal notation. We have a paper coming out at ICFP next month based on my thesis work that defines just that for Reason / OCaml, complete with strong abstract reasoning principles:

    https://github.com/cyrus-/ptsms-paper/raw/master/icfp18/omar-icfp18-final.pdf

    We had an earlier design at ECOOP 2014:

    https://github.com/wyvernlang/docs/raw/master/ecoop14/ecoop14.pdf

    A related idea is to define regular string types (perhaps as a refinement of the string type) to be able to capture certain injection-related invariants statically. Nathan Fulton and I wrote a paper about that topic:

    https://github.com/cyrus-/papers/raw/master/sanitation-psp14/sanitation-psp14.pdf
