This is my second post on UMD CS’s programming languages course, CS 330.
We introduce Ruby and OCaml as exemplars of dynamic/scripting and functional languages, respectively, in the first half of the course. I described these in the first post. This post covers the third quarter of the course. This part goes over technologies that underpin a language’s design and implementation, in particular regular expressions, finite automata, context-free grammars, and LL-k parsing. It also looks at the lambda calculus as a core model of computation, and operational semantics as way of precisely specifying what programs mean. The last part of the course discusses the basics of software security and coding strategies for avoiding various vulnerabilities. I will dedicate a separate post to these last topics.
I’m very curious for your feedback on the course overall. I welcome suggestions for improvements/adjustments to the content!
Last week I attended the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). The event was hosted at Paris 6 which is part of the Sorbonne, University of Paris. It was one of the best POPLs I can remember. The papers are both interesting and informative (you can read them all, for free, from the Open TOC), and the talks I attended were generally of very high quality. (Videos of the talks will be available soon—I will add links to this post.) Also, the attendance hit an all-time high: more than 720 people registered for POPL and/or one of its co-located events.
In this blog post I will highlight a few of my favorite papers at this POPL, as well as the co-located N40AI event, which celebrated 40 years of abstract interpretation. Disclaimer: I do not have time to describe all of the great stuff I saw, and I could only see a fraction of the whole event. So just because I don’t mention something here doesn’t mean it isn’t equally great.
[This blog post was conceived by Steve Chong, at Harvard, and co-authored with Michael Hicks.]
Enforcing information security is increasingly important as more of our sensitive data is managed by computer systems. We would like our medical records, personal financial information, social network data, etc. to be “private,” which is to say we don’t want the wrong people looking at it. But while we might have an intuitive idea about who the “wrong people” are, if we are to build computer systems that enforce the confidentiality of our private information, we have to turn this intuition into an actionable policy.
Defining what exactly it means to “handle private information correctly” can be subtle and tricky. This is where programming language techniques can help us, by providing formal semantic models of computer systems within which we can define security policies for private information. That is, we can use formal semantics to precisely characterize what it means for a computer system to handle information in accordance with the security policies associated with sensitive information. Defining security policies is still a difficult task, but using formal semantics allows us to give security policies an unambiguous interpretation and to explicate the subtleties involved in “handling private information correctly.”
In this blog post, we’ll focus on the intuition behind some security policies for private information. We won’t dig deeply into formal semantics for policies, but we provide links to relevant technical papers at the end of the post. Later, we also briefly mention how PL techniques can help enforce policies of the sort we discuss here.
Buggy software doesn’t work. According to wikipedia:
A software bug is an error … in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. Most bugs arise from mistakes and errors made by people in either a program’s source code or its design ...
When something is wrong with a program, we rarely hear of it having one bug — we hear of it having many bugs. I’m wondering: Where does one bug end and the next bug begin?
To answer this question, we need an operational definition of a bug, not the indirect notion present in the Wikipedia quote.
This post starts to explore such a definition, but I’m not satisfied with it yet — I’m hoping you will provide your thoughts in the comments to move it forward.
In my last post, I summarized some of the topics and problems considered at a recent Dagstuhl seminar I co-organized on the Synergy between Programming Languages and Cryptography. The post surveyed how programming languages often interface with cryptography in the construction of secure systems, and in particular how they can make it easier to implement cryptography, use it, or verify its correctness.
Beyond using PLs as a tool for easier/safer use of Crypto, there is an opportunity for certain kinds of thinking, or reasoning, to cross over fruitfully between the PL an Crypto communities. In particular, both communities are interested in formalizing systems and proving properties about them but they often use different methods, either due to cultural differences, or because the properties and systems of interest are simply different. During the workshop we identified both analogous, similar styles of reasoning in two communities and connection points between the different styles of reasoning. In this post I briefly highlight a few examples of each, and point to future research opportunities.
In this post, I introduce the emerging area of probabilistic programming, showing how probabilistic programs will hopefully make it easier to perform Bayesian-style machine learning, among other applications. Probabilistic programming is an exciting, and growing, area of research, with fantastic people in both AI/ML and PL working together and making big strides. PL methods — including formal semantics, optimization techniques, and forms of static analysis — have proven very useful in advancing this area forward.
In response to my previous post defining memory safety (for C), one commenter suggested it would be nice to have a post explaining type safety. Type safety is pretty well understood, but it’s still not something you can easily pin down. In particular, when someone says, “Java is a type-safe language,” what do they mean, exactly? Are all type-safe languages “the same” in some way? What is type safety getting you, for particular languages, and in general?
In fact, what type safety means depends on language type system’s definition. In the simplest case, type safety ensures that program behaviors are well defined. More generally, as I discuss in this post, a language’s type system can be a powerful tool for reasoning about the correctness and security of its programs, and as such the development of novel type systems is a rich area of research.
I am in the process of putting together a MOOC on software security, which goes live in October. At the moment I’m finishing up material on buffer overflows, format string attacks, and other sorts of vulnerabilities in C. After presenting this material, I plan to step back and say, “What do these errors have in common? They are violations of memory safety.” Then I’ll state the definition of memory safety, say why these vulnerabilities are violations of memory safety, and conversely say why memory safety, e.g., as ensured by languages like Java, prevents them.
No problem, right? Memory safety is a common technical term, so I expected its definition would be easy to find (or derive). But it’s much trickier than I thought.
My goal with this post is to work out a definition of memory safety for C that is semantically clean, rules out code that seems intuitively unsafe, but does not rule out code that seems reasonable. The simpler, and more complete, the definition, the better. My final definition is based on the notion of defined/undefined memory and the use of pointers as capabilities. If you have better ideas, I’d love to know them!