What is PL research and how is it useful?

If you are in the world of programming languages research, the announcement that UW had hired Ras Bodik away from Berkeley was big news. Quoting UW’s announcement:

Ras’s arrival creates a truly world-class programming languages group in UW CSE that crosses into systems, databases, security, architecture, and other areas. Ras joins recent hires Emina Torlak,[ref]We previously interviewed Emina on this blog.[/ref] Alvin Cheung, Xi Wang, and Zach Tatlock, and senior faculty members Dan Grossman and Mike Ernst.

And there’s also Luis Ceze, a regular publisher at PLDI, who ought to be considered as part of this group. With him, UW CSE has 8 out of 54 faculty with strong ties to PL. Hiring five PL-oriented faculty in three years, thus making PL a significant fraction of the faculty’s expertise, is (highly) atypical. What motivated UW CSE in its decision-making? I don’t know for sure, but I suspect they see that PL-oriented researchers are making huge inroads on important problems, bringing a useful perspective to unlock new results.

In this post, I argue why studying PL (for your PhD, Masters, or just for fun) can be interesting and rewarding, both because of what you will learn, and because of the increasing opportunities that are available, e.g., in terms of impactful research topics and funding for them.

What is PL Research?

When your hear that someone’s research area is programming languages, what do you think they do?

Some people I’ve talked to think that programming languages researchers work on, well, programming languages. They imagine someone developing a new esoteric, idiosyncratic programming language that few people (maybe just their hapless students) will use. Even if these people believe that developing new languages has merit, they still wonder whether such research is a waste of time because, frankly, aren’t existing languages good enough, and completely entrenched?

First, I should point out that not all academic/research-driven languages remain in the academic ivory tower. Scala, Haskell, Racket,[ref]Racket started as a dialect of Scheme, and from this point of view, it has enjoyed long-running real-world use.[/ref] and OCaml are at least four examples of academically driven languages that have serious use in the real world. (For more about PL adoption, see my previous posts on PL adoption research and reality.)

But the main point I want to make is that PL research is broader than designing and implementing new languages. To me, a PL researcher is someone who views the programming language as having a central place in solving computing problems. From this vantage point, PL researchers tend to focus on developing general abstractions, or building blocks, for solving problems, or classes of problems. PL research also considers software behavior in a rigorous and general way, e.g., to prove that (classes of) programs enjoy properties we want, and/or eschew properties we don’t. This approach has proven to be very valuable for solving a wide ranging set of problems.

Solutions via Simple, General-purpose Abstractions

The ethos of PL research is to not just find solutions to important problems, but to find the best expression of those solutions, typically in the form of a kind of language, language extension, library, program analysis, or transformation. The hope is for simple, understandable solutions that are also general: By being part of (or acting at the level of) a language, they apply to many (and many sorts of) programs, and possibly many sorts of problems.

As an example, consider probabilistic programming, which I’ve covered earlier on this blog. Here, the problem is general-purpose machine learning. A probabilistic programming language is a programming language extended with constructs for sampling from standard distributions (Gaussian, beta, Bernoulli, uniform, etc.) and observing (or assuming) possible outcomes. With these extensions, we can write programs that, under a standard semantics, act as a sampling functions for (non-standard) distributions. Then, we can apply a different semantics to a probabilistic program that interprets it as the distribution itself by allowing us to use Bayesian methods to learn its unknown parameters based on observations. Probabilistic programming is general-purpose, so it applies to many machine learning problems (and problems from other domains) while being simple and elegant.

There are many more examples of this approach. Here are two more.

Suppose I have a function F such that F(X) = Y. An incremental version of F would accept a small change to with which it could update the output Y with little extra work (compared to running re-running F on the entire modified input). Researchers can, and do, design novel incremental algorithms directly. For example, this 100-page 2002 PhD dissertation defines an incremental algorithm for computing the convex hull of a set of points. A PL approach to incremental computation is exemplified by Adapton,[ref]General-purpose incremental computation has a long history; Umut Acar‘s work on self-adjusting computation is notable in bringing it to maturity.[/ref] which defines a few simple language extensions and some run-time support, allows the programmer to write a normal-looking (non-incremental) Quickhull implementation and get an incremental version of it for free. Adapton is general-purpose so it can be used to incrementalize many algorithms, not just convex hull.

Our last example: An authenticated data structure (ADS) implements a form of verified computation, whereby an untrusted server can perform operations on the data structure and produce a proof with which a client can verify the operation was performed correctly. Cryptography researchers have proposed many such data structures, starting from (full, binary) Merkle trees, and progressing through red-black trees, B-trees, the Bitcoin blockchain, and more. Rather than treat each new ADS as a new research problem, the PL approach, exemplified by the LambdaAuth work previously considered on this blog, is to define a simple language extension with which you can program ADSs. The language metatheory ensures that the result produced by the LambdaAuth compiler is secure. And the language is general enough to be able to program the single-point results published previously.

And of course there are many other examples, from programming software-defined networks to information flow-secure software to compiler optimizations to program verification to program synthesis (e.g., per the multi-institution ExCape project). And of course I am only scratching the surface of the breadth of PL research generally, which consists of many exciting topics and directions.

Defining, and Understanding, Software Behavior

A key feature of making the above approach work is a deep knowledge of the semantics of programs. We have to be able to carefully state what programs do to be able to say that (e.g., when using a particular set of abstractions) programs do what we want and/or do not do what we do not want. Semantics is a decades-old topic, and still a rich area of research. But methods are mature enough to be applicable to most real-world problems while being easy to use.

A PL-minded approach to solving a problem often includes formulating the solution within a formal semantics (typically operational semantics[ref]Operational semantics is the most popular choice, but other forms of semantics, like denotational and axiomatic semantics, are also used.[/ref]) and then proving that all programs that employ the solution will behave as we would like them to.[ref]This approach boils down to the art of formalization, with proofs by structural induction (i.e., not just on mathematical numbers, but on inductively defined structures like syntax trees and proof derivations). This characterization is the neat summary of Sumit Gulwani; his recent CACM article also shows how PL-style thinking can be broadened outside of computer science/programming problems, e.g., to education and computational thinking.[/ref] For example, for probabilistic programming, Claret et al formulated the operational semantics of probabilistic programs as performing sampling, and then proved that their algorithm to infer the distribution denoted by that program matches the distribution of all possible (sampling) executions. For incremental computation, Hammer et al defined the semantics of normal execution in Adapton and proved that their algorithm for incremental update, embedded as a choice within that semantics, produces the same outcome as re-running from scratch. For authenticated data structures, Miller et al formulated the semantics of their language extension from the point of view of both server and client and proved that if the server attempts to lie about the result, the client will detect it with high probability. Importantly, these proofs apply to all (well-defined/well-typed) programs in the language, not to just particular algorithms.

The above examples target particular problems using extended languages and perhaps novel semantics. Another way to “understand software behavior” is to take a program in a standard language and prove a program-specific property mechanically, and ideally automatically, by using techniques such as static analysis, dynamic analysis, and formal mechanization (e.g., in Coq). Many of the success stories of PL, such as the ASTREE static analyzer for proving the absence of run-time errors in C programs and the SLAM model checker that is the core of Microsoft’s Static Driver Verifier, and the CompCert verified compiler, come from this tradition. Importantly, advances in ways to understand standard-language behavior cross-pollinate with ideas for developing better languages and language abstractions for addressing new problems.

PL+X is an effective combination

I’ve picked the three examples in the above discussion for a reason: They are an approach to solving general problems using PL-minded techniques in combination with techniques from other communities, like machine learning inference algorithms or cryptography. Returning to UW, several of the people listed in the announcement do not self-affiliate with the PL community, but rather with communities like Systems (Xi), Databases (Cheung), Software Engineering (Torlak), and Architecture (Ceze). But they point to PL as a key enabler of what they do. For example, Cheung states,

I work on tools that make use of programming language techniques to improve application performance.

And Xi states,

My research interests are in building secure and reliable systems. Past work: Jitk verified in-kernel interpreter, the Stack undefined behavior checker, the Retro intrusion recovery system, and the Resin language runtime for securing web applications.

Conversely, many “PL people” I know bring their PL-style mindset and expertise to new domains, with much success. For example, Dave Walker and Nate Foster have had a huge influence in the software-defined networks world via the Frenetic project. Andy Gordon and Johannes Borgström are leaders on applying probabilistic programming languages to machine learning with their Fun and Tabular languages. Umut Acar and Ozgur Sumer brought their expertise in incremental computation to solve open problems in adaptive Bayesian inference.

One notable success in bringing a PL perspective to an old problem is in the area of program synthesis, and Sumit Gulwani‘s FlashFill work, introduced in the paper Automating String Processing in Spreadsheets Using Input-Output Examples, from POPL’11. The problem of program synthesis has been around for a long time in the AI community (machine learning is essentially trying to synthesize a function using examples), and Sumit’s PL perspective was important to cracking it successfully. Sumit says:

With regards to comparison with AI: In AI, we typically perform heuristic based search. Whereas FlashFill is an instance of search based on deductive methods, where we systematically compute the set of all/many programs that are consistent with the examples that the user provided. The set of all such programs is the space over which ML based ranking techniques operate. I regard the deductive methods and the data structures for succinctly representing a huge set of programs (also called “version space algebras”) as PLish concepts.

FlashFill is now, of course, a feature of Excel. Sumit gave a nice talk about it, and his approach to the work, at PLMW’15.

Looking ahead

I hope I have convinced you by this point that PL research brings a very useful perspective to solving general problems. But if you study PL, or hire someone who does it, are their prospects really that good?

In terms of resources, I am struck by how much research funding is going toward problems that can readily incorporate a PL perspective. DARPA I20 has had a flurry of BAAs come out in the last few years for which PL should play, or is playing, an important role. I count at least 12 programs (APAC, CRASH, CSFV, HACMS, CFAR, CGC, STAC, BRASS, BRANDEIS, VET, PPAML, and MUSE) and probably there are more. These programs focus on problems in systems, software engineering, computer security, and machine learning. NSF also has a large funding profile for PL work, e.g., via SHF and SATC. And of course funding is available through AFOSR, ARL, IARPA, and many other agencies in the US and abroad.

In terms of jobs, I take the UW CSE hiring as a positive sign. Other great places regard PL highly as well. Microsoft Research’s (MSR) Redmond lab’s RiSE group is 30 researchers, out of 300 total there, and MSR Cambridge also has an incredible, and sizeable, PL group. CMU, Indiana, Northeastern, and UT Austin also have been actively growing the size of their PL groups (see Swarat’s ranking app). I did an informal poll of PL PhD graduates that were on the market this season, and people are taking faculty jobs (or have offers for such jobs) at CMU, Grinnell, Hendrix College, UIUC, Pomona, Princeton, University College London, University of Colorado (Boulder), and Wisconsin; and are taking industry jobs at Apple, Facebook (actively using PL research), Google, Intel Labs, Microsoft, Oracle, and Samsung. People are also taking post-doctoral positions at places like Imperial College London, National University of Cordoba, Northwestern, UCSD, the University of Cambridge, and the University of Maryland.

Today, there are many exciting and important problems in the public eye, from cybersecurity to data science to quantum computing to green computing to health informatics. PL research can bring, and is bringing, a great perspective and skillset to these problems, unlocking groundbreaking results. Check it out!

7 Comments

Filed under Research

7 Responses to What is PL research and how is it useful?

  1. Pingback: UW CSE News » What is Programming Language research and how is it useful?

  2. Incidentally, a very senior systems researcher told me that systems research is also no longer about operating systems, or even distributed systems, but about how to apply computing power to solving problems, and providing the proper abstractions to do so.

  3. Another nice example was the Metronome real-time Garbage Collector project at IBM Watson (David Bacon, Perry Cheng, etc):

    http://researcher.watson.ibm.com/researcher/view_group.php?id=174

    which had both a POPL’03 paper (also recognized with most influential paper) as well as a tremendous practical impact on the company.

    The work combined various efforts, both within IBM as well as Perry Cheng’s PhD thesis at CMU (advised by Guy Blelloch and Bob Harper).

  4. Pingback: "What is PL Research?" The Talk - The PL Enthusiast

  5. fauzanbaig16@gmail.com

    Is there a getting started guide on Programming Language Research, as an “industry programmer” there aren’t many who focus on this domain as much?

    • Great question. There is no single, go-to source, I don’t think. I’ll ponder your question and think about writing a post on this question. It’s sort of a companion this post on what the research area is about. To make sure I capture your interest: Which parts of what was written in this post are you most interested in?

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.