In the computing stack, PL sits between algorithms and systems. Without algorithms to implement or computer systems to run them on, there would be no need for programming languages. However, the research communities that study algorithms, PL, and systems don’t really have much of an overlap. This is perhaps unavoidable: computer science is now a mature field, and researchers in mature fields tend to pursue specialized and technical research questions.
At the same time, it seems important that the approaches — assumptions and methods — of different subfields of computing be compatible to some extent. At the end of the day, computer science aims to produce solutions for the whole computing stack. An “impedance mismatch” between its subfields compromises our ability to come up with such end-to-end solutions.
This suggests that the comparative study of assumptions, techniques and cultures of different CS fields (a sort of “comp lit” for computer science) is potentially valuable.
Personally, I have always been intrigued by the relationship between the fields of programming languages and algorithms. In this post, I discuss similarities and differences between these two areas, and argue that their synthesis could be interesting for both research and teaching.
What do PL researchers study?
To compare PL with other fields, we need to first decide what PL is. I can think of a few themes that run through our field:
- Language design. Unsurprisingly, the design of programming languages is a fundamental goal of PL research. A designed language may be a real, usable language, or a “calculus” that captures the mathematical core of a bigger language. In either case, a key challenge is semantics: the rigorous definition of how programs are evaluated. Such definitions are not arbitrary. Typically, researchers aim to guarantee that programs in their languages satisfy some desirable mathematical properties, for example memory safety or data race-free concurrency. Commonly, a language designer also wants to exploit practical features of hardware on which programs execute and the application domain where the language is used.
- Reasoning about programs (“verification”). Rigorous reasoning about program properties — for example, verifying termination or deadlock-freedom — is a major theme in PL. Typically, PL researchers want to accomplish this task in an algorithmic or at least semi-automated way. As automated analysis of this sort is undecidable for Turing-complete languages, PL researchers focus either on weaker classes of languages where proving properties is decidable (model checking), or on sound but incomplete methods (type systems and static analysis), or on semi-algorithms guided by human programmers (interactive theorem proving).
- Compiler design. A programming system must generate executable code, and designing algorithms that can do so is another core PL problem. Traditional challenges here include parsing and program optimization. In recent times, research on this theme has expanded to designing verifying compilers that can reason about correctness, or program synthesizers that use sophisticated search algorithms to generate executable code from high-level, possibly incomplete specifications.
What do algorithmists study?
Now let’s consider the field of algorithms. Any algorithms research starts with a model of computation on which algorithms are to execute. Typically, such a model distills the essence of a real-world computational setting, and constrains what an algorithm is allowed or not allowed to do. For instance, in the so-called streaming model of computation, an algorithm can only process a few items in an input stream at one time. In sublinear algorithms, the algorithm cannot read its input data fully (capturing real-world intuitions about large datasets), but is permitted to probe or sample from this data.
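To make the streaming constraint concrete, here is a classic one-pass algorithm, reservoir sampling, sketched in Python. The stream is consumed one item at a time, and only O(k) state is ever kept, so the sketch respects the streaming model described above. (The function name and interface are my own illustration, not from any particular paper.)

```python
import random

def reservoir_sample(stream, k):
    """Maintain a uniform random sample of k items from a stream,
    using O(k) memory and a single pass over the data."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace a reservoir slot with probability k / (i + 1),
            # which keeps every item seen so far equally likely
            # to appear in the sample.
            j = random.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Note that nothing in the algorithm depends on knowing the stream's length in advance, which is exactly the kind of constraint the streaming model formalizes.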
Any given model of computation comes with a rigorously defined notion of efficiency. The algorithms researcher’s goal is now to come up with algorithms that are provably efficient. Over the years, algorithmists have studied many different ways to make algorithms efficient. For example, a common approach to achieving efficiency is to weaken the guarantees that the algorithm offers. In particular, randomized algorithms are allowed to have one-way or two-way probabilistic error in their outputs. Approximation algorithms must solve hard global optimization problems in polynomial time, but are allowed to produce outputs that are only within an approximation factor of real optima.
PL and algorithms: technical differences
There are several commonalities between PL and algorithms and, to my mind, one key difference.
One obvious commonality is the object of study: an algorithm and a program are basically the same thing (algorithms are commonly defined to be programs that terminate on all inputs, but this is a technical detail). Second, the models of computation that algorithms researchers assume are, fundamentally, nothing but programming languages: a demarcation of the set of permissible programs/algorithms and a specification of their dynamics. One could think of streaming algorithms or sublinear algorithms as the classes of programs expressible in domain-specific languages with restrictions on how a program accesses data.
The fundamental difference between algorithms and PL, it seems to me, lies in a quantifier:
An algorithms researcher’s goal is to produce an algorithm/program with desirable properties. In contrast, whether a PL researcher is working on language design, verification or compilers, he or she aims to come up with proofs and procedures that apply to all algorithms that can be written in a language.
For instance, a program verifier is a procedure that can take an arbitrary algorithm (within a certain model of computation) and check that it satisfies certain properties. The designer of a language with guarantees of memory safety aims to guarantee a theorem about the set of all algorithms possible under their model.
Interestingly, this makes the field of PL closer to complexity and theory of computation, which study properties of classes of algorithms. Think of two elementary statements from complexity theory: “every comparison-based sorting algorithm must have Omega(n log n) complexity” and “PSPACE = NPSPACE”. The first statement is a meta-theorem about a programming language whose semantics comes with a quantitative model of resource consumption (for example, that a comparison of two values takes unit time). The second is a comparison between two different programming languages.
PL and algorithms: cultural differences
Other differences between PL and algorithms are cultural rather than technical.
The first such difference is that PL research is usually designed to maintain a path to execution on real computers. Models of computation in the algorithms literature are an artifact of analysis. In contrast, even the core languages designed by programming linguists tend to have the eventual goal of implementation. It also seems fair to say that PL researchers implement their ideas more often than algorithmists do. Many if not most POPL papers come with an experimental evaluation. SODA papers, on the other hand, rarely do. One corollary of this is that algorithmists are more susceptible to conflating models with reality. As one example, most algorithms research holds polynomial time complexity as the notion of efficiency. However, NP-hard problems are not always hard to solve in practice — for instance, boolean satisfiability may be NP-complete, but SAT-solvers work very well on large formulas in many domains. The empirical focus of PL researchers has allowed them to leverage such practical tools.
A second cultural difference between algorithms and PL lies in their attitude towards abstraction, and their definition of what makes an algorithm/program desirable. Models of computation in the algorithms and complexity literature tend to be low-level, without the abstractions available in most modern programming languages. The main objective of design is efficiency, defined in terms of the number of low-level steps taken by executions.
As Bob Harper argues in his talk on “two notions of beauty in programming”, PL researchers often have a different take on the cost of a computation. For many PL researchers (especially those who work on language design), the true notion of a computation’s cost is the effort involved in programming and reasoning about programs. A desirable computation or proof is one that can be expressed effectively using high-level programming abstractions, and composed easily with other computations/proofs. The design of new languages or models of computation is driven by an interest in making programs more desirable in this sense.
A third, related difference is that algorithms research is traditionally much heavier on quantitative reasoning. Notions of efficiency studied by algorithmists are easy to quantify. Also, in contemporary algorithms, notions of correctness are often randomized or approximate, so that correctness analysis also requires quantitative reasoning. In contrast, PL research has historically focused on reasoning about boolean properties such as type safety, deadlock-freedom, and termination. It is true that there is a growing literature in the PL world on quantitative reasoning about programs (including reasoning about probabilistic and approximate computation), but this kind of work is still in the minority.
Finally — and I realize that I may get some flak for saying this — it seems to me that algorithms researchers are historically much more agile than PL folks in identifying new application areas. There are many thriving research topics of the form “x + algorithms”: algorithmic genomics, algorithmic game theory, and network science are only three prominent examples. Algorithms researchers have been able to distill interesting algorithmic problems from these applications and come up with many exciting technical results. In contrast, PL researchers have tended to focus on classic applications in system software, or chosen to be application-agnostic.
The technical differences between algorithms and PL are innate — there is after all a reason why these are different areas. However, the cultural differences between PL and algorithms exist primarily for historical reasons, and it seems that a synthesis of the “positive” cultural attitudes of PL and algorithms can benefit both fields.
Specifically, it seems clear to me that there’s a “PL angle” to most application domains that algorithmists study, and applying it would benefit the entirety of computer science. Take, as an example, cryptography. The vast majority of work on cryptography is algorithmic, and at a theoretical level there’s nothing wrong with these algorithms. However, as we saw from the Snowden revelations, because of vulnerabilities at the software and hardware level, guarantees at the level of abstractly defined algorithms may not transfer to implementations of these algorithms. PL techniques could have potentially prevented these errors from happening.
It seems to me that we can amend our list of activities of PL researchers as follows:
- Language design. Come up with high-level programming abstractions for application domains where there are interesting algorithms to be designed. Consider any property of the model of computation that the algorithm designer cares about. For example, this property could be “A program can only read k items of a data stream at a time” or “A program can sample from certain probability distributions” or even “A program must terminate in polynomial time”. Make sure that every program in the language actually satisfies this property.
- Reasoning about programs. Consider any of those rich properties of algorithms that algorithms researchers like to prove. For each such property, there’s a program analysis/verification problem: show that a given program/algorithm satisfies the property. Properties of interest could include bounds on the algorithm’s complexity, optimality of the algorithm’s output, approximate optimality, and bounds on the algorithm’s probabilistic errors.
- Compiler design. Design compiler transformations and optimizations that do not compromise guarantees developed at the algorithmic level, and instead preserve them all the way to executable code.
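As a toy illustration of the language-design item above, a restricted interface can make the algorithm designer's model of computation the only thing a program can express. In the sketch below (all names are hypothetical, chosen for illustration), clients can only pull the next stream item; random access and rewinding are simply not expressible, so every program written against the interface is a single-pass streaming algorithm by construction:

```python
class StreamView:
    """Wraps a data source so that the only permitted operation is
    pulling the next item. This enforces the streaming restriction
    at the interface level rather than by after-the-fact analysis."""

    def __init__(self, source):
        self._it = iter(source)

    def next_item(self):
        # Raises StopIteration when the stream is exhausted.
        return next(self._it)

def count_evens(view):
    """A client program: because it is written against StreamView,
    it is a one-pass streaming algorithm by construction."""
    count = 0
    while True:
        try:
            x = view.next_item()
        except StopIteration:
            return count
        if x % 2 == 0:
            count += 1
```

A real language design would go further — for example, enforcing the restriction in a type system rather than a runtime wrapper — but the principle is the same: the guarantee holds for all programs in the language, not for one algorithm at a time.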
There are quite a few existing PL research efforts that can be said to fit this agenda. Examples include recent work by PL researchers on reasoning about statistical privacy, verification of cryptographic protocols, and analysis of probabilistic programs. In each of these cases, our community targeted an algorithmic application and extracted PL problems from it. I would like to see many more efforts like this, in areas ranging from biology to graphics to robotics to economics (disclosure: PL approaches to robotics and computational economics are two of my current research interests). The technical problems of interest in these areas are likely to be quantitative, which opens up possibilities of generalizing the qualitative techniques that PL researchers have traditionally used.
There are also pedagogical benefits to a synthesis of PL and algorithms. For one, it would tie the undergraduate sequence of programming courses to undergraduate theory classes. An effort of this sort is CMU’s two-course introduction to functional programming and algorithms (15-150 and 15-210), which emphasizes the benefits of functional program abstractions while teaching algorithms and data structures. For another example, PL ideas are a natural fit for a theory of computation class. Today, the standard model of programs in a class on automata and computability is the Turing machine, which is far removed from the abstraction-rich programming languages in which students implement their algorithms. Complementing the Turing machine model with recursive functions and the lambda-calculus would partly bridge this gap. Finally, finite and pushdown automata are presented in undergrad automata theory using old-fashioned applications in parsing. These applications can and should be complemented with the use of automata in modern program verification, as models of software and hardware systems.
All in all, it seems clear to me that bridging some of the gaps between PL and algorithms is both possible and desirable. Some encouraging steps in this direction have already been taken, but much more remains to be done. Perhaps the best way to develop synergy between the fields is to create funding opportunities for research at their interface. Bob Harper’s recent NSF whitepaper, which builds on the talk of his that I mentioned earlier, is an example of an effort to mobilize such funding. A less ambitious goal is to create fora with representation from both communities. I personally can think of worse ways to spend a few days than hanging out with some algorithms researchers, hearing about recent exciting results in their field and thinking about their language-level analogs.
[Note: This article has been edited since it was originally posted.]