In my last article I discussed how the failure to find the Heartbleed bug sooner was in some sense a failure to refine or deploy what is otherwise effective technology for static analysis. In particular, commercial static analysis tools will purposely ignore potential bugs so as to avoid reporting too many false alarms, i.e., favoring completeness over soundness. The companies that make these tools aim to provide a profitable service to a broad market, and their own investigations indicate soundness is not important for sales. Instead, to be viable, tools must help developers find real, important bugs efficiently, and not necessarily every bug. A challenge to researchers is to find ways to push the business proposition back toward soundness while retaining efficiency (and other desirable criteria); Andy Chou’s POPL’14 keynote outlines other useful challenges.
While Heartbleed is ostensibly about the adoption and improvement of static analysis, in this article I explore the related question of fostering the adoption of programming languages. I summarize impressive research by Leo Meyerovich and Ariel Rabkin on adoption research questions and adoption practices that appeared at OOPSLA’12 and OOPSLA’13, respectively. I think there are some interesting results here, with implications for improving the adoption of languages. Their results also raise new questions for further research (but too late for yesterday’s POPL deadline — good luck to all submitters!).
Research and language adoption
Researchers (and the government agencies and companies that fund them) have expended significant resources on the study, and invention, of programming languages and their features.
Some of the results have been adopted in practice. For example, the languages Haskell, Scala, and OCaml started as academic research projects and have seen mainstream use. Mainstream languages have also adopted researcher-designed features such as garbage collection, exceptions, closures, type inference, parametric polymorphism (generics), and more, though often after a decades-long delay. Nascent languages Swift and Rust have continued this trend, causing PL research icons like Bob Harper to express hope that good ideas from research eventually do take hold.
The question is: Why do some languages succeed (in getting adopted) where others fail? Answering this question would help researchers do more impactful research, either by packaging their work better, or by changing it to address problems they hadn’t appreciated.
Ariel Rabkin and Leo Meyerovich, together as graduate students at UC Berkeley, decided to attempt to answer this question. They refer to their investigations as SocioPLT.
At OOPSLA’12 they published a research agenda: the paper contains observations from PL history, some comparisons to related fields (such as the general theory of diffusion of innovations), and some particular hypotheses and research questions.
At OOPSLA’13 they published some results: this paper (which I summarize below) answers several questions posed in the first paper using survey data and source code analysis.
There were three surveys. One was carried out at the outset of a massive open online course (MOOC) on software as a service (SaaS), garnering 1,142 responses; one came from a website called The Hammer Principle, which allows respondents to compare languages in various ways, garnering roughly 13K responses; and one came from a link via Slashdot that announced a visualization of the data from the Hammer survey, garnering 1,679 responses. The survey participants were largely professional developers; the median age of the MOOC participants was 30, and that of the Slashdot participants was 37.
The source code analysis considered 217,368 projects hosted by SourceForge between 2000 and 2010, considering in particular project metadata including the programming languages used, the primary project category (e.g., accounting), date of creation, and the project’s owners. They also considered data from Ohloh, which tracks over 590,000 projects hosted on SourceForge, GitHub, and elsewhere, and supports fine-grained queries about project contents.
Which languages are most preferred?
The six most popular languages used in SourceForge projects are probably not surprising: Java, C++, PHP, C, Python, and C#. Overall, the use of languages follows a heavy-tailed power law, with the top six languages accounting for 75% of projects, and 20 languages accounting for 95% of projects. The top six are diverse in their character. Java and C# are statically typed (i.e., programs must be deemed type correct before they run), PHP and Python are dynamically typed (i.e., type errors are caught during execution), whereas C and C++ are weakly typed (i.e., objects of one type can be treated, perhaps incorrectly, as if they had another type). Java, C#, and C++ are all object-oriented languages.
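To make the "dynamically typed" case concrete, here is a minimal Python sketch. The function names are made up for illustration and are not from the paper. The point is only that the mismatched call is rejected when it executes, whereas a Java or C# compiler would reject the equivalent call before the program ever runs.

```python
def add_one(value):
    # No type is declared for `value`; Python checks the operand types
    # only when the `+` actually executes.
    return value + 1


def try_add_one(value):
    """Return the result, or a description of the type error raised at run time."""
    try:
        return add_one(value)
    except TypeError as err:
        return f"TypeError: {err}"


print(try_add_one(41))    # an int is fine
print(try_add_one("41"))  # a str fails, but only once this call is executed
```

Nothing stops the second call from being written, shipped, and left dormant on a rarely taken code path; the error surfaces only when that path runs.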
Notably, no functional languages were in the top 20. I found this surprising (Tcl is more popular!) given various stories I’d heard, such as from the hype surrounding Microsoft’s release and support of F#, the use of OCaml at Citrix/Xen and Jane Street Capital, and the use of Haskell by financial firms like Morgan Stanley. Obviously I was overgeneralizing from these data points. On the other hand, perhaps the data is somewhat stale: the SocioPLT top 20 comes largely from 2010 data, and arguably there has been an increase in interest in functional programming since that time.
What factors correlate with adoption?
The paper shows convincingly that the single factor that most strongly correlates with both preferring and actually using a language is good libraries, particularly open source libraries. This is not surprising to me. Imagine Ruby without Rails. Ruby was released in 1995 and Rails in 2004: had you heard of Ruby before you heard of Rails? Or, imagine Java without the collection libraries, or the more recent concurrency libraries. Using the language before these things existed was just a lot more painful.
One interesting result was that simplicity was the lowest ranked of the factors respondents deemed important, identified by about 25% of respondents, compared to 60% for libraries. Safety/correctness was deemed important by nearly 40% of respondents. So if we take this result at face value, programmers are willing to deal with a complicated language in order to get other benefits, like correctness. On the other hand, there are mixed messages here. For example, development speed was important to 40% of respondents, and we would think that simplicity would help that. Perhaps the definition of the word “simplicity” is key. The lambda calculus is very simple by one definition (syntax and semantics), but using it to write Windows is not a simple task!
Interestingly, a language’s performance was not in the top five factors when choosing it for a project. Instead, other extrinsic factors dominated, including the language used for existing code bases and the experience and comfort of programmers on the project development team. On the other hand, when asked why they prefer a language independent of its use for a particular project, respondents favored performance just below good library support. Perhaps these results are consistent in that many projects are written in high performance languages, and many people are familiar with those languages, so the extrinsic factors tend to line up with good-performance languages. In general, developers claim to enjoy languages they believe are expressive and produce elegant code.
The PL research community thinks a lot about (static) types; e.g., see Benjamin Pierce’s well-respected book, Types and Programming Languages. The survey results show that developers place comparatively less value on static types. According to the MOOC survey, only 36% of respondents “see the value” of static types, and only 18% “enjoy using” static types. Unfortunately, the MOOC survey population is probably biased in favor of dynamic languages given the course’s topic, Software-as-a-Service. Indeed, the Hammer survey showed a more positive view of types, with statically typed languages strongly correlated with statements such as “If my code in this language successfully compiles there is a good chance my code is correct.” However, this survey agreed with the MOOC survey on the lower developer preference for statically typed languages.
Education had a strong influence on whether respondents knew functional or mathematical languages, but little influence on whether they knew imperative/OO or dynamic languages. For example, respondents who had seen functional programming in college claimed to know a functional language 40% of the time, whereas those who had not seen one in school only knew one 15% of the time. Those who had seen an imperative/OO language in college knew one 95% of the time, but those who hadn’t knew one 87% of the time. This makes sense, given the state of language popularity and the language decision making process: If most code is in Java/C/C++ (imperative/OO) and most new projects are strongly influenced by the language of past projects, then most developers will (by now) know Java/C/C++, and this familiarity will further strengthen the preference for Java/C/C++ in the future. This pattern could ensure that if you did not see a functional language in school, you might never see one.
There are many other results in the paper that I have not covered; I encourage you to read it. All of the results provide useful food for thought for PL researchers when aiming to increase adoption.
The most obvious thing to do is focus on libraries, broadly construed (e.g., think of Rails as a library). This is already happening in some cases: If a functional language is to break into the top 20, then perhaps it will be Haskell due to the rise of Hackage, or OCaml due to the rise of OPAM. Scala, which supports functional programming paradigms, almost certainly got a bump in popularity by interfacing easily with Java’s libraries.
In terms of pushing the benefits of types, perhaps we can have our cake and eat it too: Aim for both the expressiveness of dynamic typing and the documentation and safety benefits of static typing (both of these benefits were recognized in the survey). One way to do this is to push research on scripts to programs and gradual typing, which aim to make static typing optional, but in a sensible way. Academic languages like Racket, and industrial languages like TypeScript, have adopted this approach.
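As a rough illustration of what "optional" static typing can look like, here is a small sketch using Python's type hints, which are one instance of the general idea. The choice of Python and the function names are my assumptions for illustration, not something the surveys or the paper discuss. Note that the interpreter itself does not enforce the annotations; checking them is left to a separate tool such as mypy.

```python
from typing import List, get_type_hints


def total_untyped(prices):
    # Script-style code: no annotations, so a static checker has little to verify.
    return sum(prices)


def total_typed(prices: List[float]) -> float:
    # The same logic after annotations are added. They document the intended
    # types and let an external checker flag mismatched callers, while the
    # run-time behavior is unchanged.
    return sum(prices)


prices = [1.5, 2.5]
print(total_untyped(prices), total_typed(prices))

# The annotations are ordinary metadata that tools can inspect.
print(get_type_hints(total_untyped))
print(get_type_hints(total_typed))
```

The appeal of this style is that annotations can be added one function at a time, so a script can grow into a typed program without a rewrite.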
It is important to note that none of the results I’ve summarized consider the actual effectiveness of languages, just the state of their use and programmers’ stated preferences. It would be very interesting to attempt to gather evidence of effectiveness, and use that evidence to motivate change.
I’ve heard Joe Armstrong tell the story that in the early days of Erlang’s development at Ericsson they had two teams build the same system, one in Erlang and one in C++. The Erlang system was completed successfully and the C++ project kept missing deadlines and was ultimately abandoned. This experience led Erlang to be adopted company-wide (though that mandate has since lapsed). The motivation for the ICFP programming contest was in part to provide similar evidence, but I do not know if an analysis of outcomes has ever been done. Gathering evidence for effectiveness is also behind our Build-it, Break-it, Fix-it contest; we’ll see what happens there.
Another obvious next step is to continue to perform SocioPLT research and use it to motivate the technical research the PL community is already doing. There are many open questions in the OOPSLA’12 paper, and much validation still to be done on the results I’ve summarized above.