Scientific results intersect with all aspects of the modern world, and they underpin many important decisions made at the level of governments, corporations, and individuals. When you read about a scientific result—like the correlation of a particular gene’s mutation with a certain disease, or that it’s better for children to engage in unstructured play rather than structured learning at early ages—why should you believe it?
If you were an expert, you might be able to (with time) capably judge the work itself. But what if you are a non-expert, which is increasingly likely as branches of study become hyper-specialized? In this case, you are left to trust the scientific process, i.e., the way in which scientific work is judged to be true and important. At the heart of this process is peer review.
In this post I describe elements of the peer review process we use in scientific research about programming languages.
Of particular note is PL’s heavy use of peer reviewed conferences. Like other areas of computer science, conferences tend to be the main dissemination vehicle of the best results, and as such use rigorous peer review processes to select them.
While processes are important, we cannot lose sight of the critical role played by peer reviewers. Just like a democracy relies on motivated, informed, and conscientious citizens to vote for the best leaders, science relies on similarly motivated, informed, and conscientious peer reviewers to bring the best results to light. In a future post, I will present advice for reviewers on writing what I believe are high quality peer reviews.
What is peer review?
Peer review is the process by which a scientific result is (initially) accepted by the scientific community. This process is straightforward.
A group of scientists carries out some research on a particular topic and writes a paper explaining their results. This paper is submitted to a venue that publishes scientific results. After an initial check of topic appropriateness by an editor or program chair, the paper is subjected to peer review: a group of peers with knowledge of the topic of study are given the paper to review. After reading the paper, they render a judgment about whether the paper should be accepted for publication. This judgment is based on whether, in their view, the result
- is sufficiently important or thought-provoking,
- is correct, and
- is well presented, i.e., can be understood by the community that the publication venue represents.
Whether the paper is accepted or not, reviewer comments are sent back to the authors (anonymously) so that the authors can improve the paper and the work.
Conferences and journals
Peer review is nearly always employed by academic journals. Typically, a journal has an editorial staff, consisting of respected researchers in the field, who solicit peer reviews, and based on them decide whether to accept a paper. Acceptance might involve several rounds of review, each following paper revisions, so that the reviewers can confirm that any flaws they identified previously have been fixed. Reviews could take weeks to months to perform.
In computer science, peer review is also employed by conferences. This can be a source of confusion for those outside of computer science because in other fields, conferences rarely employ peer review. Not only are CS conference publications reviewed, the review processes are often extremely rigorous, and on par with processes used by top journals. There has been a debate for many years about whether conference review processes should be made more like journal processes, with strongly held beliefs on either side.
Peer review is the gateway, not the final judge
Whatever the peer review process, it will never be perfect. As such, publication of a scientific paper is the first important step of establishing the validity of the work, not the last. What comes next is vigorous discussion within the scientific community, and the general public, as well as followup research to continue to explore the validity of the published results.
Peer review in programming languages research
Top venues in PL employ rigorous peer review processes.
In fact, the steering committees of two of PL’s flagship conferences, POPL and PLDI, have recently issued documents, called Principles of POPL and Practices of PLDI, that describe a “contract” with authors (and the public) about the review process that will be used from year to year. These review processes have several notable features:
Blinding. Despite reviewers’ best efforts, implicit (unconscious) bias can creep in (check out the implicit association test to see this effect in yourself). Sometimes the review process aims to correct for this. In particular, reviewers are typically anonymous to authors—such single-blind reviewing (SBR) empowers the reviewers to make accurate judgments without fear of retribution. Sometimes, authors are also anonymous to reviewers—such double-blind reviewing (DBR) aims to avoid bias in reviewer judgments in favor of known groups, or against unknown ones. Both POPL and PLDI employ a “light” form of DBR that hopefully helps reduce bias without imposing high costs on authors.
Author response. A paper might be nearly publishable, but not quite ready. Whereas journal review processes allow reviewers to iterate with authors, conferences are usually a one-shot process: papers with flaws are rejected. POPL and PLDI support a lightweight form of iteration called “author response” (or “rebuttal”): authors may respond to the reviews they have received, and that response is taken into consideration before a final judgment is made. If the paper is not accepted, it can go to the next conference in a few months’ time.
Three or more reviews. Despite our best efforts, reviewers will make mistakes, so it would be unwise to rely on a single reviewer when rendering a judgment. Moreover, some aspects of reviewer judgment, like the determination of the importance of a result, are very subjective. For both reasons, peer review employs more than one reviewer in making a final judgment. POPL and PLDI aim for at least three reviews, oftentimes four, which I note is more than the two reviews often employed in top journals in other areas of science.
Program committees. Instead of soliciting reviewers for each paper on an ad hoc basis, as is done with journals, a conference has a program committee (PC) that reviews all papers submitted to the conference. Program committees confer several advantages:
- Committee members review many papers in a concentrated period, and so they can develop a sense of quality when judging papers (and get a wider sense of the happenings in the field).
- Committee members are chosen carefully, in advance, from a diverse population, and provide valuable input (e.g., by “bidding”) to the process of assigning papers to reviewers. Ad hoc reviewers selected by journal editors might inadvertently be drawn from a narrower population (e.g., famous people immediately familiar to the editor). On the other hand, a PC may turn out not to have sufficient expertise in a particular area; in this case external reviews are sought to fill in the gaps.
- Papers are discussed, either in person, or on-line, by the reviewers, the PC Chair, and possibly other members of the committee, in the context of the whole program. Such discussion leads to better outcomes and keeps reviewers honest (a shoddy review will be seen by one’s peers). By contrast, the decision maker in journals is typically an associate editor (AE), whose judgment is based on the original reviews but without discussion. As an AE for Transactions on Programming Languages and Systems (TOPLAS), I often ask disagreeing reviewers to discuss a paper, inspired by conference-based review.
Which process is best?
The details of peer review processes can generate a healthy debate. We should be glad for this, because as consumers of scientific results we rely on peer review judgments to be good ones. We can feel more confident in a result if we feel confident about the process that produced it.
Single- vs. double-blind review. Not everyone agrees that double-blind review is worth it. As program chair of POPL a few years back, I followed Kathryn McKinley’s lead for PLDI and pushed to use double-blind review. My feeling is that this approach does indeed reduce bias and increase the quality of judgments. On the other hand, double-blind reviewing can be a burden on authors. For example, making the paper anonymous could force some contortions (e.g., if a researched system is well known), and authors may be restricted from talking about their results (e.g., at an interview) until the review process completes. To overcome these costs, we employed a light form of DBR that we hope reduces bias and burden; and reviewers and authors seemed to feel it worked, as detailed in my Chair report.
Journal vs. conference reviewing. Conference-based review and publication has disadvantages. Because they read many papers in a short period, reviewers may spend less time per paper than they would for a journal submission, which means they may miss important details. Conference paper page lengths are restricted, which can encourage better writing quality, but can also hurt it when authors cut helpful examples to save space. (Does it really make sense to limit page lengths in an age of on-line dissemination?) And there often is no reviewer followup to make sure issues are actually fixed (though SPLASH/OOPSLA is currently experimenting with a two-phase review process). Many people are unhappy with this situation. Alessio Guglielmi characterizes the problems well (and links to other thoughtful points of view, e.g., from Matthias Felleisen and Moshe Vardi).
Science of peer review? One way to settle these debates would be to use science.1 Unfortunately, this is very hard to do. For example, we could imagine comparing the outcomes of conferences that do and do not use double-blind review, and seeing whether one tends to bias toward (say) more famous, male authors at well-known institutions. But such a conclusion would be hard to distinguish from random chance because of the many other variables involved, e.g., differences in the papers considered, differing reviewers, and differences in other details of the review process. A more controlled study would be to have two committees review the same papers, one set blinded and one set not. But such a study would be incredibly costly, and the difference in reviewers might still have more effect than SBR vs. DBR. A believable, lightweight approach to assessing different processes would be incredibly valuable.
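The confounding problem described above can be illustrated with a toy Monte Carlo sketch (all parameters here are hypothetical, chosen only for illustration): even a sizeable bias toward one group of authors can be hard to distinguish from run-to-run noise unless the paper pool is large, because reviewer scoring noise and the accept cutoff both add variance.

```python
import random

def famous_accept_fraction(n_papers, bias, seed):
    """Simulate one (hypothetical) conference. Each paper has a latent
    quality score; three noisy reviewers score it, and under a biased
    process papers by 'famous' authors (half the pool) get a bump.
    The top ~20% by average review score are accepted. Returns the
    fraction of accepted papers that have famous authors."""
    rng = random.Random(seed)
    scored = []
    for i in range(n_papers):
        famous = (i % 2 == 0)               # half the pool is "famous"
        quality = rng.gauss(0, 1)           # latent paper quality
        # average of three noisy reviews, plus optional bias bump
        avg_review = sum(quality + rng.gauss(0, 1) + (bias if famous else 0.0)
                         for _ in range(3)) / 3
        scored.append((avg_review, famous))
    scored.sort(reverse=True)
    accepted = scored[: n_papers // 5]      # accept the top ~20%
    return sum(1 for _, famous in accepted if famous) / len(accepted)

# With no bias, the famous fraction hovers around 0.5 but fluctuates
# from seed to seed; with a small bias, the shift can be comparable
# to that fluctuation, which is the confound discussed in the text.
for seed in range(3):
    print(famous_accept_fraction(200, 0.0, seed),
          famous_accept_fraction(200, 0.3, seed))
```

Averaging over many simulated conferences does separate the two processes, which is exactly why a single observed conference-to-conference difference proves little on its own.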
Calling on good reviewers
As my colleague Peter Sewell pointed out in his POPL’14 report, whatever the process used, in the end it only works when it involves reviewers who are trying, in good conscience, to render thoughtful and informed judgments.
Good reviewing is not easy. It requires taking the time to read papers carefully, being informed about the area, knowing when problems are fundamental as opposed to when they shouldn’t stand in the way of publication, and taking the time to write constructive feedback. The golden rule applies: How would you like your paper to be reviewed? Apply your answer to the papers you review yourself. Take sufficient time to do your reviews, and don’t say ‘yes’ to so many reviewing duties that you can’t do a good job.
In a future post, I’ll present some advice on writing good peer reviews, based on my experience as a reviewer, conference chair, and editor.
The PL community is blessed to have a culture of good reviewing. This is a good thing: Peer review is the heart of the scientific process—it is a gateway for new ideas and the foundation of our trust in published results, some of which go on to have a big impact in our lives.
- It would be interesting if a published paper assessing a review process had itself been subjected to a process that, according to the paper, is sub-par. ↩