Peer review, and why it matters

Scientific results intersect with all aspects of the modern world, and they underpin many important decisions made at the level of governments, corporations, and individuals. When you read about a scientific result—like the correlation of a particular gene’s mutation with a certain disease, or that it’s better for children to engage in unstructured play rather than structured learning at early ages—why should you believe it?

If you were an expert, you might be able to (with time) capably judge the work itself. But what if you are a non-expert, which is increasingly likely as branches of study become hyper-specialized? In this case, you are left to trust the scientific process, i.e., the way in which scientific work is judged to be true and important. At the heart of this process is peer review.

In this post I describe elements of the peer review process we use in scientific research about programming languages.

Of particular note is PL’s heavy use of peer-reviewed conferences. As in other areas of computer science, conferences tend to be the main dissemination vehicle for the best results, and as such they use rigorous peer review processes to select them.

While processes are important, we cannot lose sight of the critical role played by peer reviewers. Just like a democracy relies on motivated, informed, and conscientious citizens to vote for the best leaders, science relies on similarly motivated, informed, and conscientious peer reviewers for bringing the best results to light. In a future post, I will present advice for reviewers on writing what I believe are high quality peer reviews.

What is peer review?

Peer review is the process by which a scientific result is (initially) accepted by the scientific community. This process is straightforward.

A group of scientists carries out some research on a particular topic and writes a paper explaining their results. This paper is submitted to a venue that publishes scientific results. After an initial judgment by an editor or program chair (a quick check of topic appropriateness), the paper is subjected to peer review: a group of peers with knowledge of the topic of study is given the paper to review. After reading the paper, they render a judgment about whether the paper should be accepted for publication. This judgment is based on whether, in their view, the result

  • is sufficiently important or thought-provoking,
  • is correct, and
  • is well presented, i.e., so that it can be understood by the community that the publication venue represents.

Whether the paper is accepted or not, reviewer comments are sent back to the authors (anonymously) so that the authors can improve the paper and the work.

Conferences and journals

Peer review is nearly always employed by academic journals. Typically, a journal has an editorial staff, consisting of respected researchers in the field, who solicit peer reviews, and based on them decide whether to accept a paper. Acceptance might involve several rounds of review, each following paper revisions, so that the reviewers can confirm that any flaws they identified previously have been fixed. Reviews could take weeks to months to perform.

In computer science, peer review is also employed by conferences. This can be a source of confusion for those outside of computer science because in other fields, conferences rarely employ peer review. Not only are CS conference publications reviewed, the review processes are often extremely rigorous, and on par with processes used by top journals. There has been a debate for many years about whether conference review processes should be made more like journal processes, with strongly held beliefs on either side.

Peer review is the gateway, not the final judge

Whatever the peer review process, it will never be perfect. As such, publication of a scientific paper is the first important step of establishing the validity of the work, not the last. What comes next is vigorous discussion within the scientific community, and the general public, as well as followup research to continue to explore the validity of the published results.

Peer review in programming languages research

Top venues in PL employ rigorous peer review processes.

In fact, the steering committees of two of PL’s flagship conferences, POPL and PLDI, have recently issued documents, called Principles of POPL and Practices of PLDI, that describe a “contract” with authors (and the public) about the review process that will be used from year to year. These review processes have several notable features:

Blinding. Despite reviewers’ best efforts, implicit (unconscious) bias can creep in (check out the implicit association test to see this effect in yourself). Sometimes the review process aims to correct for this. In particular, reviewers are typically anonymous to authors—such single-blind reviewing (SBR) empowers the reviewers to make accurate judgments without fear of retribution. Sometimes, authors are also anonymous to reviewers—such double-blind reviewing (DBR) aims to avoid bias in reviewer judgments in favor of known groups, or against unknown ones. Both POPL and PLDI employ a “light” form of DBR that hopefully helps reduce bias without imposing high costs on authors.

Author response. A paper might be nearly publishable, but not quite ready. Whereas journal review processes allow reviewers to iterate with authors, conferences are usually a one-shot process: papers with flaws are rejected. POPL and PLDI support a lightweight form of iteration in which authors write an “author response” or “rebuttal” to the reviews they have received, which is taken into consideration before a final judgment is made. If the paper is not accepted, it can go to the next conference in a few months’ time.

Three or more reviews. Despite our best efforts, reviewers will make mistakes, so it would be unwise to rely on a single reviewer when rendering a judgment. Moreover, some aspects of reviewer judgment, like the determination of the importance of a result, are very subjective. For both reasons, peer review employs more than one reviewer in making a final judgment. POPL and PLDI aim for at least three reviews, oftentimes four, which I note is more than the two reviews often employed in top journals in other areas of science.

Program committees. Instead of soliciting reviewers for each paper on an ad hoc basis, as is done with journals, a conference has a program committee (PC) that reviews all papers submitted to the conference. Program committees confer several advantages:

  • Committee members review many papers in a concentrated period, and so they can develop a sense of quality when judging papers (and get a wider sense of the happenings in the field).
  • Committee members are chosen carefully, in advance, from a diverse population, and provide valuable input (e.g., by “bidding”) to the process of assigning papers to reviewers. Ad hoc reviewers selected by journal editors might inadvertently be drawn from a narrower population (e.g., famous people immediately familiar to the editor). On the other hand, a PC may turn out not to have sufficient expertise in a particular area; in this case external reviews are sought to fill in the gaps.
  • Papers are discussed, either in person, or on-line, by the reviewers, the PC Chair, and possibly other members of the committee, in the context of the whole program. Such discussion leads to better outcomes and keeps reviewers honest (a shoddy review will be seen by one’s peers). By contrast, the decision maker in journals is typically an associate editor (AE), whose judgment is based on the original reviews but without discussion. As an AE for Transactions on Programming Languages and Systems (TOPLAS), I often ask disagreeing reviewers to discuss a paper, inspired by conference-based review.

Which process is best?

The details of peer review processes can generate a healthy debate. We should be glad for this, because as consumers of scientific results we rely on peer review judgments to be good ones. We can feel more confident in a result if we feel confident about the process that produced it.

Single- vs. double-blind review. Not everyone agrees that double-blind review is worth it. As program chair of POPL a few years back, I followed Kathryn McKinley’s lead for PLDI and pushed to use double-blind review. My feeling is that this approach does indeed reduce bias and increase the quality of judgments. On the other hand, double-blind reviewing can be a burden on authors. For example, making the paper anonymous could force some contortions (e.g., if a researched system is well known), and authors may be restricted from talking about their results (e.g., at an interview) until the review process completes. To limit these costs, we employed a light form of DBR that we hope reduces bias while limiting the burden; reviewers and authors seemed to feel it worked, as detailed in my Chair report.

Journal vs. conference reviewing. Conference-based review and publication have disadvantages. Because they read many papers in a short period, reviewers may spend less time per paper than they would for a journal submission, which means they may miss important details. Conference paper page lengths are restricted, which can encourage better writing quality, but can also hurt it when authors cut helpful examples to save space. (Does it really make sense to limit page lengths in an age of on-line dissemination?) And there often is no reviewer followup to make sure issues are actually fixed (though SPLASH/OOPSLA is currently experimenting with a two-phase review process). Many people are unhappy with this situation. Alessio Guglielmi characterizes the problems well (and links to other thoughtful points of view, e.g., from Matthias Felleisen and Moshe Vardi).

Science of peer review? One way to settle these debates would be to use science (see note 1). Unfortunately, this is very hard to do. For example, we could imagine comparing the outcomes of conferences that do and do not use double-blind review, and seeing whether one tends to be biased toward (say) more famous, male authors at well-known institutions. But such an effect would be hard to distinguish from random chance because of the many other variables involved, e.g., differences in the papers considered, differing reviewers, and differences in other details of the review process. A more controlled study would be to have two committees review the same papers, one committee blinded and one not. But such a study would be incredibly costly, and the difference in reviewers might still have more effect than SBR vs. DBR. A believable, lightweight approach to assessing different processes would be incredibly valuable.

Calling on good reviewers

As my colleague Peter Sewell pointed out in his POPL’14 report, whatever the process used, in the end it only works when it involves reviewers who are trying, in good conscience, to render thoughtful and informed judgments.

Good reviewing is not easy. It requires taking the time to read papers carefully, being informed about the area, knowing when problems are fundamental as opposed to when they shouldn’t stand in the way of publication, and taking the time to write constructive feedback. The golden rule applies: How would you like your paper to be reviewed? Apply your answer to the papers you review yourself. Take sufficient time to do your reviews, and don’t say ‘yes’ to so many reviewing duties that you can’t do a good job.

In a future post, I’ll present some advice on writing good peer reviews, based on my experience as a reviewer, conference chair, and editor.

The PL community is blessed to have a culture of good reviewing. This is a good thing: Peer review is the heart of the scientific process—it is a gateway for new ideas and the foundation of our trust in published results, some of which go on to have a big impact in our lives.

Notes:

  1. It would be interesting if the paper assessing the review process were itself published through a process that, according to the paper, is sub-par.

15 Responses to Peer review, and why it matters

  1. Here’s a recent study from CRA on the trends in how computer scientists are evaluated, with (an excess of) conference papers potentially playing a negative role.

  2. Interesting result, Aws, thanks for sharing! If you haven’t seen it, Snodgrass has a long review of DBR, with some consideration of gender bias. But the review is now 8 years old. http://www.cs.utexas.edu/users/mckinley/notes/snodgrass-sigmod-2006.pdf

  3. Something I wouldn’t have guessed before being asked to do reviews myself is how hard it is to write a review. (I knew it would also be extremely interesting; it’s very nice to be a reviewer if you don’t have too many papers to review.) You mention the effort of reading the paper in depth (and the annexes, and sometimes the related work you need to look at more closely to compare), and the knowledge requirements, but I found the hardest part was to actually *judge* the paper. Sometimes it’s easy (you find the thing horrendous, or on the contrary it’s the greatest paper you’ve read this year), but in my experience a large part of the review work is to actually force yourself to come to a conclusion: what do I actually think of this paper?

    I’ve read some “how to do good reviews” advice but it often feels rather hollow. “Write the reviews you would like to get back on your own submissions” is good, but it’s more about the form (it’s not easy to write a negative review that is still nice, and you know the author will be disappointed or angry inside however you write it, if only for a moment). Another piece of advice is to avoid neutral scores (if scores go from -2 to +2, avoid 0 at all costs). But I haven’t read much advice (if any exists) on how to actually formulate a judgment on a paper, besides validity checking.

    I think three things could help writing better reviews:

    – a checklist of errors not to make when you write a review
    – a checklist of questions to ask oneself to make a judgment; the questions in the review interfaces of conferences sometimes help (e.g., one conference asked “how would you rate the importance of the problem attacked by this paper?” and that’s actually something you may forget to wonder about during your review if you’re focused on something more vague like “do I like this paper?”), but I think we could have centralized, more complete checklists. (This would probably help writing the papers as well!)
    – reading reviews written by other people; unfortunately the practice of publishing one’s reviews is not accepted right now (and reviewers may not consent to this), so for beginners who are not yet on a program committee, the only source of reviews is our own submissions

    • Great comments! Later this week I expect I’ll post my advice on writing reviews. Some of your points are addressed in my draft, so I’ll be curious as to your comments once it’s out.

      One quick comment: I’m not opposed to neutral reviews. It may be that the paper is really well written, solid and correct, but not that interesting (to you). A weakly positive review is perfectly appropriate. If all the reviews are (only) weakly positive, then the paper is probably not a good one to accept. Inoffensive, but not something that will move the community. The key is to be systematic and honest, and then let the process play out, IMO.

    • Is there a taboo against publishing your reviews? This is never stated in the call or instructions for authors, nor is it common knowledge, if it is one of those “silent” rules everyone follows. Publish your reviews; consent from the reviewer is not necessary, especially if you have no idea who the reviewer is! It is also great from a transparency standpoint: in the best case, it helps us understand what the PC is looking for; in the worst case, conference communities should be held accountable for the collective quality of their reviews.

      • It’s an interesting idea. Some conferences do publish reviews. Andrew McCallum has been hosting OpenReview which makes clear that certain forms of openness, like publishing reviews, can happen, and this system was used for ICLR’13. SIGCOMM and other systems conferences have published “public reviews” (e.g., here for HotNets III, and here for SIGCOMM’13) but I’m not sure how these relate to the actual peer reviews.

        I imagine one danger of publishing reviews is deanonymization. NLP techniques are pretty good these days.

        • Systems seems to be more open than PL in this regard (but we haven’t even gone open access yet, so...); EuroSys provided mini-reviews for each accepted paper last year (not for 2014 though). Chairs should be looking at ways to provide more transparency about why papers are accepted and rejected. As a bonus, these reviews are also useful in promoting the accepted papers.

          What might be a cool experiment is something like a published comment section for an accepted paper. Consider the interesting debate recorded at the end of Dijkstra’s “Go To Statement Considered Harmful.”

      • This is a controversial point. If publishing your reviews without any previous agreement with the reviewers were an accepted practice, I would be glad to do it. Although some do that (and I’m grateful to them because it’s always interesting to read reviews), it is not an accepted practice; when I discussed it with my colleagues, some choked at the idea of publishing other people’s reviews.

        While that may be harmless for an established researcher, a PhD student is not in a position to be controversial about his or her research practices. I decided not to publish my reviews unless I got explicit consent, which means that the PC needs to notify reviewers (with being a reviewer for this PC implying consent), or at least give them the choice and ask them to confirm consent in their reviews directly, which some people do in any case.

        • Be bold. Changes in cultural norms only happen if someone takes the first step. PhD students are given latitude to experiment and break conventions; it is us established researchers who would get pegged for it (though, if anything, no one has ever cared in my case).
