I recently wrote a document for my students giving them advice on reviewing scientific papers, particularly those in programming languages. John Regehr blogged about reviewing papers efficiently not long ago (as did Shriram Krishnamurthi before him), and it reminded me that I had this document, so I decided to post it here, following up on my earlier post on the importance of peer review. I hope that students and those interested in peer reviewing will find it useful.
The ideas in this post come from my experiences as a journal reviewer and editor, and as a program committee member and Chair. I outline how I believe that papers should be judged, and how to write a review to express that judgment. Judgments should involve usefulness/appeal, novelty, correctness, and exposition. Reviews should aim to be self-contained, clearly expressing support for their recommendation; constructive, providing feedback for improving the work; and respectful of the authors who put a lot of time into their research. In general, think of the kind of review you’d like to receive, and act accordingly.
(In what follows, I assume you are familiar with the basic conference-based review process used in computer science generally, and programming languages in particular; if not, this post gives a summary.)
Judging the work
Reviews serve two purposes. They judge whether research is well done, and they give feedback to the authors to improve their research.
There are four high-level things you are judging about the paper:
- Is this an interesting/useful idea or study?
- Is it new, or is it significantly overlapping with prior work?
- Are the conclusions drawn in the paper correct?
- Is the paper sufficiently accessible/well-written that others will understand it?
If all of these things are true, you should accept it. If not all are true, the paper may still be acceptable (e.g., 3/4 on average) or it may not (e.g., 2/4 or fewer, on average).
For the first, the definition of “interesting” is obviously qualitative. You want to attempt to judge the paper according to your own interests (i.e., treating yourself as a sample drawn from the whole community) and according to more general interests (i.e., ignoring your own preferences and applying instead your notion of what others might find interesting or useful). Your consideration will involve the problem being tackled (is it important, or inconsequential?), and the way it is tackled (is the solution beautiful, or clunky?). This judgment requires the most experience and care; see below for more on this.
To determine novelty, you must have knowledge of the related work, and you must be able to judge qualitatively and fairly. For example, if the paper is performing an analysis using one method (e.g., a type system) but a previously published paper performs the same analysis using another method (e.g., abstract interpretation) such that the submitted paper’s analysis could be transliterated into the prior paper’s analysis, then the submitted paper is not new, despite the different methods (on the surface). On the other hand, a proposed solution may overlap with a prior solution, but still provide new ideas and useful data, and therefore enjoy some novelty.
When judging correctness, you are considering things like: (a) are the theorems correct? (b) is the empirical evaluation over a representative set of benchmarks? (c) is the empirical evaluation statistically sound? You are also judging factual claims that might motivate the relevance of the problem, particularly if it’s not one you’ve heard of.
Finally, you must judge whether the paper is clear enough about what the research actually consists of. Ideally the paper is sufficiently detailed that the whole idea is captured (e.g., a reasonable semantics and type system), and the main details are there to the point that the idea could be reproduced. Moreover, the paper should be clear in its exposition, motivation, etc. This requirement is often tied up in the other ones, since a poorly written paper may not present its idea crisply enough that it can be judged against related work, or judged to actually be solving the problem it purports to.
Interesting or useful?
Judging quality is the hardest part of writing a review. Even if the result is advancing the state of the art, is correct, and the paper is well written, it might not be particularly important, interesting (the solution could be downright ugly! 1), or useful. How do you decide whether it’s above the bar?
Making such a decision requires some sense of the values of the venue/community you are reviewing for. A paper submitted to a top conference (like POPL) is held to a higher standard than a paper sent to a good workshop (like PLAS), while a paper that gets into the Journal of the ACM is judged by a higher standard still. In terms of problems, proving P ≠ NP is more important than defining a domain-specific language for programming toasters. But fine distinctions are hard to make. In judging the contribution of the present paper, try to consider the quality of the papers you have seen at the venue in the past. Ultimately, the collective judgment of all reviewers will determine whether the paper is good enough; it’s more art than science.
Writing the review
The review will contain your summary of the paper, a judgment of its merits, and a justification of that judgment. It will additionally contain suggestions for improvement.
Start with a summary
Write a concise, judgment-free summary of the paper before the judgment and justification. Authors and other reviewers can use this summary to gauge whether you understood the paper, and/or what parts you found most important. The summary should not just be a rehashing of the abstract or introduction; convey in your own words what the paper is about, what problem is being solved, how it is being solved, and what evidence is given for the solution being a good one.
Scores abstract text
A review typically employs numerical scores and a textual description. You should view the text as primary, and the scores as an abstraction of the text, not a complement to it.
As such, if you are judging the paper acceptable, and giving it an “A”, then your textual review should make it clear why you think it’s acceptable. Too many reviews will give the paper an “A” and then have the entire text of the review be critical. Or they may give the paper a “C” and have largely positive comments. The idea is that if the author reads your review, (s)he should be able to reasonably guess what the scores would be.
This approach is important for two reasons.
First, the textual support for the judgment is extremely helpful to authors. If the paper is not accepted, they will know why, and have ideas for revision. More pragmatically, oftentimes authors will be asked to rebut reviews prior to a final decision being made. A PC Chair may elect to not include the scores when sending the reviews to authors for rebuttal. If your review does not match your score, the authors will be at a disadvantage in determining how to spend their limited response budget.
Second, without a justification for the score in the review, there is no opportunity for discussing the paper with other reviewers, at a low level, based only on the review. Ideally, reviewer 1 can read reviewer 2’s review, understand exactly why reviewer 2 thinks as she does, and think about whether he disagrees. If the review only contains numerical scores and little support, no such thought process can take place. And there simply is not enough time at the PC meeting to discuss every paper in depth.
Put the most important parts first
A very long, disorganized review is great for neither the authors nor the other reviewers. Put yourself in the position of both constituencies, and make sure the key parts of your argument appear at the top. Supporting details should come afterward.
In general, shorter is better, but be sure your reasoning for acceptance/rejection is clear.
Suggestions for improvement
Authors will be very happy to receive suggestions for improvement. These can come in the latter part of the review. You might have ideas for a new way of formulating the result, or of presenting it, e.g., with a good running example. You really help science advance by doing a good job of helping the authors. You’ve already sunk a few hours into reading the paper; why not spend another 30 minutes to an hour helping the authors make their paper even better?
Be respectful of the authors, even if the paper fails on every count. Criticize the paper, not the people. Avoid hyperbole.
Other things to think about
Reading the paper
The first two criteria for judging a paper (utility/interestingness and novelty) tend to be lynchpins: if either one of them fails to hold, the paper is dead, no matter how well written and/or correct it is. Therefore, a useful strategy in reviewing papers is to read the introduction and conclusions carefully, along with the related work, and then skim the middle parts. (If there’s an “overview” section, read that carefully too.) If at that point the paper fails to convince you that it is above the bar in criteria (1) and (2), there’s no need to dig into the details in the middle, at least for judging the paper. You might well spend more time on it to provide constructive feedback.
Otherwise, general advice on reading papers applies (though you may have to try harder than usual to understand the paper, to be a good judge of point (3) above). Here are some guides I’ve read and found useful:
- How to read a technical paper by Jason Eisner (JHU)
- How to Read a Paper by Michael Mitzenmacher (Harvard)
- Efficient Reading of Papers in Science and Technology by Michael J. Hanson (and updated by Dylan J. McNamee)
Submissions (double-blind or not) are made in confidence. Do not reveal that a paper has been submitted somewhere. Do not pass them on to others. Do not discuss them with anyone but the program committee. Do not contact the authors. Never reveal who reviewed a paper.
See the ACM SIGPLAN policies regarding review of conference papers for further details.
Judge papers based on their usefulness/appeal, novelty, correctness, and exposition. Write reviews that are self-contained, justifying their position, while providing constructive feedback in a respectful manner. Great reviews lead to stronger programs, fairer outcomes, and ultimately better science (whether a paper is published now, or later), so it is worth taking the time to do reviewing well!
1. Ugliness, or inelegance, is a particularly strong indictment of a mathematical paper. A friend of mine relayed to me the wisdom of his advisor: “There’s no place in the world for ugly mathematics!”