My colleague Emery Berger recently pointed me to the paper Single versus Double Blind Reviewing at WSDM 2017. This paper describes the results of a controlled experiment to test the impact of hiding authors’ identities during parts of the peer review process. The authors of the experiment—PC Chairs of the 2017 Web Search and Data Mining (WSDM’17) conference—examined the reviewing behavior of two sets of reviewers for the same papers submitted to the conference. They found that author identities were highly significant factors in a reviewer’s decision to recommend the paper be accepted to the conference. Both the fame of an author and the author’s affiliation were influential. Interestingly, whether the paper had a female author or not was not significant in recommendation decisions. [Update: a different look at the data found a penalty for female authors; see addendum to this post.]
Fairness is blind
I find this study very interesting, and incredibly useful. Many people I have talked to have suggested that we scientifically compare single- with double-blind reviewing (SBR vs. DBR, for short). A common idea is to run one version of a conference as DBR and compare its outcomes to a past version of the conference that used SBR. The problem with this approach is that both the papers under review and the people reviewing them would change between conference iterations. These are potentially huge confounding factors. While the WSDM’17 study is not perfect, it gets past some of these big issues.
In the rest of the post I will summarize the details of the WSDM’17 study and offer some thoughts about its strengths and weaknesses. I think we should attempt more studies like this for other conferences. Continue reading
While scientific papers were once available only to those willing to pay expensive fees to journal publishers, papers are now increasingly made available for free, as they enjoy some form of open access (OA). Not all forms of open access are the same, however. While the ACM SIGPLAN Executive Committee (of which I am the Chair) is generally happy with the OA rights SIGPLAN authors currently enjoy, it may be time to push for even stronger rights. Reasonable people may disagree about the costs and benefits of such rights. As such, we would like your feedback (even if you are not a SIGPLAN member).
Please consider filling out this open access survey. It should take 5-10 minutes.
The remainder of this blog post discusses the issues in more depth. We invite your feedback!
Peer review is at the heart of the scientific process. As I have written about before, scientific results are deemed publishable by top journals and conferences only once they are given a stamp of approval by a panel of expert reviewers (“peers”). These reviewers act as a critical quality control, rejecting bogus or uninteresting results.
But peer review involves human judgment and as such it is subject to bias. One source of bias is a scientific paper’s authorship: reviewers may judge the work of unknown or minority authors more negatively, or judge the work of famous authors more positively, independent of the merits of the work itself.
Double-blind: Authors are blind to their reviewers, who are blind to authors
The double-blind review process aims to mitigate authorship bias by withholding the identity of authors from reviewers. Unfortunately, simply removing author names from the paper (along with other straightforward prescriptions) may not be enough to prevent the reviewers from guessing who the authors are. If reviewers often guess and are correct, the benefits of blinding may not be worth the costs.
While I am a believer in double-blind reviewing, I have often wondered about its efficacy. So as part of the review process of CSF’16, I carried out an experiment: I asked reviewers to indicate, after reviewing a paper, whether they had a good guess about the authors of the paper, and if so to name the author(s). This post presents the results. In sum, reviewers often (2/3 of the time) had no good guess about authorship, but when they did, they were often correct (4/5 of the time). I think these results support using a double-blind process, as I discuss at the end.
Filed under Process, Science
Conferences are the heart of the PL research community. The best PL research is published at conferences, following a rigorous peer review process on par or better than the process of high-quality journals. Conferences are also where science gets done. As the respective community gathers to learn about the latest results, its members also network and interact, developing collaborations or carrying on projects that could produce the next breakthroughs. At conferences, students and young professors can rub elbows with luminaries, and researchers can develop problems and exchange ideas with practitioners.
Air travel warms the planet disproportionately
One drawback of conferences (compared to other forms of research publication) is their cost. The monetary cost is obvious: at least one of a paper’s authors must pay to attend the conference, a cost that includes registration, travel, and hotel. Another cost that is less often considered is the environmental cost. In particular, I’m thinking of the impact that travel to/from the conference has on global warming. Most conference attendees travel great distances, and so travel by airplane. But air travel is particularly bad for global warming. So I wondered: what is the cost of conference travel, in terms of carbon footprint?
To get some idea, I decided to estimate the carbon footprint of the PLDI’16 program committee (PC) meeting, held just before and at the same venue as POPL’16. The result directly sheds some light on the carbon footprint of in-person PC meetings, and by treating it as a sample of the PL community overall, sheds light on the carbon footprint of PL conferences. In this blog post I present the results of my analysis and conclude with thoughts about possible actions to mitigate environmental cost.
The IEEE Cybersecurity Development (SecDev) Conference is a new conference focused on designing and building systems to be secure. It will be offered for the first time in Boston, MA, on November 3-4, 2016. This event was conceived, and is being organized, by Rob Cunningham; I’m pleased to be the PC Chair.
As stated in the call for papers, this first iteration of the conference is seeking short (5-page) papers, extended (1-page) abstracts, and tutorial proposals. The submission deadline is June 21, 2016 — if you have new results, old results you’d like to repackage, a tool, a process, a vision, or an idea you’d like to share with those working to make systems more secure, please consider submitting a paper!
This blog post explains why I think we need this conference, what I expect the first year to look like, and what sort of papers we hope to get, in question & answer format. Continue reading
[This guest post is by David Walker, a professor at Princeton, and recent winner of the SIGPLAN Robin Milner Young Researcher Award. –Mike]
Every once in a while it is useful to take a step back and consider where fruitful new research directions come from. One such place is from the confluence of two independent streams of thought. This is an idea that I picked up from George Varghese, who gave a wonderful talk on the topic at ACM SIGCOMM 2014 and summarized the ideas in a short paper for CCR. This blog post considers confluences in the context of programming languages research, reflects upon the role such confluences have played in my own research, and suggests some things we might learn from the process. My keynote talk from POPL 2016 touches on many of these same themes.
Last week I attended a multi-day meeting for the DARPA STAC program; I am the PI of a UMD-led team. STAC supports research to develop “Space/time Analysis for Cybersecurity.” More precisely, the goal is to develop tools that can analyze software to find exploitable side channels or denial-of-service attacks involving space usage or running time.
In general, DARPA programs focus on a very specific problem, and so are different from the NSF style of funded research that I’m used to, in which the problem, solution, and evaluation approach are proposed by each investigator. One of STAC’s noteworthy features is its use of engagements, during which research teams use their tools to find vulnerabilities in challenge problems produced by an independent red team. Our first engagement was last week, and I found the experience very compelling. I think that both the NSF style and the DARPA style have benefits, and it’s great that both styles are available.
This post talks about my experience with STAC so far. I discuss the interesting PL research challenges the program presents, the use of engagements, and the opportunities STAC’s organizational structure offers, when done right.
Consider this claim
Quality is more important than quantity
I expect few people would disagree with it, and yet we do not always act as if it were true. In Academia, when considering candidates to hire or promote, we count their papers, their citations, their funding, their software download rates, their graduated students, the number of their committee memberships or journal editorships, and more.
Researchers are getting the message: quantity matters. Ugo Bardi proposes the economic underpinnings of this apparent trend, cleverly arguing that scientific papers are currency, subject to phenomena like inflation (more papers!), assaying (peer review validates papers, which support funding proposals, which finance more papers), and counterfeiting (papers published without review by unscrupulous publishers). Moshe Vardi, in a recent blog post, concurs that “we have slid down the slippery path of using quantity as a proxy for quality” and that “the inflationary pressure to publish more and more encourages speed and brevity, rather than careful scholarship.”
In this post I consider the problem of incentivizing, and assessing, research quality, starting with a recent set of guidelines put out by the CRA. I conclude with a set of questions—I hope you will share your opinion. Continue reading
As I have written previously, academic computer science differs from other scientific disciplines in its heavy use of peer-reviewed conference publications.
Since other disciplines’ conferences typically do not employ peer review, results published at highly selective computer science conferences may not be given the credit they deserve, i.e., the same credit they would receive if published in a similarly selective journal.
The main remedy has simply been to explain the situation to the possibly confused party, be it a dean or provost or a colleague from another department. But this remedy is sometimes ineffective: At some institutions, scientists risk a poor evaluation if they publish too few journal articles, but they risk muting the influence of their work in their own community if they publish too few articles at top conferences.
The ACM publications board has recently put forth a proposal that takes this problem head on by formally recognizing conference publications as equal in quality to journal publications. How? By collecting them in a special journal series called the Proceedings of the ACM (PACM).
In this post I briefly summarize the motivation and substance of the ACM proposal and provide some thoughts about it. In the end, I support it, but with some caveats. You have the opportunity to voice your own opinion via survey. You can also read other opinions for (by Kathryn McKinley) and against (by David S. Rosenblum) the proposal (if you can get past the ACM paywall, but that’s a topic for another day…).
Update: PACM has been approved, as has a new journal series called PACM PL that will collect papers accepted by major SIGPLAN Conferences. It will debut during late 2017.
Filed under Process, Science
Anyone familiar with American academia will tell you that the US News rankings of academic programs play an outsized role in this world. Among other things, US News ranks graduate programs of computer science, by their strength in the field at large as well as certain specialties. One of these specialties is Programming Languages, the focus of this blog.
The US News rankings are based solely on surveys. Department heads and directors of graduate studies at a couple of hundred universities are asked to assign numerical scores to graduate programs. Departments are ranked by the average score that they receive.
It’s easy to see that much can go wrong with such a methodology. A reputation-based ranking system is an election, and elections are meaningful only when their voters are well-informed. The worry here is that respondents are not necessarily qualified to rank programs in research areas that are not their own. Also, it is plausible that respondents would give higher scores to departments that have high overall prestige or that they are personally familiar with.
In this post, I propose using publication metrics as an input to a well-informed ranking process. Rather than propose a one-size-fits-all ranking, I provide a web application to allow users to compute their own rankings. This approach has limitations, which I discuss in detail, but I believe it’s a reasonable start to a better system.