Category Archives: Science

Evaluating Empirical Evaluations (for Fuzz Testing)

How do we know what we know? That question is the subject of study for the field of epistemology. Per Wikipedia, “Epistemology studies the nature of knowledge, justification, and the rationality of belief.” Science is one powerful means to knowledge. Per the linked Wikipedia article, “Science is viewed as a refined, formalized, systematic, or institutionalized form of the pursuit and acquisition of empirical knowledge.”


Gathering Empirical Evidence

Most people are familiar with the basic scientific method: Pose a hypothesis about the world and then carry out an experiment whose empirical results can either support or falsify the hypothesis (and, inevitably, suggest additional hypotheses).  In PL research we frequently rely on empirical evidence. Compiler optimizations, static and dynamic analyses, program synthesizers, testing tools, memory management algorithms, new language features, and other research developments each depend on some empirical evidence to demonstrate their effectiveness.

A key question for any experiment is: What is the standard of empirical evidence needed to adequately support the hypothesis?

This post is a summary of a paper, titled “Evaluating Fuzz Testing,” co-authored by George T. Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and me, which will appear this Fall at the Conference on Computer and Communications Security. 1 It starts to answer the above question for research on fuzz testing (or simply, fuzzing), a process whose goal is to discover inputs that cause a program to crash. We studied the empirical evaluations carried out by 32 research papers on fuzz testing and, looking critically at the evidence they gathered, found that no paper adheres to a sufficiently high standard of evidence to justify general claims of effectiveness (though some papers come close). We also carried out our own experiments to illustrate why failing to meet that standard can produce misleading or incorrect conclusions.
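
As a purely illustrative sketch of the kind of comparison at stake, and not the methodology or data from our paper, the snippet below compares two hypothetical fuzzers over repeated trials using a nonparametric statistical test rather than a single run. The crash counts are invented, and scipy is assumed to be available.

    # Hypothetical crash counts from repeated, independent trials of two fuzzers.
    # All numbers are made up; scipy is assumed to be available.
    from scipy.stats import mannwhitneyu

    fuzzer_a = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]  # unique crashes per trial
    fuzzer_b = [10, 11, 8, 12, 9, 10, 11, 9, 13, 10]

    # A single trial of each tool could easily point the wrong way; a test over
    # many trials quantifies how likely the observed difference is due to chance.
    stat, p = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")
    print(f"Mann-Whitney U = {stat}, p = {p:.3f}")

Even a toy comparison like this surfaces choices an evaluation must make explicit: how many trials to run, how long each trial lasts, and what performance metric to use.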

Why have researchers systematically missed the mark here? I think the answer owes in part to the lack of an explicit standard of evidence. Our paper can be a starting point for such a standard for fuzz testing. More generally, several colleagues and I have been working on a checklist for empirical evaluations in PL/SE-style research. We welcome your feedback and participation!

Continue reading

Notes:

  1. I also presented our work at the ISSISP’18 summer school, though the work has matured a bit since then.

10 Comments

Filed under Process, Science, Software Security, Software Testing

What does the SIGPLAN EC do?

I’ve served as Chair of ACM SIGPLAN for the last two years. It’s been a pleasure and a privilege to support the programming languages community, working with my fellow members on the SIGPLAN Executive Committee (EC). The current SIGPLAN EC is entering its third and final year of service. Elections for the next EC will be held in early 2018, and the newly elected members will begin serving in July of that year. Who will they be?

In this blog post I describe, in Q&A format, the activities and responsibilities of the SIGPLAN EC and its officers. My hope is that this post will inform possible volunteers about what they can expect to do if elected to the EC, and will help voters match candidates’ aptitudes to each position’s responsibilities. This post will also highlight some of the accomplishments of the current and past ECs, hopefully giving the community an idea of what we’ve been up to, on their behalf.

Continue reading

Leave a Comment

Filed under Process

Measuring Single vs. Double-blind Reviewing

My colleague Emery Berger recently pointed me to the paper Single versus Double Blind Reviewing at WSDM 2017. This paper describes the results of a controlled experiment to test the impact of hiding authors’ identities during parts of the peer review process. The authors of the experiment—PC Chairs of the 2017 Web Search and Data Mining (WSDM’17) conference—examined the reviewing behavior of two sets of reviewers for the same papers submitted to the conference. They found that author identities were highly significant factors in a reviewer’s decision to recommend the paper be accepted to the conference. Both the fame of an author and the author’s affiliation were influential. Interestingly, whether the paper had a female author or not was not significant in recommendation decisions. [Update: a different look at the data found a penalty for female authors; see addendum to this post.]

Fairness is blind

I find this study very interesting, and incredibly useful. Many people I have talked to have suggested that we scientifically compare single- with double-blind reviewing (SBR vs. DBR, for short). A common idea is to run one version of a conference as DBR and compare its outcomes to a past version of the conference that used SBR. The problem with this approach is that both the papers under review and the people reviewing them would change between conference iterations. These are potentially huge confounding factors. While the WSDM’17 study is not perfect, it gets past some of these big issues.
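
For intuition only (these are not the study’s data, model, or results), here is a tiny sketch of the kind of question a same-papers design lets you ask: do single-blind reviewers recommend acceptance more often when a paper has a famous author? The counts below are invented, and scipy is assumed to be available.

    # Invented 2x2 table of accept/reject recommendations from single-blind
    # reviewers, split by whether the paper has a famous author.
    from scipy.stats import fisher_exact

    famous = [30, 20]   # [recommend accept, recommend reject]
    unknown = [15, 35]

    odds_ratio, p = fisher_exact([famous, unknown])
    print(f"odds ratio = {odds_ratio:.2f}, p = {p:.3f}")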

In the rest of the post I will summarize the details of the WSDM’17 study and offer some thoughts about its strengths and weaknesses. I think we should attempt more studies like this for other conferences. Continue reading

5 Comments

Filed under Process, Research, Science

Gold Open Access publication: Is now the time?

While scientific papers were once available only to those willing to pay expensive fees to journal publishers, papers are now increasingly made available for free, as they enjoy some form of open access (OA). Not all forms of open access are the same, however. While the ACM SIGPLAN Executive Committee (of which I am the Chair) is generally happy with the OA rights SIGPLAN 1 authors currently enjoy, it may be time to push for even stronger rights. Reasonable people may disagree about the costs and benefits of such rights. As such, we would like your feedback (even if you are not a SIGPLAN member).

Please consider filling out this open access survey. It should take 5-10 minutes.

The remainder of this blog post discusses the issues in more depth. We invite your feedback!

Continue reading

15 Comments

Filed under Policy, Process, Science, Uncategorized

Unblinding Double-blind Reviewing

Peer review is at the heart of the scientific process. As I have written about before, scientific results are deemed publishable by top journals and conferences only once they are given a stamp of approval by a panel of expert reviewers (“peers”). These reviewers act as a critical quality control, rejecting bogus or uninteresting results.

But peer review involves human judgment and as such it is subject to bias. One source of bias is a scientific paper’s authorship: reviewers may judge the work of unknown or minority authors more negatively, or judge the work of famous authors more positively, independent of the merits of the work itself.

Double-blind: Authors are blind to their reviewers, who are blind to authors

The double-blind review process aims to mitigate authorship bias by withholding the identity of authors from reviewers. Unfortunately, simply removing author names from the paper (along with other straightforward prescriptions) may not be enough to prevent the reviewers from guessing who the authors are. If reviewers often guess and are correct, the benefits of blinding may not be worth the costs.

While I am a believer in double-blind reviewing, I have often wondered about its efficacy. So as part of the review process of CSF’16, I carried out an experiment: 1 I asked reviewers to indicate, after reviewing a paper, whether they had a good guess about the authors of the paper, and if so to name the author(s). This post presents the results. In sum, reviewers often (2/3 of the time) had no good guess about authorship, but when they did, they were often correct (4/5 of the time). I think these results support using a double-blind process, as I discuss at the end.
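
As a back-of-the-envelope combination of those two rates (my own arithmetic, which assumes the two rates are independent), the implied fraction of reviews in which the authors end up correctly identified is roughly a quarter:

    # Rough arithmetic from the reported rates; independence is my assumption.
    p_has_guess = 1 / 3      # reviewers had a good guess about 1/3 of the time
    p_guess_right = 4 / 5    # and such guesses were right about 4/5 of the time

    p_identified = p_has_guess * p_guess_right
    print(f"~{p_identified:.0%} of reviews correctly identify the authors")  # ~27%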

Continue reading

Notes:

  1. The structure of this experiment was inspired by the process Emery Berger put in place for PLDI’16, following a suggestion by Kathryn McKinley.

7 Comments

Filed under Process, Science

Carbon Footprint of Conference Travel

Conferences are the heart of the PL research community. The best PL research is published at conferences, following a rigorous peer-review process on par with, or better than, that of high-quality journals. Conferences are also where science gets done. As the community gathers to learn about the latest results, its members also network and interact, developing collaborations or carrying on projects that could produce the next breakthroughs. At conferences, students and young professors can rub elbows with luminaries, and researchers can develop problems and exchange ideas with practitioners.

Air travel warms the planet disproportionately

One drawback of conferences (compared to other forms of research publication) is their cost. The monetary cost is obvious: at least one of a paper’s authors must pay to attend the conference, a cost that includes registration, travel, and hotel. Another cost that is less often considered is the environmental cost. In particular, I’m thinking of the impact that travel to/from the conference has on global warming. Most conference attendees travel great distances, and so travel by airplane. But air travel is particularly bad for global warming. So I wondered: what is the cost of conference travel, in terms of carbon footprint?

To get some idea, I decided to estimate the carbon footprint of the PLDI’16 program committee (PC) meeting, held just before and at the same venue as POPL’16. The result directly sheds light on the carbon footprint of in-person PC meetings and, if we treat the PC as a sample of the PL community overall, on the carbon footprint of PL conferences. In this blog post I present the results of my analysis and conclude with thoughts about possible actions to mitigate the environmental cost.
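
To give a feel for how such an estimate works (this is not the data or method from my analysis), here is a minimal sketch that converts assumed round-trip flight distances into CO2-equivalent emissions using a flat per-kilometer factor; both the distances and the factor are illustrative assumptions.

    # Illustrative flight-emissions estimate; the distances and the per-km factor
    # are assumptions, not figures from the PLDI'16 PC analysis.
    KG_CO2E_PER_KM = 0.2  # rough economy-class factor, with a high-altitude multiplier

    round_trip_km = [1200, 8000, 5500, 300, 9700, 6400]  # hypothetical attendees

    total_kg = sum(d * KG_CO2E_PER_KM for d in round_trip_km)
    print(f"Estimated total: {total_kg / 1000:.1f} tonnes CO2e for {len(round_trip_km)} travelers")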

Continue reading

3 Comments

Filed under Policy, Process, Science

SecDev: Bringing Security Innovation Into Design & Development

The IEEE Cybersecurity Development (SecDev) Conference is a new conference focused on designing and building systems to be secure. It will be offered for the first time in Boston, MA, on November 3-4, 2016. This event was conceived, and is being organized, by Rob Cunningham; I’m pleased to be the PC Chair.

As stated in the call for papers, this first iteration of the conference is seeking short (5-page) papers, extended (1-page) abstracts, and tutorial proposals. The submission deadline is June 21, 2016 — if you have new results, old results you’d like to repackage, a tool, a process, a vision, or an idea you’d like to share with those working to make systems more secure, please consider submitting a paper!

This blog post explains why I think we need  this conference, what I expect the first year to look like, and what sort of papers we hope to get, in question & answer format. Continue reading

Leave a Comment

Filed under Process, Research, Software Security

Confluences in Programming Languages Research

[This guest post is by David Walker, a professor at Princeton, and recent winner of the SIGPLAN Robin Milner Young Researcher Award. –Mike]

Every once in a while it is useful to take a step back and consider where fruitful new research directions come from. One such place is from the confluence of two independent streams of thought. This is an idea that I picked up from George Varghese, who gave a wonderful talk on the topic at ACM SIGCOMM 2014 and summarized the ideas in a short paper for CCR. 1 This blog post considers confluences in the context of programming languages research, reflects upon the role such confluences have played in my own research, and suggests some things we might learn from the process. My keynote talk from POPL 2016 2  touches on many of these same themes.

Continue reading

Notes:

  1. George Varghese. Life in the Fast Lane: Viewed from the Confluence Lens. ACM SIGCOMM Computer Communication Review 45 (1), pp 19-25, January 2015. (link)
  2. David Walker. Confluences in Programming Languages Research (Keynote). ACM SIGPLAN Symposium on Principles of Programming Languages, pp. 4-4, January 2016. (abstract, video, slides)

Leave a Comment

April 11, 2016 · 1:00 pm

Interview with Matt Might

This post presents an interview I did on March 10th, 2015, with Matt Might, a PL researcher who is an Associate Professor in the School of Computing at the University of Utah.


Matt has made strong scientific contributions to the field of programming languages, and he has done much more. He maintains an incredibly popular blog on wide-ranging topics (13 million pageviews since 2009 on topics from abstract interpretation to how to lose weight to how to be more productive). He has also become deeply committed to supporting people with rare diseases, including his own son, Bertrand, who was the first person diagnosed with NGLY1 deficiency. His work on rare disease propelled him to the White House: He met the President on January 31st, 2015, and he took a position in the Executive Office of the President to accelerate the implementation of the Precision Medicine Initiative on March 21st.

We had an engaging conversation covering all of these topics. It is too long for one post, so this post is the first of two. Continue reading

1 Comment

Filed under Abstract interpretation, Bioinformatics, Dynamic languages, Interviews, Program Analysis, Science, Scientists

DARPA STAC: Challenge-driven Cybersecurity Research

Last week I attended a multi-day meeting for the DARPA STAC program; I am the PI of a UMD-led team. STAC supports research to develop “Space/time Analysis for Cybersecurity.” More precisely, the goal is to develop tools that can analyze software to find exploitable side channels or denial-of-service attacks involving space usage or running time.

In general, DARPA programs focus on a very specific problem, and so are different from the NSF style of funded research that I’m used to, in which the problem, solution, and evaluation approach are proposed by each investigator. One of STAC’s noteworthy features is its use of engagements, during which research teams use their tools to find vulnerabilities in challenge problems produced by an independent red team. Our first engagement was last week, and I found the experience very compelling. I think that both the NSF style and the DARPA style have benefits, and it’s great that both styles are available.

This post talks about my experience with STAC so far. I discuss the interesting PL research challenges the program presents, the use of engagements, and the opportunities STAC’s organizational structure offers, when done right.

Continue reading

1 Comment

Filed under Process, Program Analysis, Research, Science, Software Security