Author Archives: Michael Hicks

Evaluating Empirical Evaluations (for Fuzz Testing)

How do we know what we know? That question is the subject of study for the field of epistemology. Per Wikipedia, “Epistemology studies the nature of knowledge, justification, and the rationality of belief.” Science is one powerful means to knowledge. Per the linked Wikipedia article, “Science is viewed as a refined, formalized, systematic, or institutionalized form of the pursuit and acquisition of empirical knowledge.”

Gathering Empirical Evidence

Most people are familiar with the basic scientific method: Pose a hypothesis about the world and then carry out an experiment whose empirical results can either support or falsify the hypothesis (and, inevitably, suggest additional hypotheses).  In PL research we frequently rely on empirical evidence. Compiler optimizations, static and dynamic analyses, program synthesizers, testing tools, memory management algorithms, new language features, and other research developments each depend on some empirical evidence to demonstrate their effectiveness.

A key question for any experiment is: What is the standard of empirical evidence needed to adequately support the hypothesis?

This post is a summary of a paper, co-authored by George T. Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and me, titled “Evaluating Fuzz Testing,” which will appear at the Conference on Computer and Communications Security (CCS) this fall. 1 It starts to answer the above question for research on fuzz testing (or simply, fuzzing), a process whose goal is to discover inputs that cause a program to crash. We studied the empirical evaluations carried out by 32 research papers on fuzz testing. Looking critically at the evidence gathered by these papers, we find that no paper adheres to a sufficiently high standard of evidence to justify general claims of effectiveness (though some papers get close). We carry out our own experiments to illustrate why failure to meet the standard can produce misleading or incorrect conclusions.
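
For readers new to the technique, here is a minimal sketch (mine, not from the paper) of the core black-box fuzzing loop, written in Rust. It assumes a hypothetical target binary ./target_prog that reads its input from stdin; real fuzzers such as AFL layer coverage feedback, input mutation, and crash triage on top of this loop.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

fn main() {
    let mut crashes = 0;
    for trial in 0u32..1_000 {
        // Stand-in for a real random generator: derive "random-looking" bytes
        // deterministically from the trial number.
        let input: Vec<u8> = (0u32..256).map(|i| ((trial * 31 + i * 17) % 256) as u8).collect();

        // Run the (hypothetical) target program on the generated input.
        let mut child = Command::new("./target_prog")
            .stdin(Stdio::piped())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()
            .expect("failed to start target");
        child.stdin.as_mut().unwrap().write_all(&input).ok();
        let status = child.wait().expect("failed to wait on target");

        // On Unix, a process killed by a signal (e.g., SIGSEGV) reports no exit code.
        if status.code().is_none() {
            crashes += 1;
            // A real fuzzer would save `input` here as a reproducible crashing test case.
        }
    }
    println!("observed {} crashing inputs", crashes);
}
```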

Why have researchers systematically missed the mark here? I think the answer owes in part to the lack of an explicit standard of evidence. Our paper can be a starting point for such a standard for fuzz testing. More generally, several colleagues and I have been working on a checklist for empirical evaluations in PL/SE-style research. We welcome your feedback and participation!

Continue reading

Notes:

  1. I also presented our work at the ISSISP’18 summer school, though the work has matured a bit since then.

10 Comments

Filed under Process, Science, Software Security, Software Testing

Software Security is a Programming Languages Issue

This is the last of three posts on the course I regularly teach, CMSC 330, Organization of Programming Languages. The first two posts covered programming language styles and mathematical concepts. This post covers the last quarter of the course, which focuses on software security and, related to that, the programming language Rust.

This course topic might strike you as odd: Why teach security in a programming languages course? Doesn’t it belong in, well, a security course? I believe that if we are to solve our security problems, then we must build software with security in mind right from the start. To do that, all programmers need to know something about security, not just a handful of specialists. Security vulnerabilities are both enabled and prevented by various language (mis)features, and programming (anti)patterns. As such, it makes sense to introduce these concepts in a programming (languages) course, especially one that all students must take.
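
To make that claim concrete, here is a small illustration (mine, not from the course materials) of a memory-safety bug that Rust’s ownership and borrowing rules reject at compile time, but which is easy to write in C or C++:

```rust
fn main() {
    let mut log = vec![String::from("start")];
    let first = &log[0];                 // immutable borrow into the vector's storage
    // log.push(String::from("oops"));   // ERROR if uncommented: cannot borrow `log` as
    //                                   // mutable while `first` is live; `push` may
    //                                   // reallocate and leave `first` dangling, which
    //                                   // in C/C++ would be a silent use-after-free
    println!("first entry: {}", first);  // `first` is used here, so its borrow spans the push
    log.push(String::from("next"));      // fine: the borrow of `first` has ended
    println!("{} entries", log.len());
}
```

Here the compiler, rather than programmer discipline or a separate analysis tool, rules out the vulnerability.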

This post is broken into three parts: the need for security-minded programming, how we cover this topic in 330, and our presentation of Rust. The post came to be a bit longer than I’d anticipated; apologies!

Security is a programming (languages) concern

Continue reading

27 Comments

Filed under Education, Software Security, Types

Teaching Programming Languages (part 2)

This is my second post on UMD CS’s programming languages course, CMSC 330.

We introduce Ruby and OCaml as exemplars of dynamic/scripting and functional languages, respectively, in the first half of the course; I described these in the first post. This post covers the third quarter of the course, which goes over the technologies that underpin a language’s design and implementation: regular expressions, finite automata, context-free grammars, and LL(k) parsing. It also looks at the lambda calculus as a core model of computation, and at operational semantics as a way of precisely specifying what programs mean. The last part of the course discusses the basics of software security and coding strategies for avoiding various vulnerabilities; I will dedicate a separate post to those topics.
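
To give a flavor of this part of the course, here is a small sketch (my own illustration in Rust, not the course’s OCaml code) of the untyped lambda calculus: an AST plus substitution-based, call-by-value evaluation.

```rust
// AST for the untyped lambda calculus: variables, abstractions, applications.
#[derive(Clone, Debug)]
enum Term {
    Var(String),
    Abs(String, Box<Term>),    // \x. e
    App(Box<Term>, Box<Term>), // e1 e2
}

// Naive substitution t[v/x]. A real interpreter needs capture-avoiding
// substitution (alpha-renaming); this sketch assumes bound names are distinct.
fn subst(t: &Term, x: &str, v: &Term) -> Term {
    match t {
        Term::Var(y) if y == x => v.clone(),
        Term::Var(_) => t.clone(),
        Term::Abs(y, _) if y == x => t.clone(), // x is shadowed; stop here
        Term::Abs(y, body) => Term::Abs(y.clone(), Box::new(subst(body, x, v))),
        Term::App(f, a) => Term::App(Box::new(subst(f, x, v)), Box::new(subst(a, x, v))),
    }
}

// Big-step call-by-value evaluation: (\x. e) v reduces to e[v/x].
fn eval(t: &Term) -> Term {
    match t {
        Term::App(f, a) => {
            let fv = eval(f);
            let av = eval(a);
            match fv {
                Term::Abs(x, body) => eval(&subst(&body, &x, &av)),
                other => Term::App(Box::new(other), Box::new(av)), // stuck term
            }
        }
        _ => t.clone(), // variables and abstractions are already values
    }
}

fn main() {
    // (\x. x) (\y. y) evaluates to \y. y
    let id = Term::Abs("x".into(), Box::new(Term::Var("x".into())));
    let idy = Term::Abs("y".into(), Box::new(Term::Var("y".into())));
    println!("{:?}", eval(&Term::App(Box::new(id), Box::new(idy))));
}
```

An operational semantics writes down rules like the beta-reduction step above on paper; an interpreter like this is essentially those rules transcribed into code.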

I’m very curious to hear your feedback on the course overall, and I welcome suggestions for improvements or adjustments to the content!

Continue reading

2 Comments

Filed under Education, Semantics

Teaching at Scale with Clickers

Computer science is wildly popular at universities right now, owing in no small part to the robust job market for CS graduates. This market is driven by the voracious appetite of businesses and the public for “tech.” A consequence of increasing CS popularity is the substantial growth of CS course sizes. Such growth presents a significant challenge to instructors of those courses, as the number of CS instructional staff has generally not scaled linearly with the number of students. How can we teach many more students with roughly the same number of instructors but without a (substantial) reduction in overall quality? One part of my answer to this question is: “clicker” quizzes.

I recently blogged about CMSC 330, the undergraduate programming languages course I (co-)teach at the University of Maryland. That post focused on the content of the course. This post considers the more general question of how to deliver a computer science course at scale, focusing on the lecture component and a key piece of my approach: the use of clicker quizzes during class.  I’ll consider other aspects of large course management — student Q&A and grading — in a future post.

Continue reading

9 Comments

Filed under Education

Teaching Programming Languages

I often have the pleasure of teaching UMD CS’s undergraduate programming languages course, CMSC 330. Taken by every sophomore in our program, it has evolved into a pretty interesting course, and I find myself talking about it with various people I meet. As such, I thought it might be worth writing down what it’s about, in case others might also find the course (or elements of it) interesting or its materials useful. (Most of the course’s materials — lecture notes, projects, exams, homeworks — are freely available at the course homepage.) There is too much to say for one post, so I’ll break it into three parts. This post gives an overview and covers the first half of the course; the second post will cover the third quarter; and the last post will cover the final quarter while also making some broader connections.

Programming languages galore!

Continue reading

14 Comments

Filed under Dynamic languages, Education, Functional programming

What is soundness (in static analysis)?

The PLUM reading group recently discussed the paper, DR CHECKER: A Soundy Analysis for Linux Kernel Drivers, which appeared at USENIX Security’17. This paper presents an automatic program analysis (a static analysis) for Linux device drivers that aims to discover instances of a class of security-relevant bugs. The paper is insistent that a big reason for DR CHECKER’s success (it finds a number of new bugs, several of which have been acknowledged to be true vulnerabilities) is that the analysis is soundy, as opposed to sound. Many of the reading group students wondered: What do these terms mean, and why might soundiness be better than soundness?

To answer this question, we need to step back and talk about various other terms also used to describe a static analysis, such as completeness, precision, recall, and more. These terms are not always used in a consistent way, and they can be confusing. The value of an analysis being sound, or complete, or soundy, is also debatable. This post presents my understanding of the definitions of these terms, and considers how they may (or may not) be useful for characterizing a static analysis. One interesting conclusion to draw from understanding the terms is that we need good benchmark suites for evaluating static analysis; my impression is that, as of now, there are few good options.
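
To make the measurement terms concrete, here is a tiny, made-up example (not from the post) of computing precision and recall for a hypothetical bug-finding analysis run on a benchmark whose real bugs are known in advance:

```rust
fn main() {
    let warnings_reported = 50.0_f64; // warnings the tool emitted
    let true_positives = 20.0;        // reported warnings that correspond to real bugs
    let real_bugs_total = 25.0;       // real bugs present in the benchmark, reported or not

    // Precision: of what the tool reported, how much is real?
    let precision = true_positives / warnings_reported;
    // Recall: of the real bugs, how many did the tool report?
    let recall = true_positives / real_bugs_total;

    println!("precision = {:.2}, recall = {:.2}", precision, recall); // 0.40, 0.80
}
```

Without a ground-truth benchmark, neither number can be computed, which is part of why good benchmark suites matter.

Continue reading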

20 Comments

Filed under Abstract interpretation, Program Analysis

What does the SIGPLAN EC do?

I’ve served as Chair of ACM SIGPLAN for the last two years. It’s been a pleasure and a privilege to support the programming languages community, working with my fellow members on the SIGPLAN Executive Committee (EC). The current SIGPLAN EC is entering its third and final year of service. Elections for the next EC will be held in early 2018, and the newly elected members will begin serving in July of that year. Who will they be?

In this blog post I describe, in Q&A format, the activities and responsibilities of the SIGPLAN EC and its officers. My hope is that this post will inform possible volunteers about what they can expect to do if elected to the EC, and will help voters match candidates’ aptitudes to each position’s responsibilities. This post will also highlight some of the accomplishments of the current and past ECs, hopefully giving the community an idea of what we’ve been up to, on their behalf.

Continue reading

Leave a Comment

Filed under Process

Jean Sammet, a Remembrance

This past weekend, trailblazing computer scientist Jean Sammet passed away at the age of 88. I learned this news through emeritus colleague Marv Zelkowitz, who lives in the same retirement community that Jean did. He saw the notice of her passing over the weekend on a community bulletin board. (There’s an interesting story about this; see the end of this post!)

Jean Sammet visiting UMD in the late 1970s. Photo by Ben Shneiderman

Continue reading

2 Comments

Filed under Scientists

Measuring Single vs. Double-blind Reviewing

My colleague Emery Berger recently pointed me to the paper Single versus Double Blind Reviewing at WSDM 2017. This paper describes the results of a controlled experiment to test the impact of hiding authors’ identities during parts of the peer review process. The authors of the experiment—PC Chairs of the 2017 Web Search and Data Mining (WSDM’17) conference—examined the reviewing behavior of two sets of reviewers for the same papers submitted to the conference. They found that author identities were highly significant factors in a reviewer’s decision to recommend the paper be accepted to the conference. Both the fame of an author and the author’s affiliation were influential. Interestingly, whether the paper had a female author or not was not significant in recommendation decisions. [Update: a different look at the data found a penalty for female authors; see addendum to this post.]

Fairness is blind

I find this study very interesting, and incredibly useful. Many people I have talked to have suggested that we scientifically compare single- with double-blind reviewing (SBR vs. DBR, for short). A common idea is to run one version of a conference as DBR and compare its outcomes to a past version of the conference that used SBR. The problem with this approach is that both the papers under review and the people reviewing them would change between conference iterations. These are potentially huge confounding factors. While the WSDM’17 study is not perfect, it gets past some of these big issues.

In the rest of the post I will summarize the details of the WSDM’17 study and offer some thoughts about its strengths and weaknesses. I think we should attempt more studies like this for other conferences.

Continue reading

5 Comments

Filed under Process, Research, Science

POPL’17 in Paris: Some highlights

Last week I attended the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). The event was hosted at Paris 6, which is part of the Sorbonne, University of Paris. It was one of the best POPLs I can remember. The papers are both interesting and informative (you can read them all, for free, from the Open TOC), and the talks I attended were generally of very high quality. (Videos of the talks will be available soon—I will add links to this post.) Also, the attendance hit an all-time high: more than 720 people registered for POPL and/or one of its co-located events.

In this blog post I will highlight a few of my favorite papers at this POPL, as well as the co-located N40AI event, which celebrated 40 years of abstract interpretation. Disclaimer: I do not have time to describe all of the great stuff I saw, and I could only see a fraction of the whole event. So just because I don’t mention something here doesn’t mean it isn’t equally great. 1

Continue reading

Notes:

  1. I also attended PLMW just before POPL, and gave a talk. I may discuss that in another blog post.

14 Comments

Filed under Program Analysis, Research, Scientists, Semantics, Software Security, Types