Category Archives: Software engineering

Evaluating Empirical Evaluations (for Fuzz Testing)

How do we know what we know? That question is the subject of study for the field of epistemology. Per Wikipedia, “Epistemology studies the nature of knowledge, justification, and the rationality of belief.” Science is one powerful means to knowledge. Per the linked Wikipedia article, “Science is viewed as a refined, formalized, systematic, or institutionalized form of the pursuit and acquisition of empirical knowledge.”


Gathering Empirical Evidence

Most people are familiar with the basic scientific method: pose a hypothesis about the world and then carry out an experiment whose empirical results can either support or falsify the hypothesis (and, inevitably, suggest additional hypotheses). In programming languages (PL) research we frequently rely on empirical evidence. Compiler optimizations, static and dynamic analyses, program synthesizers, testing tools, memory management algorithms, new language features, and other research developments each depend on some empirical evidence to demonstrate their effectiveness.

A key question for any experiment is: What is the standard of empirical evidence needed to adequately support the hypothesis?

This post summarizes a paper co-authored by George T. Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and me, titled “Evaluating Fuzz Testing,” that will appear at the Conference on Computer and Communications Security this fall. 1 It starts to answer the above question for research on fuzz testing (or simply, fuzzing), a process whose goal is to discover inputs that cause a program to crash. We studied the empirical evaluations carried out by 32 research papers on fuzz testing. Looking critically at the evidence these papers gathered, we found that no paper adheres to a sufficiently high standard of evidence to justify general claims of effectiveness (though some papers get close). We also carried out our own experiments to illustrate why failing to meet the standard can produce misleading or incorrect conclusions.
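For readers new to fuzzing, here is a minimal sketch of the basic idea: feed random inputs to a program until one makes it crash. It is not any fuzzer from the paper. The names `toy_target` and `fuzz`, the crash condition, and the parameters are all hypothetical, chosen only for illustration.

```python
import random


def toy_target(data: bytes) -> None:
    """Hypothetical program under test: 'crashes' on inputs starting with b'F'."""
    if data[:1] == b"F":
        raise RuntimeError("simulated crash")


def fuzz(target, seed: int, max_trials: int = 100_000, max_len: int = 4):
    """Feed random byte strings to `target`.

    Returns (trial_number, crashing_input) for the first crash found,
    or None if no crash is found within `max_trials` attempts.
    """
    rng = random.Random(seed)  # seeded so a given run can be repeated
    for trial in range(1, max_trials + 1):
        length = rng.randint(0, max_len)  # inclusive bounds
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception:
            return trial, data
    return None


if __name__ == "__main__":
    # Each seed stands for one independent run of the same fuzzer.
    for seed in range(5):
        print(seed, fuzz(toy_target, seed))
```

Because the inputs are random, the number of trials before the first crash will generally differ from one seed to the next, which is one reason a single run of a randomized tool says little on its own.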

Why have researchers systematically missed the mark here? I think the answer owes in part to the lack of an explicit standard of evidence. Our paper can be a starting point for such a standard for fuzz testing. More generally, several colleagues and I have been working on a checklist for empirical evaluations in PL/SE-style research. We welcome your feedback and participation!


Notes:

  1. I also presented our work at the ISSISP’18 summer school, though the work has matured a bit since then.


Filed under Process, Science, Software Security, Software Testing

Rise of the Robots: Review and Reflection

I recently read Martin Ford’s Rise of the Robots with the UMD CS faculty book club. The book considers the impact of the growth of information technology (IT) on the human labor market, and how the trend towards greater automation could eventually eliminate a substantial number of jobs. The result could be a radical, and disruptive, reshaping of the global economy.

I would recommend the book. I found it well-written and thought-provoking. Ford capably argues from past economic and technology trends and also digs into particular problems, products, and research in order to extrapolate future impact. Of the ten faculty who discussed the book, nine of us (including me) were convinced that future automation will be increasingly disruptive to human labor markets.

While reading the book, I found myself wondering about my own role, and that of my field, in addressing this situation we’ve contributed to. Many computer scientists have high-minded ideals and wish to help society through IT innovation. What can we do to ensure that those ideals are realized, rather than perverted into the dystopian future that Ford is warning us about?


Filed under Algorithms, Book Reviews, Policy, Software engineering

What is a bug?

Buggy software doesn’t work. According to Wikipedia:

A software bug is an error … in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. Most bugs arise from mistakes and errors made by people in either a program’s source code or its design ...

When something is wrong with a program, we rarely hear of it having one bug — we hear of it having many bugs. I’m wondering: Where does one bug end and the next bug begin?

To answer this question, we need an operational definition of a bug, not the indirect notion present in the Wikipedia quote. 1

This post starts to explore such a definition, but I’m not satisfied with it yet — I’m hoping you will provide your thoughts in the comments to move it forward.


Notes:

  1. Andreas Zeller, in his book Why Programs Fail, prefers the term defect to bug since the latter term is sometimes used to refer to erroneous behavior, rather than erroneous code. I stick with the term bug, in this post, and use it to mean the problematic code (only).


Filed under Semantics, Software engineering