Author Archives: Michael Hicks
How to Write a Grad School Personal Statement
BullFrog: Online Schema Migration, On Demand
I did my PhD on a topic I called dynamic software updating (DSU), a process by which a running application is updated with new functionality, whether to add features or fix bugs, without shutting it down. As a faculty member, I supervised several PhD students on DSU projects. These considered the semantics of DSU and ways of reasoning about and/or testing a dynamic update’s correctness (including while it’s deployed), and ways of implementing DSU, using compilation, libraries, and/or code rewriting. All of this work resulted in what are, as far as I’m aware, still the most full-featured and efficient implementations of DSU for C and Java, to date.
While DSU handles the update of long-running, single-process applications, many long-running applications also involve a database management system (DBMS) to store persistent application data. For example, an online market will have a front end to present the user interface, but the market’s inventory, purchase log, user reviews, etc. will be stored in the back-end database. As such, a single logical change to an application could well involve individual changes to both the front-end code and the contents and format of the database. Maybe our upgraded market now provides access to an item’s price history, which is implemented by extending the DB schema and by adding front-end functionality to query/access this information. To realize this upgrade dynamically, we need to change the application and the database, in one logical step.
In a paper presented at SIGMOD this month, we describe Bullfrog, a new DBMS that supports online schema updates in a way that enables whole-application upgrades. An application change can be applied by DSU for the front-end instances and by Bullfrog for the back-end DB. A key feature of Bullfrog is that on the one hand, the schema change is immediate, which simplifies front-end/back-end coordination of the update, especially when schema changes are backward incompatible. On the other hand, data migration to the new schema is lazy, as the application demands it. Lazy migration avoids a potentially lengthy update-time pause, which would result in loss of availability, defeating the
whole point of DSU. There were lots of challenges to realizing the lazy updating model. I give a flavor of the approach here; the paper has the details. Continue reading
Increasing the Impact of PL Research
[This article is cross-posted on PL Perspectives, the SIGPLAN blog.]
Even in this vibrant environment, my sense is that the size of the PL research community, and the impact of its research, is lower than it should be. My social network tells me that grad school applications signaling PL interest are declining, and while many researchers have won Turing awards for PL ideas these are becoming fewer and further between. Compare this state of affairs to that in the machine learning and security communities, which are growing rapidly in size and stature. Is there anything the PL community can do to increase the impact of its great work?
My recommendation is a concentrated effort to diversify PL research enthusiasts, and through them broaden the impact of PL-minded work.
We in PL can expand our tent. Education and outreach can help others to see that PL—its problems, methods, and ethos—is different and more exciting than they realized. We can lower the barrier to entry by engaging in a little housecleaning around expectations of core knowledge. We can also venture outside our tent, taking our knowledge and ideas to join other communities and address their problems. All of these steps will follow naturally from a focus on collaborative efforts attacking substantial problems, such as deployable AI or a quantum programming stack, the solution to which involves PL techniques, but many others besides. Continue reading
“What is PL Research?” The Talk
I was invited to give a talk at the Programming Languages Mentoring Workshop (PLMW) colocated with POPL’19 in January. The talk topic was What is programming languages research? I was excited to give this talk. It’s a topic I’ve thought a lot about over the years; in 2015 I wrote a blog post about it. Shortly thereafter I was elected SIGPLAN Chair, and over the ensuing three years came to know the exciting depth and breadth of the field even more deeply.
Like the blog post, the talk presents what I view as the goals, ethos, and benefits of PL research. Because the PLMW audience is senior undergraduates and early graduate students, the talk also presents an overview of PL as a field. In particular, it presents a tutorial of sorts of the areas and methods that PL researchers often develop and employ. To capture what these are, I skimmed a sampling of the conference proceedings of PLDI and POPL from the last 30 years. Doing so, I abstracted the “shape” of a PL research paper, and identified the broad areas PL researchers tend to focus on. The talk presents a flavor of these areas. Because the talk took place just before POPL, I focused most on topics that appear in POPL-published research; the talk highlights particular POPL’19 papers as examples.
Several people afterward told me that they enjoyed the talk and asked about whether a video of the talk might be available. Unfortunately, the talks were not recorded. SIGPLAN main conference talks are regularly video-recorded, but workshops and co-located events are hit and miss. I certainly understand the financial reasons for this situation. Nevertheless, it’s really too bad that PLMW talks are not recorded. In my experience, PLMW speakers put an exceptional amount of time and care into their talks, so they are often very well done. The talks also target a general audience, so they are potentially valuable to many more people than just those attending the actual event.
In the hopes that others might find it useful, I decided to video-record myself giving the PLMW talk. The recording is not great, but I hope that fact doesn’t get in the way of the conveying the content. If it does, maybe just the slide deck will prove useful. If you have comments or thoughts, I’m glad to hear them!
Many thanks to the organizers of PLMW@POPL’19 for a great event, and the opportunity to speak!
Filed under Abstract interpretation, Process, Program Analysis, Research, Semantics, Types
How to Write a Conference Talk
As we are gearing up for POPL’19, my students putting their talks together. For one of them, this is his first conference talk, and for the other it’s his first PL talk. So both of them separately asked me for advice about putting the talk together. After answering the second time, I realized I should follow Matt Might’s advice, and “reply to public” by sharing what I wrote to them.
There’s lots of good advice out there about making conference talks, which I summarize at the end. I think what I say here reinforces much of that good advice, and adds a bit to it.Continue reading
Evaluating Empirical Evaluations (for Fuzz Testing)
How do we know what we know? That question is the subject of study for the field of epistemology. Per Wikipedia, “Epistemology studies the nature of knowledge, justification, and the rationality of belief.” Science is one powerful means to knowledge. Per the linked Wikipedia article, “Science is viewed as a refined, formalized, systematic, or institutionalized form of the pursuit and acquisition of empirical knowledge.”
Most people are familiar with the basic scientific method: Pose a hypothesis about the world and then carry out an experiment whose empirical results can either support or falsify the hypothesis (and, inevitably, suggest additional hypotheses). In PL research we frequently rely on empirical evidence. Compiler optimizations, static and dynamic analyses, program synthesizers, testing tools, memory management algorithms, new language features, and other research developments each depend on some empirical evidence to demonstrate their effectiveness.
A key question for any experiment is: What is the standard of empirical evidence needed to adequately support the hypothesis?
This post is a summary of a paper co-authored by George T. Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and me that will appear in the Conference on Computer and Communications Security, this Fall, titled “Evaluating Fuzz Testing.”[ref]I also presented our work at the ISSISP’18 summer school, though the work has matured a bit since then.[/ref] It starts to answer the above question for research on fuzz testing (or simply, fuzzing), a process whose goal is to discover inputs that cause a program to crash. We studied empirical evaluations carried out by 32 research papers on fuzz testing. Looking critically at the evidence gathered by these papers, we find that no paper adheres to a sufficiently high standard of evidence to justify general claims of effectiveness (though some papers get close). We carry out our own experiments to illustrate why failure to meet the standard can produce misleading or incorrect conclusions.
Why have researchers systematically missed the mark here? I think the answer owes in part to the lack of an explicit standard of evidence. Our paper can be a starting point for such a standard for fuzz testing. More generally, several colleagues and I have been working on a checklist for empirical evaluations in PL/SE-style research. We welcome your feedback and participation!
Filed under Process, Science, Software Security, Software Testing
Software Security is a Programming Languages Issue
This is the the last of three posts on the course I regularly teach, CS 330, Organization of Programming Languages. The first two posts covered programming language styles and mathematical concepts. This post covers the last 1/4 of the course, which focuses on software security, and related to that, the programming language Rust.
This course topic might strike you as odd: Why teach security in a programming languages course? Doesn’t it belong in, well, a security course? I believe that if we are to solve our security problems, then we must build software with security in mind right from the start. To do that, all programmers need to know something about security, not just a handful of specialists. Security vulnerabilities are both enabled and prevented by various language (mis)features, and programming (anti)patterns. As such, it makes sense to introduce these concepts in a programming (languages) course, especially one that all students must take.
This post is broken into three parts: the need for security-minded programming, how we cover this topic in 330, and our presentation of Rust. The post came to be a bit longer than I’d anticipated; apologies!
Filed under Education, Software Security, Types
Teaching Programming Languages (part 2)
This is my second post on UMD CS’s programming languages course, CS 330.
We introduce Ruby and OCaml as exemplars of dynamic/scripting and functional languages, respectively, in the first half of the course. I described these in the first post. This post covers the third quarter of the course. This part goes over technologies that underpin a language’s design and implementation, in particular regular expressions, finite automata, context-free grammars, and LL-k parsing. It also looks at the lambda calculus as a core model of computation, and operational semantics as way of precisely specifying what programs mean. The last part of the course discusses the basics of software security and coding strategies for avoiding various vulnerabilities. I will dedicate a separate post to these last topics.
I’m very curious for your feedback on the course overall. I welcome suggestions for improvements/adjustments to the content!
Teaching at Scale with Clickers
Computer science is wildly popular at Universities right now, owing in no small part to the robust job market for CS graduates. This market is driven by the voracious appetite of businesses and the public for “tech.” A consequence of increasing CS popularity is the substantial growth of CS course sizes. Such growth presents a significant challenge to instructors of those courses, as the number of CS instructional staff have generally not scaled linearly with the number of students. How can we teach many more students with roughly the same number of instructors but without a (substantial) reduction in overall quality? One part of my answer to this question is: “clicker” quizzes.
I recently blogged about CMSC 330, the undergraduate programming languages course I (co-)teach at the University of Maryland. That post focused on the content of the course. This post considers the more general question of how to deliver a computer science course at scale, focusing on the lecture component and a key piece of my approach: the use of clicker quizzes during class. I’ll consider other aspects of large course management — student Q&A and grading — in a future post.
Filed under Education
Teaching Programming Languages
I often have the pleasure of teaching UMD CS‘s undergraduate programming languages course, CMSC 330. Taken by every sophomore in our program, it has evolved into a pretty interesting course, and I find myself talking about it with various people I meet. As such, I thought it might be worth writing down what it’s about, in case others might also find the course (or elements of it) interesting or its materials useful. (Most of the course’s materials — lecture notes, projects, exams, homeworks — are freely available at the course homepage.) There is too much to say for one post, so I’ll break it into three parts. This post covers the overview and first 1/2 of the course, and the second post will cover the third quarter, and the last post will cover the last quarter while also making some broader connections.
Filed under Dynamic languages, Education, Functional programming