At the recent PLDI conference, Armando Solar-Lezama and I organized a workshop called PLOOC: “Programming Language Tools for Massive Open Online Courses.” The high-level goal of the workshop was to discuss ways in which tools coming out of PL research can be used in K-16 education. Over the years, PL researchers have developed many techniques for automating and simplifying the design and analysis of programs. For the most part, these techniques have targeted the professional programmer. However, techniques developed for industrial code can also be applied to student-written programs in computer science courses.

As an example, think of a student struggling with a programming assignment in a Computer Science 101 class. Normally, the student would have to take her partial solution to a TA and ask for guidance, and this would take hours or days. PL research opens up the possibility of an *automated TA* that would offer continuous customized feedback to the student.

For instance, suppose the student has completed a programming assignment but isn’t sure that her code is correct. An automated “TA” could use *program analysis* techniques to thoroughly analyze the student’s solution, pointing out corner cases on which it does not work correctly. Or suppose the student is struggling to finish the assignment. In this case, the TA can utilize techniques from the emerging area of *program synthesis*. In synthesis, the goal is to algorithmically generate correct programs from specifications that capture partial knowledge. Applied in this context, a synthesis algorithm would treat the student’s unfinished program as a specification, then generate a correct, complete program from this spec. (Naturally, the TA would not want to give away this program in its entirety, but only reveal portions of it.)
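To make these two ideas concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: a toy "max of a list" assignment, a buggy student attempt, random testing standing in for heavier program analysis, and a tiny enumerative synthesizer that treats the bad initial value as a hole to fill.

```python
import random

# Hypothetical CS101 task (invented here): return the maximum
# element of a non-empty list.
def reference(xs):
    return max(xs)

# A buggy student attempt: the initial value is wrong for
# all-negative inputs.
def student(xs):
    best = 0  # bug: should not start at 0
    for x in xs:
        if x > best:
            best = x
    return best

def find_counterexample(trials=1000):
    """Random testing, a stand-in for real program analysis: search
    for an input where the student's answer differs from the
    reference solution."""
    random.seed(0)
    for _ in range(trials):
        xs = [random.randint(-10, 10) for _ in range(random.randint(1, 5))]
        if student(xs) != reference(xs):
            return xs  # a corner case to show the student
    return None

def synthesize_init(tests):
    """Tiny enumerative synthesis: treat the bad initial value as a
    hole, and try candidate expressions until the completed program
    passes all the given input/output tests."""
    for cand in ("0", "xs[0]", "float('-inf')"):
        def fixed(xs, cand=cand):
            best = eval(cand)  # plug the candidate into the hole
            for x in xs:
                if x > best:
                    best = x
            return best
        if all(fixed(xs) == want for xs, want in tests):
            return cand
    return None
```

Here `find_counterexample()` turns up a short all-negative list that exposes the bug, and `synthesize_init([([3, 1, 2], 3), ([-5, -2], -2)])` returns a repair for the hole; an automated TA would reveal these hints selectively rather than hand over the fixed program.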

These ideas can be applied beyond computer science courses, in any domain where student work needs to be mathematically rigorous. For example, a formula in high school algebra is really a simple computer program that evaluates an arithmetic expression over a set of input variables. A student who is working on proving a theorem in Euclidean geometry is stitching together a set of axioms and inference rules following certain precise rules for composition — in other words, writing a program. If tools from PL research can be used to reason about complex, industrial code bases, they can also be used to analyze and complete programs — proofs and constructions — in algebra and geometry.
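As a toy instance of the "formula as program" analogy, the identity (a + b)² = a² + 2ab + b² can be written as two tiny Python programs and checked for agreement over a grid of inputs. A real tool would reason symbolically; exhaustive checking over a small range is just the cheapest way to see the analogy in action.

```python
# A high-school identity viewed as two small programs that should
# agree on every input.
def lhs(a, b):
    return (a + b) ** 2

def rhs(a, b):
    return a * a + 2 * a * b + b * b

# Check agreement on a small grid of integer inputs.
agree = all(lhs(a, b) == rhs(a, b)
            for a in range(-20, 21) for b in range(-20, 21))
```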

In addition to helping students, our automated TA could assist instructors in creating course material. For example, it could find bugs in the homework assignments that the instructor has created, or help the instructor generate new homework problems.

These ideas are not mere speculation. In the last few years, we have seen several new teaching tools based on state-of-the-art PL research, as well as attempts to integrate PL tools into curricula. For example, at my alma mater Penn, Benjamin Pierce designed a new approach to teaching programming language foundations using the Coq theorem prover. At a recent PLDI, Singh, Gulwani and Solar-Lezama described a system, based on program synthesis, for automatically grading and fixing Python programs written by students in a Programming 101 course. In an IJCAI 2013 paper, researchers from Penn, Illinois, Berkeley, and Microsoft presented a system for teaching automata constructions that uses techniques from automata-theoretic program analysis and has been used in undergraduate automata theory classes. The PLOOC workshop reported on several new and ongoing efforts in this space.

The rise of MOOCs, or Massive Open Online Courses, presents an exciting opportunity for PL research motivated by educational applications. If PL tools are useful for enriching the student experience in traditional courses, they are arguably *critical* in MOOCs. A typical MOOC has tens of thousands of students, and traditional mechanisms for evaluating and guiding students simply do not scale to a student population of this size. The MOOC platforms of today have come up with two broad solutions to this issue. One of these is *peer evaluation*, where students grade other students’ work. The other is to use automated testing on a test harness designed by the instructor.

Neither of these methods is entirely satisfactory. Peer evaluation is effective in many cases, and there are many studies that show that students can learn a lot by grading each other’s assignments. At the same time, peer evaluation can lead to a case of the blind leading the blind (a particular problem in MOOCs, given that they serve students who vary widely in ability and preparation). There are other problems too: lazy or malicious peers may compromise the quality of evaluation; vast differences in evaluators’ abilities can cause unfair differences in the grades and feedback that students receive; and even a student who is willing to grade a peer’s final submission is unlikely to be available to offer feedback on her intermediate drafts.

Program testing is closer to what we want to accomplish using PL tools. However, testing in its traditional form is limited in its applicability. First, how do you effectively test a proof of an algebraic identity or a theorem in geometry? Second, coming up with test harnesses that achieve comprehensive coverage is often a nontrivial task. Third, while testing may tell a student that her submission is incorrect, it does not tell her much about how to fix the problem. The hope is that advanced PL techniques for debugging, verification, and synthesis of programs would allow us to do more than either of these approaches. Of course, the design of specific techniques that work well in real-life classrooms is a challenging problem. But then, that only means that we need to do more research in this area.

In this series of posts, I will write about some recent results and open questions in the PL-for-education problem space. In the next post, I will discuss in more detail some new results in this area that were presented at PLOOC 2014. In the final post, I will identify some research challenges whose solutions will help me as an undergraduate instructor.

There is some similar work outside PL. For example, Jeff Ullman uses http://www.gradiance.com for the automata courses he teaches on Coursera.

I took that course, so I can tell you what I remember of how it works. It is somewhat similar to the testing approach. Instead of designing tests, the instructor writes a program that generates tests. These tests are then posed as questions to the student, one by one. (Whether the student solves them by hand or with a program is up to them.)
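A minimal sketch of that instructor-side workflow, with an invented target concept (strings over {a, b} with an even number of a’s) standing in for a real Gradiance-style question; the generator and grading scheme here are assumptions, not how the actual platform works.

```python
import random

def even_as(s):
    """Target concept for a hypothetical quiz question: strings over
    {a, b} with an even number of a's."""
    return s.count("a") % 2 == 0

def generate_quiz(n=5, seed=42):
    """Instructor-side generator: each run yields fresh small test
    strings together with the expected yes/no answers."""
    rng = random.Random(seed)
    quiz = []
    for _ in range(n):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 6)))
        quiz.append((s, even_as(s)))
    return quiz

def grade(quiz, answers):
    """Score a student's yes/no answers against the generated key."""
    return sum(a == key for (_, key), a in zip(quiz, answers))

quiz = generate_quiz()
score = grade(quiz, [key for _, key in quiz])  # a perfect score
```

The interesting part, as noted above, is making the generated instances small yet discriminating, which is where the PL flavour comes in.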

I should mention that there are several programming competitions that work in a similar way — “here are tests, you give us the answers.” But there is a difference. In these competitions, the tests are usually huge and you must use a program. In Jeff Ullman’s course, the tests/questions were all tiny but interesting. The challenge of generating small and interesting tests has a PL flavour.