IBM Programming Languages Day - PL Day 2014


The 2014 Programming Languages Day (PL Day) will be held at the IBM T.J. Watson Research Center in the Main Auditorium on Tuesday, November 18, 2014. This is a one-day event that will feature two keynote speakers and several conference-style talks, as well as a poster session for closer interaction among the attendees. The event is open to all IBMers, and we encourage local participation from professors and students.

PL Day is held in cooperation with the New England and New Jersey Programming Languages and Systems Seminars. The main goal of the event is to increase awareness of each other's work, and to encourage interaction and collaboration in the areas of programming languages and systems.


Keynote Speakers

Robert Grimm is a Vice President in the Technology Division of Goldman Sachs, where he works on the company's risk and trading platform. Before joining the firm in 2013, Robert was an Associate Professor of Computer Science at New York University, researching how to leverage programming language technologies to make complex systems easier to build, maintain, and extend. He received a Ph.D. in Computer Science from the University of Washington in Seattle in 2002. His honors include the Best Paper award at the 6th ACM International Conference on Distributed Event-Based Systems, a Junior Fellowship at NYU's Center for Teaching Excellence, and an NSF CAREER award.

David Greenberg is the architect of Two Sigma's cluster computing environment. He is passionate about systems engineering and programming languages. He primarily works with Clojure, and thinks about how to bring functional programming principles to applied distributed systems.

 

Logistics and Program

Attendees are welcome to arrive at the IBM T.J. Watson Research Center in Yorktown Heights starting at 9:30AM. The first presentation will start promptly at 10:00AM in the Main Auditorium.

  9:30-10:00 Breakfast and welcome
Keynote
10:00-11:00
SecDB: Technology for Nimble yet Systematic Trading
Robert Grimm, Goldman Sachs

  11:00-11:20 Break
Session 1
11:20-12:20
Session Chair: Stephen Chong

Hybrid Security Analysis of Web JavaScript Code via Dynamic Partial Evaluation
Marco Pistoia, IBM Research

Impact of Community Structure on SAT Solver Performance
Vijay Ganesh, University of Waterloo

Space-Efficient Manifest Contracts
Michael Greenberg, Princeton University

  12:20-1:10 Lunch
Session 2
1:10-2:10
Session Chair: Philippe Suter

Abstractions for scalable concurrent data structures
Irina Calciu, Brown University

Synthesizing Data-Structure Manipulations with Natural Proofs
Xiaokang Qiu, MIT

The Push/Pull model of transactions
Eric Koskinen, IBM Research

  2:10-2:30 Break
Keynote
2:30-3:30
The Economics of Types
David Greenberg, Two Sigma

  3:30-3:50 Break
Session 3
3:50-4:50
Session Chair: Stephen Fink

Functioning Hardware from Functional Specifications
Stephen A. Edwards, Columbia University

Asynchronous Functional Reactive Processes
Daniel Winograd-Cort, Yale University

Korz: Simple, Symmetric, Subjective, Context-Oriented Programming
Harold Ossher, IBM Research

 

Keynotes 

  • SecDB: Technology for Nimble yet Systematic Trading
    Robert Grimm, Goldman Sachs

    Running a successful trading business requires being nimble and systematic at the same time. Being nimble means continually monitoring one’s risk exposure, and swiftly adjusting to changing market as well as regulatory conditions. Yet being systematic means also accounting for every trade through its lifecycle, from order to settlement, while following applicable rules and regulations at every step. In this talk, I discuss how SecDB, Goldman Sachs’ trade and risk management system, directly helps meet the two requirements. First, SecDB integrates an in-house scripting language, called Slang, with a set of globally replicated object databases. As a scripting language, Slang simplifies the task of developing and evolving functionality. Furthermore, scripts are treated just like all other data and stored as objects in a database. Consequently, updating code across the firm can be as simple as storing a new version of a script in the database. Second, the issue tracking, code review, and regression testing systems are implemented in Slang and tightly integrated with the SecDB development environment. Similarly, most controls on trading are implemented as scripts, with some basic checks performed in the underlying C++ code. Consequently, sensitive code can only be updated after going through a rigorous vetting process. I present an overview of the SecDB scripting and database technology, sketch a development process that favors releasing code early and often, and discuss how technology and process reinforce each other with beneficial effect.
  • The Economics of Types [pdf]
    David Greenberg, Two Sigma

    Types remain a highly contentious issue for programmers in industry. Rails and Python have massive communities, and drive high-profile, high-traffic, and successful businesses, like GitHub, Reddit, and Dropbox. Clearly, dynamic languages are a better choice than Java or C; unless, of course, we count the myriad more businesses powered by the previous generation of statically typed languages. Upon hearing this, some engineers and researchers will be infuriated, as we all know that Hindley-Milner type systems are the only type systems that reduce defects and improve code quality. In this talk, we'll take a look at type systems through a different lens: the choice of a type system is actually a business decision that is frequently cast as an emotional one.

Abstracts 

  • Hybrid Security Analysis of Web JavaScript Code via Dynamic Partial Evaluation [pdf]
    Marco Pistoia, IBM Research

    We address the problem of detecting JavaScript security vulnerabilities in the client side of Web applications. Such vulnerabilities are becoming a source of growing concern due to the rapid migration of server-side business logic to the client side, combined with new JavaScript-backed Web technologies, such as AJAX, HTML5 and Web 2.0. Detection of client-side vulnerabilities is challenging given the dynamic and event-driven nature of JavaScript.

    We present a new form of hybrid JavaScript analysis, which augments static analysis with (semi) concrete information by applying partial evaluation to JavaScript functions according to dynamic data recorded by the Web crawler. The dynamic component rewrites the program per the HTML environment containing the JavaScript code, and the static component then explores all possible behaviors of the partially evaluated program (while treating user-controlled aspects of the environment conservatively).

    We have implemented this hybrid architecture as the JSA tool, which we recently integrated into a commercial product. We formalize the static analysis and prove useful properties of it. We also tested the system across a set of 170,000 Web pages, comparing it with purely static and dynamic alternatives. The results we obtained provide conclusive evidence in favor of our hybrid approach. Only 10% of the reports by JSA are false alarms compared to 63% of the alarms flagged by its purely static counterpart, while not a single true warning is lost. This represents a reduction of 94% in false alarms. Compared with a commercial dynamic testing algorithm, JSA is able to detect vulnerabilities in >4x more Web sites with only 4 false alarms.
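    The core idea of dynamic partial evaluation can be illustrated with a toy sketch (this is an invented example, not the JSA tool: the tiny expression AST and the recorded values are assumptions made for illustration). Concrete values recorded at crawl time are folded into the program, while user-controlled inputs stay symbolic for the static phase:

```python
def partial_eval(expr, known):
    """Reduce an expression tree given a partial environment.

    expr is a nested tuple: ('var', name), ('const', v), or
    ('add', lhs, rhs). Variables found in `known` are replaced by
    constants; unknown (user-controlled) ones stay symbolic.
    """
    tag = expr[0]
    if tag == 'const':
        return expr
    if tag == 'var':
        name = expr[1]
        return ('const', known[name]) if name in known else expr
    if tag == 'add':
        lhs = partial_eval(expr[1], known)
        rhs = partial_eval(expr[2], known)
        if lhs[0] == 'const' and rhs[0] == 'const':
            return ('const', lhs[1] + rhs[1])  # fold fully-known subterms
        return ('add', lhs, rhs)
    raise ValueError(tag)

# The crawler recorded base_url_len; user_input_len remains symbolic,
# so the static phase still treats it conservatively.
prog = ('add', ('var', 'base_url_len'), ('var', 'user_input_len'))
print(partial_eval(prog, {'base_url_len': 21}))
# ('add', ('const', 21), ('var', 'user_input_len'))
```

    The static analysis then explores the smaller, partially evaluated residual program instead of the fully general one.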
  • Impact of Community Structure on SAT Solver Performance [pdf]
    Vijay Ganesh, University of Waterloo

    In the last 15 years SAT and SMT solvers have undergone dramatic improvements in efficiency, and as a consequence these solvers have had a transformative impact on software engineering (construed broadly to include formal methods, testing, analysis, synthesis and programming languages). While this impact is now widely recognized, it is also known that we don't have a full account of why these solvers are so efficient. Such an account would precisely characterize the inputs on which solvers are efficient, provide meaningful complexity-theoretic bounds on the core solver algorithms, and eventually lead to much more predictable and efficient solvers.

    An example of this phenomenon is the modern SAT solver. These solvers routinely solve very large industrial SAT instances in relatively short periods of time. It is clear that these solvers somehow exploit the structure of real-world instances. However, to date there have been few results that precisely characterize this structure. In this talk, we provide several pieces of empirical evidence showing that the community structure (characterized using a metric called Q, or quality factor) of real-world SAT instances is strongly correlated with the running time of CDCL SAT solvers. First, we show that there is a strong correlation between the Q value and the Literal Block Distance metric of quality of conflict clauses used in clause-deletion policies in Glucose-like solvers. Second, using regression analysis, we show that the number of communities and the Q value of the graph of real-world SAT instances are more predictive of the running time of CDCL solvers than traditional metrics like the number of variables or clauses. Finally, we show that randomly-generated SAT instances with 0.05 <= Q <= 0.13 are dramatically harder to solve for CDCL solvers than otherwise.
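    For readers unfamiliar with the Q metric, the sketch below computes standard Newman modularity for a given partition of a graph into communities (a minimal illustration: the example graph and the community assignment are invented, and real tools also compute the partition itself, e.g. with Louvain-style algorithms):

```python
def modularity(edges, community):
    """Newman modularity Q of a partition.

    edges: list of undirected edges (u, v);
    community: dict mapping each vertex to its community id.
    Q sums, over communities, the fraction of edges inside the
    community minus the squared fraction of edge endpoints in it.
    """
    m = len(edges)
    intra = {}  # per-community fraction of internal edges
    ends = {}   # per-community fraction of edge endpoints
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0.0) + 1.0 / m
        ends[cu] = ends.get(cu, 0.0) + 0.5 / m
        ends[cv] = ends.get(cv, 0.0) + 0.5 / m
    return sum(intra.get(c, 0.0) - ends[c] ** 2 for c in ends)

# Two triangles joined by a single bridge edge: strong community structure.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, community), 3))  # 0.357
```

    High Q means edges fall mostly within communities rather than between them; the talk's claim is that this kind of structure in industrial instances correlates with CDCL running time.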
  • Space-Efficient Manifest Contracts [pdf]
    Michael Greenberg, Princeton University

    The standard algorithm for higher-order contract checking can lead to unbounded space consumption and can destroy tail recursion, altering a program's asymptotic space complexity.

    While space efficiency for gradual types (contracts mediating untyped and typed code) is well studied, sound space efficiency for manifest contracts (contracts that check stronger properties than simple types, e.g., "is a natural" instead of "is an integer") remains an open problem.

    We show how to achieve sound space efficiency for manifest contracts with strong predicate contracts. The essential trick is breaking contract checking down into coercions: structured, blame-annotated lists of checks. By carefully preventing duplicate coercions from appearing, we can restore space efficiency while keeping the same observable behavior.
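    The space problem and the fix can be caricatured in a few lines (an invented toy, not the paper's calculus: here a "coercion" is just a tagged predicate, and merging simply drops exact duplicates):

```python
def coerce(pending, check):
    """Merge `check` into a list of pending coercions, dropping duplicates."""
    return pending if check in pending else pending + [check]

# A predicate contract: stronger than the simple type "is an integer".
is_nat = ('nat', lambda n: isinstance(n, int) and n >= 0)

# Naive checking stacks one pending check per recursive call: O(depth)
# space, which is what destroys tail recursion.
naive = []
for _ in range(1000):
    naive = naive + [is_nat]

# Space-efficient checking merges duplicates: bounded, here O(1).
merged = []
for _ in range(1000):
    merged = coerce(merged, is_nat)

print(len(naive), len(merged))  # 1000 1
```

    The real system must also track blame labels and preserve observable behavior, which is where the technical difficulty lies.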
  • Abstractions for scalable concurrent data structures [pdf]
    Irina Calciu, Brown University

    For a long time, Moore’s law ensured that performance would increase with each new CPU iteration. But the free ride is over and the demand for faster computation is now satisfied through parallelism. Server machines with hundreds of cores are becoming ubiquitous. Ideally, performance would increase linearly with the number of cores, but in practice that is rarely the case. Communication and synchronization between cores running the same application are often necessary, and usually come at a high cost, causing negative scalability and a significant drop in performance. In order to leverage the huge potential of multicores, we need better synchronization methods and updated programming abstractions.

    My research is focused on designing fast and scalable concurrent data structures. In particular, I used abstractions such as delegation and elimination to design scalable and efficient concurrent stacks and priority queues. Delegation uses message passing on top of shared memory instead of the cache coherency protocol. A dedicated thread, called the server, is responsible for performing work on behalf of other threads, known as clients. Overall, delegation is simpler than traditional synchronization based on locking, and it can disconnect the implementation details from the algorithm design, resulting in a powerful abstraction that can be used on very diverse platforms.

    Elimination has been proposed in the literature as a method to improve scalability of contended data structures by grouping together inverse operations – operations that cancel each other out.  The eliminating operations do not access the shared data structure, therefore reducing contention. The operations that do not eliminate get collected by a dedicated server thread and executed on the data structure as one combined operation. This reduces contention and speeds up computation.

    Developers can use these abstractions as building blocks to create efficient code that scales on very diverse platforms, without requiring specialized knowledge of parallel programming.
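    The elimination idea can be sketched sequentially (an invented simplification: real implementations pair operations concurrently, e.g. through an elimination array, rather than by scanning a batch):

```python
def eliminate(ops):
    """Pair inverse stack operations so they cancel without touching
    the shared stack. ops is a list of ('push', value) or ('pop', None).
    Returns (cancelled_values, operations_that_fall_through)."""
    pending = []      # pushes still waiting for a partner
    fallthrough = []  # operations that must hit the shared stack
    cancelled = []    # values exchanged by an eliminated push/pop pair
    for kind, arg in ops:
        if kind == 'push':
            pending.append(arg)
        elif pending:
            cancelled.append(pending.pop())  # pop meets the latest push
        else:
            fallthrough.append((kind, arg))
    fallthrough.extend(('push', v) for v in pending)
    return cancelled, fallthrough

print(eliminate([('push', 1), ('push', 2), ('pop', None)]))
# ([2], [('push', 1)])
```

    Only the fall-through operations contend on the shared structure, which is the source of the scalability gain.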
  • Synthesizing Data-Structure Manipulations with Natural Proofs [pdf]
    Xiaokang Qiu, MIT

    We present a new approach to the synthesis of provably correct implementations of data-structure manipulation routines. The input to our system is a program template together with a specification that the synthesized program must meet. Our approach produces a program implementation along with necessary loop invariants, and guarantees the correctness with a proof.

    The starting point for our work is the synthesis-enabled language Sketch, which can synthesize simple data-structure manipulations, but only guarantees their correctness on small finite domains. Our approach is to leverage the natural proof strategy, which reduces correctness on unbounded heaps to the correctness of a bounded heap representation. By encoding the natural proof strategy as part of the sketch of the data-structure manipulation, the bounded check performed by Sketch actually implies that the implementation it produces is correct even for unbounded heaps.

    To evaluate our approach, we synthesize standard manipulations on various list and tree data structures. Experiments show that our system can efficiently produce correct, recursive or iterative implementations in a matter of minutes.
  • The Push/Pull model of transactions [pdf]
    Eric Koskinen, IBM Research

    We present a general theory of serializability, unifying a wide range of transactional algorithms, including some that are yet to come. To this end, we provide a compact semantics in which concurrent transactions push their effects into the shared view (or unpush to recall effects) and pull the effects of potentially uncommitted concurrent transactions into their local view (or unpull to detangle). Each operation comes with simple side-conditions given in terms of commutativity (Lipton moverness).

    The benefit of this model is that most of the elaborate reasoning (coinduction, simulation, subtle invariants, etc.) necessary for proving the serializability of a transactional algorithm is already proved within the semantic model. Thus, proving serializability (or opacity) amounts simply to mapping the algorithm on to our rules, and showing that it satisfies the rules' side-conditions.
  • Functioning Hardware from Functional Specifications [pdf]
    Stephen A. Edwards, Columbia University

    For performance at low power, tomorrow's chips will consist mostly of application-specific logic that is powered only when needed. Furthermore, programs will have to cope with the realities of complex, distributed memory systems, and higher-level languages will have to conceal even more implementation details to deliver reasonable programmer productivity.

    We propose synthesizing application-specific logic from the functional language Haskell. Our approach, rewriting to a simple dialect that admits a syntax-directed translation, enables parallelization and distributed memory systems. Transformations include scheduling arithmetic operations, replacing recursion with iteration, and improving data locality by inlining recursive types. We are developing a compiler based on these principles and will present our techniques along with some preliminary results.
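    One of the transformations mentioned, replacing recursion with iteration, can be illustrated on a classic example (a hand-written sketch in Python for readability, not output of the compiler described above):

```python
def gcd_recursive(a, b):
    """Euclid's algorithm in its natural recursive form."""
    return a if b == 0 else gcd_recursive(b, a % b)

def gcd_iterative(a, b):
    """The same computation with recursion replaced by a loop: each
    iteration corresponds to one step of a simple hardware state
    machine, with (a, b) as its registers."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd_recursive(48, 18), gcd_iterative(48, 18))  # 6 6
```

    Because the tail-recursive call only rebinds the arguments, the loop form needs no stack, which is what makes it amenable to synthesis as a fixed datapath.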
  • Asynchronous Functional Reactive Processes [pdf]
    Daniel Winograd-Cort, Yale University

    Functional Reactive Programming (FRP) is a model that facilitates programming real-time and reactive systems by utilizing signal functions that transform streams of inputs into streams of outputs. One of the great benefits of FRP is specifically that it allows the programmer to believe that instantaneous values on those streams are processed infinitely fast, an illusion that we call the fundamental abstraction of FRP.  To maintain this illusion, FRP enforces a synchronization to the program as a whole. However, when we desire portions of a program to run faster than others, this synchronization becomes a burden. Our work explores the repercussions of selectively asynchronizing individual functional reactive processes, particularly in the presence of other arbitrary effects (i.e. mutation and physical resource interaction).

    Introducing asynchrony cannot be as simple as merely providing an operator that asynchronizes; indeed, doing so haphazardly could shatter this fundamental abstraction entirely. However, if we consider each asynchronous process as having its own notion of time and carefully mediate communication between them, we can retain the abstraction on a per-process level and still reap the benefits of asynchrony.

    Asynchrony raises interesting questions when confronted with dynamic control flow. FRP can be entirely static, but it is often extended with some means of self-alteration, such as higher-order signal expressions. Thus, we additionally devise a scheme wherein asynchronous processes that are removed from the control flow graph are safely terminated.

    Lastly, we discuss the possibility of resynchronization. Resynchronization can be quite useful for the purposes of performance, where one may wish two processes to run in parallel but have their results synchronize.  Although this is technically feasible, allowing resynchronization points forces us to weaken the fundamental abstraction.  We explore why this is and propose an alternative solution.
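    The synchronous signal-function model that this work starts from can be sketched as stream transducers (a minimal invented model: real FRP systems such as Yampa structure signal functions as arrows, and the asynchronous extension is precisely what the talk adds on top):

```python
def lift(f):
    """Lift a pure function to a signal function: a transformer from a
    stream of input samples to a stream of output samples."""
    def sf(stream):
        for x in stream:
            yield f(x)
    return sf

def compose(sf1, sf2):
    """Synchronous composition: sample n of the output depends only on
    sample n of the input, preserving the fundamental abstraction that
    each instantaneous value is processed infinitely fast."""
    return lambda stream: sf2(sf1(stream))

double = lift(lambda x: 2 * x)
increment = lift(lambda x: x + 1)
pipeline = compose(double, increment)
print(list(pipeline(iter([1, 2, 3]))))  # [3, 5, 7]
```

    In this synchronous picture the whole pipeline advances in lockstep; asynchronizing a process amounts to letting one stage consume and produce samples at its own rate, which is exactly where the abstraction must be carefully protected.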
  • Korz: Simple, Symmetric, Subjective, Context-Oriented Programming [pdf]
    Harold Ossher, IBM Research

    Korz is a new computational model that provides deep support for the sort of dynamic contextual variation required for shared services. It is especially important for cognitive services, which must adapt their behavior to context in increasingly sophisticated ways. Korz combines implicit arguments and multiple dispatch in a slot-based model. This synthesis enables the writing of software that supports dynamic contextual variation along multiple dimensions, and graceful evolution of that software to handle new, unexpected dimensions of variability, without the need for additional mechanisms such as layers or aspects. Dimensions can also be used to provide and organize different language-level views of software, such as type checking, symbolic execution, normal execution and reflection, both introspection and intercession.

    With Korz, a system consists of a sea of method and data slots in a multidimensional space. There is no fixed organization of slots into objects – a slot pertains to a number of objects instead of being contained by a single object – and slots can come together according to the implicit context in any given situation, yielding subjective objects. There is no dominant decomposition, and no dimension holds sway over any other. IDE support is essential for managing complexity when working with the slot space and with subjectivity, allowing the task at hand to dictate what subspaces to isolate and what dominance of dimensions to use when presenting nested views to the user. We have implemented a prototype interpreter and IDE, and used it on several examples. This early experience has revealed much that needs to be done, but has also shown considerable promise. It seems that Korz's particular combination of concepts, each well-known from the past, is indeed more powerful than the sum of its parts.