We present a novel approach to automatically discover the best performing code from a given set of possible implementations. It involves a branch and bound algorithm with two distinctive features: (1) an analytic performance model of a lower bound on the execution time, and (2) the ability to estimate such bounds on a partially-specified implementation.

Dataflow synchronous languages such as Lustre have a purely functional semantics. This incurs a high overhead when dealing with arrays, as they have to be copied at each update. We propose to tackle this problem at the source, by constraining programs so that every functional array definition can be optimized into an in-place update.

We present PENCIL, a rigorously-defined subset of GNU C99-enriched with additional language constructs-that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable implementation language for libraries, and as a target language for DSL compilers.

We present VOBLA, a domain-specific language designed for programming linear algebra libraries. VOBLA is compiled to PENCIL, wich is in turn compiled to efficient, platform-specific OpenCL code. Given that the VOBLA coding takes significantly less effort compared to hand-optimizing OpenCL code, we believe our approach leads to improved productivity and performance portability.