Short Course Description:


This course will cover techniques for generating low-level vectorized

code for CPUs and GPUs from high-level data-parallel programs composed

of map and reduce-style operators.  The course will be half paper

reading and half compiler writing.  Assignments will walk each student

through building a new code-generation backend for a high-level

language that we will provide.  (A choice of OpenCL or Cilk/ICC

compilation targets is provided.)  The compiler assignments will be

reminiscent of P423, and P423/P523 or permission of the instructor is

a prerequisite.



Extended Course Description:


There is a renaissance underway in data-parallel programming.

Numerous languages and libraries have been proposed to enable

higher-level expression of data-parallelism both in fine-grained

domains (GPU programming) and coarse-grained ones (MapReduce).  Similar

projects are cropping up within many major languages, including

JavaScript [1], Python [2], .NET [3], C++ [4], and Haskell [5].  A

remarkable consensus has emerged---almost all of these systems expose

the same high-level, aggregate operators: map, reduce (fold), scan,

filter, and so on.
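For concreteness, here is what a small program built from these
operators can look like in Haskell, the course's implementation
language.  This list-based sketch is only illustrative (real systems
operate over parallel arrays, and the function names here are not
the course DSL), but the operator vocabulary is the same:

```haskell
-- A dot product as a composition of aggregate operators:
-- an elementwise multiply followed by a reduction.
dotProduct :: [Double] -> [Double] -> Double
dotProduct xs ys = foldr (+) 0 (zipWith (*) xs ys)

-- Sum of squares of the even elements: filter, then map, then reduce.
sumEvenSquares :: [Int] -> Int
sumEvenSquares = foldr (+) 0 . map (^ 2) . filter even
```

The point of the course is that pipelines like these can be compiled
to efficient parallel code rather than executed operator by operator.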


Thus, irrespective of language, there is a great need to develop the

compiler techniques that will generate efficient code from

compositions of these aggregate operators.  That is the topic of this

course.  We will read papers from the primary literature that describe

transformations of operator graphs, scheduling in time, reducing

synchronization, and scheduling onto GPU memory hierarchies.
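One of the simplest transformations in this family is map fusion,
which collapses adjacent map nodes in the operator graph so that the
intermediate array between them is never materialized.  A sketch of
the idea (the function names here are illustrative):

```haskell
-- The fusion law: two traversals become one, and the intermediate
-- structure between them disappears.
--
--   map f . map g  ==  map (f . g)

unfused :: [Int] -> [Int]
unfused = map (+ 1) . map (* 2)   -- two passes, one intermediate list

fused :: [Int] -> [Int]
fused = map ((+ 1) . (* 2))       -- one pass, no intermediate list
```

The papers we read generalize this kind of rewrite to whole graphs of
aggregate operators, where fusion decisions interact with scheduling
and synchronization.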


The practical component of the course will consist of a series of

assignments, each of which requires developing one or more compiler

passes (similar to P423).  These assignments focus on the backend

only; we provide a language front-end that produces a simplified

abstract syntax tree for subsequent manipulation.  All students

complete the same assignments, but there are two choices of final

target for code generation: OpenCL and C++ with Cilk Plus extensions.

At the end of the course we will all participate in a performance

contest, pitting our backends against previously published systems.


Haskell is the implementation language for constructing this course's

compiler.  However, only basic Haskell features are used in the course

infrastructure, and compiler passes may be written in the same simple

style.  Haskell should therefore not be a barrier to anyone who has

used Scheme to transform abstract syntax with the 'match' macro.
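As a point of comparison, a pass over an expression AST in Haskell is
ordinary pattern matching on a datatype, much as 'match' is in Scheme.
The datatype and pass below are a toy example, not the course's actual
AST:

```haskell
-- A toy expression AST (hypothetical, for illustration only).
data Exp = Lit Int
         | Add Exp Exp
         | Mul Exp Exp
  deriving (Eq, Show)

-- A bottom-up constant-folding pass: recur into subtrees, then
-- collapse any operator whose arguments are both literals.
constFold :: Exp -> Exp
constFold (Add a b) = case (constFold a, constFold b) of
  (Lit x, Lit y) -> Lit (x + y)
  (a', b')       -> Add a' b'
constFold (Mul a b) = case (constFold a, constFold b) of
  (Lit x, Lit y) -> Lit (x * y)
  (a', b')       -> Mul a' b'
constFold e = e
```

Each clause plays the role of one 'match' pattern; the assignments in
the course follow the same shape.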



[1] http://games.slashdot.org/story/11/09/16/1922243/river-trail-intels-parallel-javascript

[2] http://code.google.com/p/copperhead/

[3] http://research.microsoft.com/en-us/projects/accelerator/

[4] http://software.intel.com/en-us/articles/intel-array-building-blocks/

[5] https://github.com/AccelerateHS/accelerate/