**Halide DSL** describes image processing pipelines in a simple functional style.

– Scheduling representation composably models a range of tradeoffs between locality, parallelism, and avoiding redundant work.

– Split representation separates schedules from the underlying algorithm.

– Arithmetic and logical operations, loads from external images, if-then-else expressions, references to named values, calls to other functions including external C ABI functions.

– Reduction function: programmer specifies the order by defining a reduction domain, bounded by minimum and maximum expressions for each dimension.

**Scheduling**: values need to be computed before they can be used, but still some choices remain to be made…

– when and where should the value at each coordinate in each function be computed?

– where should they be stored?

– how long are values cached and communicated across multiple consumers, and when are they independently recomputed by each?

__Motivation__

One example, scheduling choices can be divided into granularity of storage and granularity of computation.

Breadth first: first do for x-axis and then y-axis –> lots of parallelism, not much locality *(all things are saved to be overwritten and then reloaded)*

Loop fusion: do x-axis and y-axis together for each point –> lots of parallelism, some locality *(some parts are calculated then used in that iteration)*, redundant work

Sliding window: redundancy is removed by saving things in buffer –> limited parallelism, lost locality *(some parts are reused among iterations but the locality depends)*, no redundant work.

Overlapped tiling: tiling –> enough parallelism, increase locality *(some parts are calculated then used for that tile)*, some redundant work

Sliding window tiling: …

__Model for scheduling choice space__

Model for the space of important choices in scheduling: granularity to compute each of its inputs, granularity to store each for reuse, order of traversal for compute (domain order)…

Domain order: each dimension can be traversed sequentially or in parallel / constant-size dimensions can be unrolled or vectorized / dimensions can be reordered (column-major vs row-major) / dimensions can be split to inner and outer.

Call schedule: breadth-first / fused / sliding window

**Compiling**

**Autotuning**: a lot of possible schedules. *Choosing starting point by leveraging the domain knowledge seems like a good thing to learn from!*