It is well known that certain kinds of tasks are better suited to parallel architectures than others. Heavily imperative, sequential code parallelizes poorly, while data-parallel code runs exceptionally well. The gray area in between, of non-uniform parallelism, is more difficult to map onto parallel architectures.
Recent years have seen increasing interest in General Purpose GPU (GPGPU) computation, in which graphics cards serve as coprocessors to accelerate many kinds of software problems. In this work, we examined how well non-uniform parallel problems fit onto a GPU architecture, focusing specifically on parsing, a well-studied problem with known opportunities for parallelism. The goal was to see how much of a speedup the GPGPU provided, and how difficult it was to achieve that improvement.
Over the course of a ten-week internship at AT&T Research, I implemented the Earley parsing algorithm with several different optimizations for the GPGPU. Ultimately, we failed to achieve any speedup (by the end of the summer, we had reached amortized parity with the simple CPU algorithm), but in the process we identified several programming idioms of, and difficulties with, the GPGPU style of coding.
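To make the baseline concrete, the following is a minimal sketch of a sequential Earley recognizer of the kind the GPU versions were compared against. The grammar encoding, the toy grammar, and all names here are illustrative assumptions, not the grammar or implementation used in the project.

```python
def earley_recognize(grammar, start, tokens):
    """Return True iff `tokens` is derivable from `start`.

    grammar: dict mapping a nonterminal to a list of productions,
             each production a tuple of symbols; symbols that are
             grammar keys are nonterminals, all others are terminals.
    (Epsilon productions are not handled in this simplified sketch.)
    """
    # chart[i] holds Earley items (lhs, rhs, dot, origin)
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == sym:
                    # Scanner: consume a matching terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance items waiting on this nonterminal.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)

    return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
               for rhs in grammar[start])
```

For example, with the left-recursive grammar `{"S": [("S", "+", "n"), ("n",)]}`, the recognizer accepts `["n", "+", "n"]` and rejects `["n", "+"]`. The per-position item sets are the natural unit of parallelism: items within one chart column can in principle be processed concurrently, which is the non-uniform parallelism the GPU implementations tried to exploit.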