Abstract
General-purpose computing on GPUs is becoming increasingly popular because of their extremely favorable performance/price ratio. Besides application development in CUDA, automatic code generation for GPUs is also receiving attention. Like standard processors, GPUs have a memory hierarchy that must be carefully optimized to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory, accesses to which are roughly 100 to 150 times faster than accesses to regular device memory. An automatically generated or hand-written CUDA program can explicitly control which variables and array sections are allocated in shared memory at any point during execution. This, however, leads to a difficult optimization problem. In this paper, we formulate and solve the shared memory allocation problem as an integer programming problem. We present a global (intraprocedural) framework that can model structured control flow and is not restricted to a single loop nest. We consider allocation of scalars, arrays, and array sections in shared memory. We also briefly show how our framework can suggest useful loop transformations to further improve performance. Our experiments on several non-scientific applications show that our integer programming framework outperforms a recently published heuristic method, and that our loop transformations also improve performance for many applications.
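The core decision the abstract describes, choosing which data to place in the limited shared memory, can be illustrated with a deliberately simplified 0/1 integer program. The candidate arrays, their sizes, their benefit estimates, and the 16 KB capacity below are all hypothetical values invented for this sketch (the paper's actual formulation is global, models control flow, and covers array sections); here an exhaustive search stands in for a real ILP solver.

```python
from itertools import product

# Hypothetical candidate arrays: (name, size in bytes, estimated benefit
# of placing the array in shared memory). Illustrative numbers only.
candidates = [
    ("A", 8192, 120),
    ("B", 4096, 80),
    ("C", 12288, 150),
    ("D", 2048, 30),
]
CAPACITY = 16384  # e.g. 16 KB of shared memory per block on older GPUs


def best_allocation(candidates, capacity):
    """Enumerate all 0/1 assignments and keep the feasible one with the
    highest total benefit -- a brute-force stand-in for an ILP solver."""
    best_value, best_choice = 0, ()
    for choice in product((0, 1), repeat=len(candidates)):
        size = sum(x * c[1] for x, c in zip(choice, candidates))
        value = sum(x * c[2] for x, c in zip(choice, candidates))
        if size <= capacity and value > best_value:
            best_value, best_choice = value, choice
    chosen = [c[0] for x, c in zip(best_choice, candidates) if x]
    return best_value, chosen


value, chosen = best_allocation(candidates, CAPACITY)
print(value, chosen)  # the highest-benefit subset that fits in capacity
```

With these made-up numbers, placing B and C exactly fills the 16 KB budget and beats any other feasible subset. A real solver (and the paper's framework) replaces the exponential enumeration with an ILP over many more variables and constraints.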
| Original language | English (US) |
| --- | --- |
| Pages | 553 |
| DOIs | |
| State | Published - 2010 |
| Externally published | Yes |
| Event | 17th International Conference on High Performance Computing, HiPC 2010 - Goa, India |
| Duration | Dec 19 2010 → Dec 22 2010 |
Conference

| Conference | 17th International Conference on High Performance Computing, HiPC 2010 |
| --- | --- |
| Country/Territory | India |
| City | Goa |
| Period | 12/19/10 → 12/22/10 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Theoretical Computer Science