An optimized GPU-based 2D convolution implementation

Title: An optimized GPU-based 2D convolution implementation
Publication type: Journal Article
Year of publication: 2016
Authors: Perrot G, Domas S, Couturier R
Journal: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Volume: 28
Pagination: 4291-4304
Date published: NOV
Type of article: Article; Proceedings Paper
ISSN: 1532-0626
Keywords: convolution, Filter, GPU
Abstract

With the increasing sophistication of image processing algorithms, and because of its low computational complexity, convolution should fully benefit from the ever-increasing capacities of state-of-the-art graphics processing units, such as Nvidia's Kepler and Maxwell family cards. Currently, it tends to be used as a preprocessing stage within more intricate image manipulations and has recently been implemented quite efficiently by several teams. However, their implementations either do not come near the hardware's peak performance or are unable to process large mask sizes. Such limitations are overcome by our original parallel, register-only implementation of two-dimensional convolution filters, which can process 32-bit floating-point images on an Nvidia K40 card using mask sizes up to 127 x 127 while achieving pixel throughputs over 29 GP/s, to our knowledge the highest rate reported to date. Such results were obtained by using registers sparingly and by designing memory access patterns that cancel both load and store replays at warp levels, along with optimizing cache use. Copyright (C) 2015 John Wiley & Sons, Ltd.

DOI: 10.1002/cpe.3752