Publications by Roman Cattaneo
2017
Systems Group Master's Thesis, no. 168; Department of Computer Science, May 2017
Supervised by: Prof. Torsten Hoefler
Field-programmable gate arrays (FPGAs) are gaining interest in the high-performance computing community due to their potential for high performance at low power. Programming FPGAs has traditionally been done by hardware engineers in languages working at the register-transfer level. High-level synthesis (HLS) opens FPGAs up to a wider audience by facilitating the transformation of imperative code into hardware circuits. This thesis builds kernels from C++ source code, exploiting higher-level language features such as objects and templates to increase expressiveness and productivity.
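As a minimal illustration of how templates can fix data types and loop bounds at compile time, the following C++ sketch multiplies one square tile into an accumulator tile. The function name `tile_mac` and the tile-size parameter `TS` are hypothetical and are not taken from the thesis. Tool-specific HLS pragmas are deliberately left out, so the code compiles with any standard C++ compiler.

```cpp
// Hypothetical sketch: accumulate one tile product, C += A * B.
// T (element type) and TS (tile size) are template parameters, so every loop
// bound is a compile-time constant that an HLS tool can use when it pipelines
// or unrolls the loops. No HLS pragmas are included here.
template <typename T, int TS>
void tile_mac(const T (&A)[TS][TS], const T (&B)[TS][TS], T (&C)[TS][TS]) {
    static_assert(TS > 0, "tile size must be positive");
    for (int i = 0; i < TS; ++i) {
        for (int j = 0; j < TS; ++j) {
            T acc = C[i][j];  // start from the existing partial sum
            for (int k = 0; k < TS; ++k) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}
```

Instantiating it as, say, `tile_mac<float, 16>` and `tile_mac<int, 8>` yields two separately specialized functions from the same source, which is the kind of reuse the abstract attributes to templates.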
By modeling performance in terms of FPGA resources, a scalable matrix-matrix multiplication kernel is constructed. Performance and resource utilization are verified experimentally on an AlphaData 7V3 board hosting a Xilinx Virtex-7 FPGA. For single-precision floating-point data, a performance of up to 95 GFLOP/s was measured.
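A resource-driven performance model can be sketched along these lines: the number of processing elements (PEs) is bounded by the scarcest FPGA resource, and, assuming each fully pipelined PE completes one multiply and one add per clock cycle, peak throughput is 2 × N_PE × f. The struct fields, the per-PE costs and the choice of DSPs and LUTs as the only limiting resources are assumptions for illustration; this is not the model or the figures from the thesis.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical resource model; every number a caller supplies is illustrative.
struct ResourceModel {
    int dsp_available;  // DSP slices available to the kernel
    int dsp_per_pe;     // DSP slices used by one multiply-add processing element
    int lut_available;  // LUTs available to the kernel
    int lut_per_pe;     // LUTs used by one processing element
    double clock_hz;    // achieved kernel clock frequency in Hz
};

// The number of PEs is limited by whichever resource runs out first.
int max_pes(const ResourceModel& m) {
    assert(m.dsp_per_pe > 0 && m.lut_per_pe > 0);
    return std::min(m.dsp_available / m.dsp_per_pe,
                    m.lut_available / m.lut_per_pe);
}

// Peak throughput in GFLOP/s, assuming each PE issues one multiply and one
// add per cycle (2 FLOPs) and the memory system keeps every PE busy.
double peak_gflops(const ResourceModel& m) {
    return 2.0 * max_pes(m) * m.clock_hz / 1e9;
}
```

With a hypothetical budget of 1000 DSPs at 5 DSPs per PE, 100000 LUTs at 400 LUTs per PE and a 200 MHz clock, the DSPs are the bottleneck (200 PEs) and the model predicts a peak of 80 GFLOP/s. Measured performance also depends on whether off-chip bandwidth can feed all PEs, which this sketch ignores.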
A single-source-file solution is constructed that solves not only matrix-matrix multiplication but also the all-pairs shortest path problem, by substituting operations and data types. The blocked, hybrid CPU-FPGA approach was used to gain further insight into resource utilization for integer data. These results demonstrate that HLS can indeed enable high-performance FPGA programming with little to no prior experience in hardware design.
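Solving both problems from one source by substituting operations and data types corresponds to matrix multiplication over a semiring. The sketch below is an assumption about how such a substitution can look, not the thesis code: `PlusTimes` gives the ordinary product over `float`, while `MinPlus` over `int` turns the same loop nest into a shortest-path relaxation. For all-pairs shortest paths it uses repeated min-plus squaring, which is one possible scheme and not necessarily the blocked algorithm used in the thesis. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

template <typename T>
using Matrix = std::vector<std::vector<T>>;

// Ordinary arithmetic semiring (+, *) over float: standard matrix product.
struct PlusTimes {
    using value_type = float;
    static value_type zero() { return 0.0f; }
    static value_type add(value_type a, value_type b) { return a + b; }
    static value_type mul(value_type a, value_type b) { return a * b; }
};

// Tropical semiring (min, +) over int: "add" keeps the shorter path and
// "mul" concatenates two path segments. zero() marks a missing edge; it is
// half of INT_MAX so that zero() + zero() cannot overflow.
struct MinPlus {
    using value_type = int;
    static value_type zero() { return std::numeric_limits<int>::max() / 2; }
    static value_type add(value_type a, value_type b) { return std::min(a, b); }
    static value_type mul(value_type a, value_type b) { return a + b; }
};

// One loop nest, reused for both problems by swapping the semiring S.
template <typename S>
Matrix<typename S::value_type> semiring_mm(const Matrix<typename S::value_type>& A,
                                           const Matrix<typename S::value_type>& B) {
    using T = typename S::value_type;
    const std::size_t n = A.size();
    const std::size_t m = B.size();
    const std::size_t p = B.empty() ? std::size_t{0} : B[0].size();
    Matrix<T> C(n, std::vector<T>(p, S::zero()));
    for (std::size_t i = 0; i < n; ++i) {
        assert(A[i].size() == m);  // inner dimensions must agree
        for (std::size_t j = 0; j < p; ++j) {
            T acc = S::zero();
            for (std::size_t k = 0; k < m; ++k) {
                acc = S::add(acc, S::mul(A[i][k], B[k][j]));
            }
            C[i][j] = acc;
        }
    }
    return C;
}

// All-pairs shortest paths by repeated min-plus squaring. D must contain 0 on
// the diagonal, the (non-negative) edge weight where an edge exists, and
// MinPlus::zero() elsewhere. The loop stops once paths of n - 1 edges are covered.
Matrix<int> apsp(Matrix<int> D) {
    for (std::size_t max_edges = 1; max_edges + 1 < D.size(); max_edges *= 2) {
        D = semiring_mm<MinPlus>(D, D);
    }
    return D;
}
```

Each squaring doubles the maximum number of edges a path may use, so about log2(n - 1) multiplications suffice. Non-negative edge weights are assumed, since a negative weight added to the "missing edge" sentinel would look like a real path.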
@mastersthesis{abc,
  author   = {Roman Cattaneo},
  title    = {High-level synthesis of dense matrix operations on {FPGA}},
  school   = {Department of Computer Science},
  type     = {Systems Group Master's Thesis},
  note     = {No. 168},
  month    = may,
  year     = {2017},
  abstract = {Field-programmable gate arrays (FPGAs) are gaining interest in the high-performance computing community due to their potential for high performance at low power. Programming FPGAs has traditionally been done by hardware engineers in languages working at the register-transfer level. High-level synthesis (HLS) opens FPGAs up to a wider audience by facilitating the transformation of imperative code into hardware circuits. This thesis builds kernels from C++ source code, exploiting higher-level language features such as objects and templates to increase expressiveness and productivity. By modeling performance in terms of FPGA resources, a scalable matrix-matrix multiplication kernel is constructed. Performance and resource utilization are verified experimentally on an AlphaData 7V3 board hosting a Xilinx Virtex-7 FPGA. For single-precision floating-point data, a performance of up to 95 GFLOP/s was measured. A single-source-file solution is constructed that solves not only matrix-matrix multiplication but also the all-pairs shortest path problem, by substituting operations and data types. The blocked, hybrid CPU-FPGA approach was used to gain further insight into resource utilization for integer data. These results demonstrate that HLS can indeed enable high-performance FPGA programming with little to no prior experience in hardware design.}
}