Publications by Hantian Zhang

×

Status message

The Publications site is currently under construction, as a result some publications might be missing.

2017

Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, October 2017
Decision tree ensembles are commonly used in a wide range of applications and becoming the de facto algorithm for decision tree based classifiers. Different trees in an ensemble can be processed in parallel during tree inference, making them a suitable use case for FPGAs. Large tree ensembles, however, require careful mapping of trees to on-chip memory and management of memory accesses. As a result, existing FPGA solutions suffer from the inability to scale beyond tens of trees and lack the flexibility to support different tree ensembles. In this paper we present an FPGA tree ensemble classifier together with a software driver to efficiently manage the FPGA's memory resources. The classifier architecture efficiently utilizes the FPGA's resources to fit half a million tree nodes in on-chip memory, delivering up to 20× speedup over a 10-threaded CPU implementation when fully processing the tree ensemble on the FPGA. It can also combine the CPU and FPGA to scale to tree ensembles that do not fit in on-chip memory, achieving up to an order of magnitude speedup compared to a pure CPU implementation. In addition, the classifier architecture can be programmed at runtime to process varying tree ensemble sizes.
@inproceedings{abc,
	abstract = {Decision tree ensembles are commonly used in a wide range of applications and becoming the de facto algorithm for decision tree based classifiers. Different trees in an ensemble can be processed in parallel during tree inference, making them a suitable use case for FPGAs. Large tree ensembles, however, require careful mapping of trees to on-chip memory and management of memory accesses. As a result, existing FPGA solutions suffer from the inability to scale beyond tens of trees and lack the flexibility to support different tree ensembles. In this paper we present an FPGA tree ensemble classifier together with a software driver to efficiently manage the FPGA{\textquoteright}s memory resources. The classifier architecture efficiently utilizes the FPGA{\textquoteright}s resources to fit half a million tree nodes in on-chip memory, delivering up to 20{\texttimes} speedup over a 10-threaded CPU implementation when fully processing the tree ensemble on the FPGA. It can also combine the CPU and FPGA to scale to tree ensembles that do not fit in on-chip memory, achieving up to an order of magnitude speedup compared to a pure CPU implementation. In addition, the classifier architecture can be programmed at runtime to process varying tree ensemble sizes.},
	author = {Muhsen Owaida and Hantian Zhang and Ce Zhang and Gustavo Alonso},
	booktitle = {Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL)},
	title = {Scalable inference of decision tree ensembles: Flexible design for CPU-FPGA platforms},
	venue = {Ghent, Belgium},
	year = {2017}
}