Guyue Huang

Contact

hguyue1@gmail.com

Bio

I am currently an engineer at NVIDIA working on GPU architecture and systems for deep learning. Previously, I obtained my Ph.D. and Master's degrees from the University of California, Santa Barbara, where I worked with Prof. Zheng Zhang, Prof. Yufei Ding, and Prof. Yuan Xie. My Ph.D. research focused on deep learning systems and architecture, particularly DL compilers and DL sparsity. I received my B.E. from the Department of Electronic Engineering at Tsinghua University in Beijing, China.

[Google Scholar page]

Selected Publications

[MICRO’23] Guyue Huang, Zhengyang Wang, Po-An Tsai, Chen Zhang, Yufei Ding, Yuan Xie. RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration. To appear in the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023.

[MLSys’23] Guyue Huang, Yang Bai, Liu Liu, Yuke Wang, Bei Yu, Yufei Ding, Yuan Xie. ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs. Proceedings of Machine Learning and Systems (MLSys), 2023. [preprint][code]

[DAC’22] Guyue Huang, Haoran Li, Minghai Qin, Fei Sun, Yufei Ding, Yuan Xie. Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning. 59th ACM/IEEE Design Automation Conference (DAC), 2022. [preprint][code][bibtex]

[ACM-SRC’21 Poster] Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie. Efficient Sparse Matrix Kernels Based on Adaptive Workload-Balancing and Parallel-Reduction. ACM Student Research Competition (SRC), 2021. Graduate Category, 3rd Place. (https://src.acm.org)

[SC’20] Guyue Huang, Guohao Dai, Yu Wang and Huazhong Yang. GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks. The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2020. [preprint][code][bibtex]

Job Experience

Research Experiences

My Ph.D. research is about supporting sparsity in AI/DL on GPUs. Sparsity is a fascinating feature of modern deep learning: it offers great potential yet is extremely difficult for hardware to exploit. I investigate software and architecture methods to support many forms of sparsity, including weight sparsity, activation sparsity, graphs, embedding layers, and MoE. See RM-STC (to appear, MICRO 2023), Shfl-BW (DAC’22), DA-SpMM (DAC’22), and GE-SpMM (SC’20). A simplified sketch of the core kernel problem follows below.
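To make the kernel-level challenge concrete, here is a deliberately naive CUDA sketch of SpMM (multiplying a CSR sparse matrix by a dense matrix), the primitive at the heart of GE-SpMM and DA-SpMM. The kernel and all names in it are illustrative only; the published kernels add coalesced index loads, shared-memory staging, and adaptive workload balancing on top of this basic structure.

```cuda
#include <cuda_runtime.h>

// Baseline CSR SpMM: C[m][n] = A[m][k] (sparse, CSR) * B[k][n] (dense, row-major).
// One thread computes one element of C. Optimized kernels (e.g. GE-SpMM) instead
// assign a warp per sparse row and stage column indices in shared memory for
// coalesced access; this sketch only shows the arithmetic structure.
__global__ void csr_spmm_naive(int m, int n,
                               const int* __restrict__ row_ptr,  // size m+1
                               const int* __restrict__ col_idx,  // size nnz
                               const float* __restrict__ vals,   // size nnz
                               const float* __restrict__ B,      // k x n, row-major
                               float* __restrict__ C)            // m x n, row-major
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m || col >= n) return;

    float acc = 0.0f;
    // Walk the nonzeros of this sparse row; the irregular row length is
    // exactly what makes workload balancing (the DA-SpMM problem) hard.
    for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
        acc += vals[p] * B[col_idx[p] * n + col];
    C[row * n + col] = acc;
}
```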

I also do research on deep learning compilers. I am interested in integrating advanced hardware features and analytical performance models into DL compilers to close the gap between compiler-generated and manually developed kernels on DL accelerators. I mainly work on the TVM stack. My recent work ALCOP (MLSys’23) studies how to realize load-compute pipelining through compiler automation.

Awards

Academic Services

Talks

Open-source

ALCOP

ALCOP is short for Automatic Load-COmpute Pipelining. The project implements a TVM-based compiler pass that pipelines data movement and computation in GPU kernels. Code released at this repo. Paper at this link.
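As a rough illustration of what the pass automates, the hand-written double-buffering sketch below overlaps the global-to-shared load of tile t+1 with computation on tile t. This is a toy kernel written for this page, not code from the ALCOP repo; the actual pass also handles multi-stage pipelines and hardware features such as Ampere's cp.async.

```cuda
#include <cuda_runtime.h>

#define TILE 128

// Double-buffered inner loop: prefetch the next tile into one shared-memory
// buffer while computing on the other. ALCOP derives this transformation
// automatically; here it is written by hand for a toy sum-of-squares kernel.
// Launch with a single block of TILE threads; assumes n is a multiple of TILE
// and *out is zero-initialized.
__global__ void pipelined_reduce(const float* __restrict__ x, int n,
                                 float* __restrict__ out)
{
    __shared__ float buf[2][TILE];          // two buffers: compute + prefetch
    int tid = threadIdx.x;
    float acc = 0.0f;

    int ntiles = n / TILE;
    buf[0][tid] = x[tid];                   // prologue: load tile 0
    __syncthreads();

    for (int t = 0; t < ntiles; ++t) {
        int cur = t & 1, nxt = cur ^ 1;
        if (t + 1 < ntiles)                 // load stage: prefetch tile t+1
            buf[nxt][tid] = x[(t + 1) * TILE + tid];
        acc += buf[cur][tid] * buf[cur][tid];   // compute stage: tile t
        __syncthreads();                    // next tile ready; buffers swap
    }
    atomicAdd(out, acc);
}
```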

ShflBW

ShflBW is a sparse neural-network kernel library together with a pattern-pruning method. Code released at this repo. Paper at this link.
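For intuition, the host-side sketch below shows generic block-wise magnitude pruning, the coarse granularity that makes pruned weights consumable by tensor cores. It illustrates the block granularity only: the Shfl-BW method itself additionally applies shuffling to recover accuracy, and none of the names below come from the released code.

```cuda
#include <algorithm>
#include <cmath>
#include <vector>

// Generic block-wise magnitude pruning (host-side, illustrative). Score each
// bh x bw block of the weight matrix by its L1 norm, keep the top
// keep_ratio fraction of blocks, and zero the rest. Tensor-core-aware
// pruning uses block shapes that map onto tensor-core tiles.
void prune_blockwise(std::vector<float>& W, int rows, int cols,
                     int bh, int bw, float keep_ratio)
{
    int nbr = rows / bh, nbc = cols / bw;        // assume divisible
    std::vector<std::pair<float, int>> score(nbr * nbc);
    for (int b = 0; b < nbr * nbc; ++b) {
        int r0 = (b / nbc) * bh, c0 = (b % nbc) * bw;
        float s = 0.0f;                          // L1 norm of the block
        for (int r = r0; r < r0 + bh; ++r)
            for (int c = c0; c < c0 + bw; ++c)
                s += std::fabs(W[r * cols + c]);
        score[b] = {s, b};
    }
    // Partition so the highest-scoring blocks come first, then zero the rest.
    int keep = static_cast<int>(keep_ratio * score.size());
    std::nth_element(score.begin(), score.begin() + keep, score.end(),
                     [](const std::pair<float, int>& a,
                        const std::pair<float, int>& b) { return a.first > b.first; });
    for (size_t i = keep; i < score.size(); ++i) {
        int b = score[i].second;
        int r0 = (b / nbc) * bh, c0 = (b % nbc) * bw;
        for (int r = r0; r < r0 + bh; ++r)
            for (int c = c0; c < c0 + bw; ++c)
                W[r * cols + c] = 0.0f;
    }
}
```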

dgSPARSE

The dgSPARSE project contains high-performance GPU kernels for sparse matrix primitives and provides an interface to easily replace cuSPARSE in existing applications.
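For reference, the snippet below shows the standard cuSPARSE generic-API SpMM call (CUDA 11+) that a drop-in replacement would target; data allocation and error checking are elided for brevity.

```cuda
#include <cuda_runtime.h>
#include <cusparse.h>

// Reference cuSPARSE SpMM (C = A * B; A sparse CSR, B/C dense row-major),
// the baseline that drop-in sparse kernels aim to match or beat. Device
// pointers (d_rowptr, d_colind, d_vals, d_B, d_C) are assumed to be
// already allocated and populated.
void cusparse_spmm(int m, int k, int n, int nnz,
                   int* d_rowptr, int* d_colind, float* d_vals,
                   float* d_B, float* d_C)
{
    cusparseHandle_t handle;  cusparseCreate(&handle);
    cusparseSpMatDescr_t A;   cusparseDnMatDescr_t B, C;
    cusparseCreateCsr(&A, m, k, nnz, d_rowptr, d_colind, d_vals,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    cusparseCreateDnMat(&B, k, n, n, d_B, CUDA_R_32F, CUSPARSE_ORDER_ROW);
    cusparseCreateDnMat(&C, m, n, n, d_C, CUDA_R_32F, CUSPARSE_ORDER_ROW);

    float alpha = 1.0f, beta = 0.0f;
    size_t bufSize = 0;  void* buf = nullptr;
    cusparseSpMM_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                            CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, A, B,
                            &beta, C, CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT,
                            &bufSize);
    cudaMalloc(&buf, bufSize);
    cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, A, B, &beta, C,
                 CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, buf);

    cudaFree(buf);
    cusparseDestroySpMat(A);  cusparseDestroyDnMat(B);  cusparseDestroyDnMat(C);
    cusparseDestroy(handle);
}
```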