Qinsi Wang 


Hello! My name is Qinsi Wang. I am a first-year PhD student in the CEI lab of the Department of Electrical and Computer Engineering at Duke University. I am fortunate to be advised by Prof. Yiran Chen and Prof. Hai "Helen" Li.

Before that, I conducted research at the University of Science and Technology of China and the National University of Singapore, where I was fortunate to be mentored by Prof. Lin Shao. I received my undergraduate degree in Electronic Science and Technology from Huazhong University of Science and Technology.

News 


[ May 2025 ] : I’ll be joining Adobe in San Jose as a summer research intern in 2025! Always open to coffee chats ☕😊 — feel free to reach out!

[ Apr 2025 ] : A little personal milestone: I’ve had first-author papers accepted at all three major AI conferences (ICML, ICLR, and NeurIPS)! Cheers 🎉!

[ Apr 2025 ] : Our paper CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models has been accepted by ICML 2025. CoreMatching unifies the two sparsity modes in VLMs, token pruning and neuron pruning, and theoretically explains why cosine similarity is a better metric than attention score. Code is released on the project page.

[ Feb 2025 ] : Our paper Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives has been accepted by ICLR 2025. Dobi-SVD is a novel LLM compression solution for low-cost computing devices! Visit the project page for more information.

[ Oct 2024 ] : Our paper CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation has been uploaded to arXiv. CoreInfer achieves a 10.33x speedup on an NVIDIA Titan XP without sacrificing performance! Visit the project page for more information.

[ Mar 2024 ] : I am excited to announce that I will join the Department of Electrical and Computer Engineering at Duke University as a PhD student in Fall 2024! Looking forward to my PhD life!

[ Sep 2023 ] : Our paper MathNAS: If Blocks Have a Role in Mathematical Architecture Design has been accepted by NeurIPS 2023. MathNAS achieves 82.5% top-1 accuracy on ImageNet-1k! See project page for more information.

[ Apr 2023 ] : Our paper DGL: Device Generic Latency model for Neural Architecture Search has been accepted by IEEE Transactions on Mobile Computing (CCF A). DGL accelerates the deployment of NAS on mobile devices and has been evaluated on 50+ different mobile phones! Visit the project code for more information.

[ Jul 2022 ] : I received my B.S. from Huazhong University of Science and Technology (HUST) and was honored as an Outstanding Graduate!

[ Sep 2021 ] : I received the China National Scholarship (top 0.2%)! This is the highest award given by the Ministry of Education of China to undergraduates.

Selected Publications 


CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen

ICML 2025



Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives

Qinsi Wang*, Jinghan Ke*, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, Chenfeng Xu

ICLR 2025



CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation

Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen

arXiv:2410.18311



MathNAS: If Blocks Have a Role in Mathematical Architecture Design

Qinsi Wang*, Jinghan Ke*, Zhi Liang, Sihai Zhang

Neural Information Processing Systems (NeurIPS) 2023




DGL: Device Generic Latency model for Neural Architecture Search 

Qinsi Wang, Sihai Zhang

IEEE Transactions on Mobile Computing, 2023

Hobby 


I love photography and travel. Here are some of my favorite photographs and wonderful life moments!


