Kung-Hsiang (Steeve) Huang

Member of Technical Staff · Amazon AGI SF Lab

San Francisco, California · kh.huang@alumni.usc.edu

About Me

I'm a Member of Technical Staff at Amazon AGI SF Lab, where I lead tool-use post-training to enable agents to reliably complete long-horizon knowledge-work tasks across web, code, terminal, and enterprise systems.

Prior to that, I was a senior research scientist at Salesforce AI Research, where I worked on LLM/VLM post-training (ACL 2025 Findings, ACL 2025 Findings), computer use agents (Arxiv), environment simulation (NAACL 2025, TMLR), deep research (ICLR 2026, EMNLP 2025 Industry Track), reasoning (ICLR 2026), and trustworthiness & alignment (NeurIPS 2024, NAACL 2025 Findings, TMLR). I earned my PhD at the University of Illinois Urbana-Champaign, advised by Prof. Heng Ji.

Recent News

  • Apr, 2026 I joined Amazon AGI SF Lab as a Member of Technical Staff.
  • Feb, 2026 I was promoted to Senior Research Scientist at Salesforce AI Research.
  • Jan, 2026 I have two papers accepted by ICLR 2026 and one paper accepted by TMLR.
  • Nov, 2025 I have one paper accepted by TMLR.
  • May, 2025 I have three papers accepted by ACL 2025.
  • Jan, 2025 I have two papers accepted by NAACL 2025.
  • Nov, 2024 I have one paper accepted by COLING 2025.
  • Nov, 2024 I have one paper accepted by TKDE.
  • Nov, 2024 I was selected as a top reviewer for NeurIPS 2024.
  • Sep, 2024 I have one paper accepted by NeurIPS 2024.

Selected Publications

* Equal Contribution. Please refer to my Google Scholar page for a complete list of publications.

2026

GTA: Generating Long-horizon Tasks for Web Agents at Scale
Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou, Muhao Chen, Jonathan May, Chien-Sheng Wu
ACL 2026
Bibtex
Don’t Stop Early: Scalable Enterprise Deep Research with Controlled Information Flow and Evidence-Aware Termination
Prafulla Kumar Choubey*, Kung-Hsiang Huang*, Pranav Narayanan Venkit*, Jiaxin Zhang*, Vaibhav Vats, Yu Li, Xiangyu Peng, Chien-Sheng Wu
ACL 2026 Industry Track
Bibtex
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, Chien-Sheng Wu
TMLR
PDF Bibtex Code Dataset
Nudging the Boundaries of LLM Reasoning
Justin Chih-Yao Chen, Xiangyu Peng, Prafulla Kumar Choubey, Kung-Hsiang Huang, Jiaxin Zhang, Mohit Bansal, Chien-Sheng Wu
ICLR 2026
PDF Bibtex
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Kung-Hsiang Huang, Yixin Mao, Chien-Sheng Wu
ICLR 2026
PDF Bibtex
Agentic Uncertainty Quantification
Jiaxin Zhang, Prafulla Kumar Choubey, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu
Arxiv
PDF Bibtex

2025

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness
Kung-Hsiang Huang, Haoyi Qiu, Yutong Dai, Caiming Xiong, Chien-Sheng Wu
Arxiv
PDF Bibtex Code
MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion
Haoyi Qiu, Yilun Zhou, Pranav Narayanan Venkit, Kung-Hsiang Huang, Jiaxin Zhang, Nanyun Peng, Chien-Sheng Wu
Arxiv
PDF Bibtex
Benchmarking Deep Search over Heterogeneous Enterprise Data
Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu
EMNLP 2025 Industry Track
PDF Bibtex
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Haoyi Qiu, Kung-Hsiang Huang, Ruichen Zheng, Jiao Sun, Nanyun Peng
TMLR
PDF Bibtex
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
Kung-Hsiang Huang, Can Qin, Haoyi Qiu, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu
ACL 2025 Findings
PDF Bibtex Code Dataset
LAM Simulator: Advancing Data Generation for Large Action Models Trainings via Online Exploration and Feedback Simulation
Thai Hoang, Kung-Hsiang Huang, Shirley Kokane, Jianguo Zhang, Zuxin Liu, Ming Zhu, Jake Grigsby, Tian Lan, Michael S Ryoo, Chien-Sheng Wu, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
ACL 2025 Findings
PDF Bibtex
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
Mingyang Zhou, Lingyu Zhang, Sophia Horng, Maximillian Chen, Kung-Hsiang Huang, Shih-Fu Chang
ACL 2025 Findings
PDF Bibtex
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu
NAACL 2025
PDF Bibtex Code Dataset Leaderboard
Evaluating Cultural and Social Awareness of LLM Web Agents
Haoyi Qiu, Alexander R. Fabbri*, Divyansh Agarwal*, Kung-Hsiang Huang*, Sarah Tan, Nanyun Peng, Chien-Sheng Wu
NAACL 2025 Findings
PDF Bibtex

Service

ACL Rolling Review Area Chair 2024 — 2025
Reviewer
2025 2024 2023 2022 2021
ACL 
EMNLP 
NAACL 
NeurIPS 
ICLR 
ICML 
TMLR 
JAIR