Chuan Xiao — Homepage

Research Interests

My research focuses on AI, NLP, computer simulation, data science, and data management, spanning the research fields listed below.

I am an associate professor with the Big Data Engineering Laboratory at Osaka University and a guest associate professor with the Database Laboratory at Nagoya University. I received my Ph.D. from the University of New South Wales in 2010 under the supervision of Xuemin Lin and Wei Wang.

Ph.D., Computer Science & Engineering — University of New South Wales, Australia (2007–2010)
B.E., Computer Science & Engineering — Northeastern University, China (2001–2005)

04/2021–present — Associate Professor, Osaka University / Guest Associate Professor, Nagoya University
04/2019–03/2021 — Specially Appointed Associate Professor, Osaka University / Guest Associate Professor, Nagoya University
03/2014–03/2019 — Designated Assistant Professor, Nagoya University
10/2011–02/2014 — Postdoc Research Associate, Nagoya University
09/2010–09/2011 — Postdoc Research Fellow, University of New South Wales

PhD Students

Master's Students

Undergraduate Students

For students who have graduated, please see my biography page.

Selected current topics and representative publications.

Using foundation-model agents (LLMs/VLMs) to formulate simulations in natural language and study phenomena across economics and behavioral science.

LLM-Based Social Simulations Require a Boundary [Paper]
Seven Security Challenges in Cross-domain Multi-agent LLM Systems [Paper]
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation (NeurIPS 2024) [Paper] [Code]
Shall We Team Up: Spontaneous Cooperation of Competing LLM Agents (EMNLP 2024 Findings) [Paper] [Code]
Smart Agent-Based Modeling [Paper] [Slides]

Leveraging NLP techniques for table understanding, data cleaning, integration, and augmentation in large data lakes.

On the Use of LLMs for Table Tasks (CIKM 2024 Tutorial) [Slides]
Jellyfish: A Large Language Model for Data Preprocessing (EMNLP 2024) [Paper] [Models] [Dataset]
Large Language Models as Data Preprocessors (TaDA 2024, VLDB Workshop) [Paper]
BClean: A Bayesian Data Cleaning System (ICDE 2024) [Paper]
DeepJoin: Joinable Table Discovery with Pre-trained Language Models (PVLDB 2023) [Paper] [Code] (implemented by Deng et al. @ BIT)

Efficient methods for high-dimensional ANNS/MIPS, sets, strings, and generic similarity operations.

Probabilistic Kernel Function for Fast Angle Testing [Paper]
Probabilistic Routing for Graph-based ANNS (ICML 2024) [Paper] [Code]
MQH: Locality Sensitive Hashing on Multi-level Quantization Errors for Point-to-Hyperplane Distances (PVLDB 2022) [Paper] [Code]
HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search (PVLDB 2021) [Paper] [Code]
High-Dimensional Similarity Query Processing for Data Science (KDD 2021 Tutorial) [Slides]

Data-centric computational approaches to sociology, law, education, and history.

Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction (EMNLP 2025) [Paper] [Benchmark]
Utilization of Information Entropy in Training and Evaluation of Students' Abstraction Performance and Algorithm Efficiency in Programming (ToE) [Paper]

For a complete and up-to-date list, see my Google Scholar or DBLP entry.

SABM case studies: number guessing, emergency evacuation, plea bargaining, Bertrand competition
Shall we team up: Spontaneous cooperation of competing LLM agents
LLMob: agentic workflow for personal mobility generation
SAS: software for LLM-empowered automated surveys