Chuan XIAO

Associate Professor, Osaka University

Guest Associate Professor, Nagoya University

Email: chuanx [at] nagoya-u.jp

Research Interests

My research focuses on AI, NLP, computer simulation, data science, and data management. In particular, I work in the following areas:

  • Agent-Based Modeling
  • Computational Social Science
  • Legal AI
  • Emergent Language
  • Data Lakes

Biography

I am an associate professor with the Big Data Engineering Laboratory at Osaka University and a guest associate professor with the Database Laboratory at Nagoya University. I received my Ph.D. from the University of New South Wales in 2010 under the supervision of Xuemin Lin and Wei Wang.

Education

  • Ph.D., Computer Science & Engineering — University of New South Wales, Australia (2007–2010)
  • B.E., Computer Science & Engineering — Northeastern University, China (2001–2005)

Professional Experience

  • 04/2021–present — Associate Professor, Osaka University / Guest Associate Professor, Nagoya University
  • 04/2019–03/2021 — Specially Appointed Associate Professor, Osaka University / Guest Associate Professor, Nagoya University
  • 03/2014–03/2019 — Designated Assistant Professor, Nagoya University
  • 10/2011–02/2014 — Postdoc Research Associate, Nagoya University
  • 09/2010–09/2011 — Postdoc Research Fellow, University of New South Wales

Supervision

PhD Students

Master's Students

  • Song Que
  • Rei Taniguchi
  • Yuzhou Jin
  • Ryosuke Mizui
  • Zichuan Xu

Undergraduate Students

  • Kei Karube
  • Hirotsuna Matsushita

For students who have graduated, please see my biography page.

Professional Activities

  • PC Chair: SFDI 2020, ADMA 2025, APWeb-WAIM 2026
  • PC Vice Chair: IEEE BigData 2023, DASFAA 2024
  • Track Chair: SoiCT 2017, APWeb-WAIM 2024, ADC 2024
  • Area Chair: ACL ARR 2025
  • International Relationship Chair: iDB 2012

Research

Selected current topics and representative publications.

Smart Agent-Based Modeling (SABM)

Using foundation-model agents (LLMs/VLMs) to formulate simulations in natural language and study phenomena across economics and behavioral science.

  • LLM-Based Social Simulations Require a Boundary [Paper]
  • Seven Security Challenges in Cross-domain Multi-agent LLM Systems [Paper]
  • Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation (NeurIPS 2024) [Paper] [Code]
  • Shall We Team Up: Spontaneous Cooperation of Competing LLM Agents (EMNLP 2024 Findings) [Paper] [Code]
  • Smart Agent-Based Modeling [Paper] [Slides]

Data Lake Management

Leveraging NLP techniques for table understanding, data cleaning, integration, and augmentation in large data lakes.

  • On the Use of LLMs for Table Tasks (CIKM 2024 Tutorial) [Slides]
  • Jellyfish: A Large Language Model for Data Preprocessing (EMNLP 2024) [Paper] [Models] [Dataset]
  • Large Language Models as Data Preprocessors (TaDA 2024, VLDB Workshop) [Paper]
  • BClean: A Bayesian Data Cleaning System (ICDE 2024) [Paper]
  • DeepJoin: Joinable Table Discovery with Pre-trained Language Models (PVLDB 2023) [Paper] [Code] (implemented by Deng et al. @ BIT)

Similarity Query Processing

Efficient methods for high-dimensional ANNS/MIPS, sets, strings, and generic similarity operations.

  • Probabilistic Kernel Function for Fast Angle Testing [Paper]
  • Probabilistic Routing for Graph-based ANNS (ICML 2024) [Paper] [Code]
  • MQH: Locality Sensitive Hashing on Multi-level Quantization Errors for Point-to-Hyperplane Distances (PVLDB 2022) [Paper] [Code]
  • HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search (PVLDB 2021) [Paper] [Code]
  • High-Dimensional Similarity Query Processing for Data Science (KDD 2021 Tutorial) [Slides]

Data Science for Digital Humanities

Computational approaches in sociology, law, education, and history through data-centric methods.

  • Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction (EMNLP 2025) [Paper] [Benchmark]
  • Utilization of Information Entropy in Training and Evaluation of Students' Abstraction Performance and Algorithm Efficiency in Programming (ToE) [Paper]

Publications

For a complete and up-to-date list, see my Google Scholar or DBLP entry.

Resources

Source Code

  • SABM case studies: number-guessing, emergency evacuation, plea bargain, Bertrand competition
  • Shall We Team Up: Spontaneous Cooperation of Competing LLM Agents
  • LLMob: agentic workflow for personal mobility generation
  • SAS: software for LLM-empowered automated surveys