Note: This is output from ChatGPT-5 Plus’s Deep Research (light) mode, 25 September 2025. References have been checked and some light editing done by me, Janne M. Korhonen. I’m going to return to this topic later, but due to popular demand, I chose to share the note as-is.

Researchers have begun using large language models (LLMs) to simulate human decision-making in surveys, experiments, and agent-based models. For example, Stanford researchers (Park et al.) built “digital twin” agents of 1,052 people by feeding two-hour interview transcripts into an LLM. These AI agents answered survey questions and played economic games in ways that “mirror” their real-life counterparts [1][2]. The synthetic agents’ answers to General Social Survey (GSS) questions matched the real participants’ original responses 85% as accurately as the participants themselves did when re-taking the survey later [1]. Similarly, Kim and Lee (2024) finetuned an LLM on decades of GSS data; their model could fill in missing responses and backfill longitudinal surveys with about 78% accuracy [3]. These results suggest that LLMs can act as proxy respondents: one study describes them as “silicon samples” that complement “human samples” by generating realistic answers and decision patterns when given persona information [4][5].
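To make the “silicon sample” idea concrete, here is a minimal sketch of persona-conditioned survey answering. This is not the Park et al. pipeline; the persona summary, model name, and prompt wording are illustrative assumptions, and in the digital-twin studies the persona material would be distilled from a long interview transcript rather than a one-line description.

```python
# A minimal sketch of persona-conditioned survey answering ("silicon sampling").
# Persona, model name, and prompt wording are illustrative assumptions, not the
# cited studies' actual setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

persona_summary = (
    "58-year-old retired teacher from rural Ohio; values community and thrift; "
    "skeptical of large institutions; attends church weekly."
)

gss_item = (
    "Generally speaking, would you say that most people can be trusted, "
    "or that you can't be too careful in dealing with people? "
    "Answer with exactly one option: 'Most people can be trusted' or "
    "'Can't be too careful'."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the cited studies used various frontier models
    messages=[
        {"role": "system",
         "content": "Answer survey questions in character as the person described, "
                    "without breaking character or adding commentary.\n\n"
                    f"Persona: {persona_summary}"},
        {"role": "user", "content": gss_item},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```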

Researchers have also started benchmarking LLMs explicitly as survey simulators. Zhao et al. (2025) introduce LLM-S³, a comprehensive evaluation suite for “virtual survey respondents.” They propose two modes: (1) Partial Attribute Simulation (PAS), where the LLM infers missing demographics or answers from a partial profile, and (2) Full Attribute Simulation (FAS), where the LLM generates entire synthetic survey datasets. Tested on 11 real-world datasets (politics, economics, health, etc.), LLMs showed consistent but imperfect performance: contextual prompts and few-shot examples significantly improve fidelity, but generating fully coherent structured answers remains challenging [6][7]. Another recent preprint, LLM-Mirror, tested LLMs at individual-level replication: given each respondent’s demographics and past answers as a prompt, the LLM generated the rest of their survey responses. This method (sometimes augmented with a synthetic user persona prompt) produced answers whose statistical relationships matched the original data; structural-equation estimates from the LLM-generated survey were nearly identical to those from the real survey [8]. These studies suggest that LLMs can reproduce broad patterns of human responses, and even individual answers, when properly conditioned.
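The PAS setup can be illustrated with a rough sketch: hold out one field from a respondent’s record and ask the model to infer it from the rest. The record fields and question wording below are invented for illustration and are not taken from the LLM-S³ benchmark itself; the constructed prompt would then be sent to an LLM and its prediction scored against the true answer.

```python
# A rough sketch of Partial Attribute Simulation (PAS): hold out one field from a
# (here invented) respondent record and build a prompt asking the model to infer it.
respondent = {
    "age": 34,
    "gender": "female",
    "education": "bachelor's degree",
    "region": "urban, Northeast US",
    "party_id": "independent",
    "q_govt_spending": "spending on public transit should increase",
    "q_trust_media": None,  # held-out answer the model must infer
}

known = {k: v for k, v in respondent.items() if v is not None}
held_out = "q_trust_media"

prompt = (
    "You are simulating a single survey respondent.\n"
    "Known attributes and prior answers:\n"
    + "\n".join(f"- {k}: {v}" for k, v in known.items())
    + f"\n\nPredict this respondent's answer to the held-out item '{held_out}': "
    "'How much do you trust the national news media?' "
    "Reply with one of: a great deal / a fair amount / not very much / not at all."
)
print(prompt)  # send this to the LLM and compare its prediction with the true answer
```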

Industry and design-research practitioners are also using LLMs as surrogate users. In marketing research, the “study boosting” approach turns a completed survey’s respondents into AI personas. AskRally (2025) describes converting real survey segments into “living AI personas” by copying each segment’s answers into the LLM’s memory [9][10]. These personas carry the actual respondents’ voice and preferences, so teams can test new scenarios or questions without new data collection. Likewise, tools are emerging to automate LLM-based surveys: for example, the EDSL Python library lets users design experiments and “simulate responses with LLMs” as if they were survey participants [11][12]. In UX research, a recent Nielsen Norman Group review highlights three studies using LLM-driven “digital twins” and “synthetic users.” It reports that LLM-based twins (especially when built from rich interview transcripts) can accurately infer missing survey answers and predict behaviors at both individual and group levels [13][2]. By contrast, simpler synthetic users built only from demographics capture only general trends and tend to underestimate the variability of opinions [13][14].
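As a generic illustration of the “study boosting” pattern (this is neither AskRally’s actual tooling nor the EDSL API), one can fold a segment’s completed survey answers into the prompt as persona memory and then ask a question that was never fielded. The segment data and model name below are invented.

```python
# A generic sketch of "study boosting": use a real segment's completed survey
# answers as persona memory, then ask a new, never-fielded question.
# Not AskRally's tooling or the EDSL API; segment data is invented.
from openai import OpenAI

client = OpenAI()

segment_memory = {
    "segment": "price-sensitive young urban renters",
    "completed_answers": {
        "How often do you cook at home?": "4-6 times per week",
        "Top grocery priority?": "lowest total basket price",
        "Preferred shopping channel?": "discount supermarket, in person",
    },
}

new_question = "Would a 5% delivery fee make you switch to online grocery ordering?"

memory_block = "\n".join(
    f"Q: {q}\nA: {a}" for q, a in segment_memory["completed_answers"].items()
)

reply = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": f"You are an AI persona for the segment '{segment_memory['segment']}'. "
                    "Stay consistent with the survey answers this segment actually gave:\n"
                    f"{memory_block}"},
        {"role": "user", "content": new_question},
    ],
)
print(reply.choices[0].message.content)
```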

LLM-Generated Personas and Opinion Simulation

Another major use of LLMs is to generate persona profiles or to simulate public opinion. Large-scale studies have shown that LLMs can produce synthetic personas that embody demographic and psychographic traits. Li et al. (2025) note that LLM-generated personas have already been used to conduct surveys, market research, and even “societal-scale simulations” [5]. In their work, they generated roughly 1,000,000 personas and found biases (e.g. a skew toward one political party) that highlight the need for care. In general, when an LLM prompt includes a persona description or user attributes, the model tends to “role-play” that persona, producing responses consistent with the given profile [5][4]. For example, prompting an LLM with a synthetic persona allowed it to predict a person’s survey answers much more accurately than prompting without any persona [8][4].
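A small sketch of the generate-then-audit pattern: sample many unconditioned personas from the model and tally one attribute to check for the kind of skew Li et al. report. The schema, model name, and sample size are illustrative assumptions; a real audit would use far more samples and stratified prompts.

```python
# Sketch of persona generation with a bias audit: generate personas as JSON and
# tally party identification. Schema, model name, and sample size are illustrative.
import json
from collections import Counter
from openai import OpenAI

client = OpenAI()
counts = Counter()

for _ in range(20):  # toy sample size; the cited study generated ~1M personas
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": "Generate one realistic US adult persona as JSON with keys: "
                       "age, gender, occupation, region, party_id "
                       "(one of Democrat, Republican, Independent, Other).",
        }],
        temperature=1.0,
    )
    persona = json.loads(resp.choices[0].message.content)
    counts[persona.get("party_id", "missing")] += 1

print(counts)  # a heavy skew here would flag the representativeness problem
```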

Studies of “persona prompting” show mixed results. Hu & Collier (2024) found that explicitly adding annotated persona features (age, gender, values, etc.) to LLM prompts can modestly improve a model’s ability to mimic diverse opinions. In their experiments, powerful models (GPT-4-scale) captured roughly 81% of the variance of human annotators when persona information was given [15][16]. However, they also report that in many tasks those socio-demographic variables explain less than 10% of the variation in responses [15]. In practice this means LLM personas can simulate broad differences in stance, but may not capture finer-grained individual variation.
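To make the “variance explained by demographics” metric concrete, here is a toy computation on made-up data. This is not Hu & Collier’s pipeline; it only shows what an R² of persona features regressed against survey responses looks like.

```python
# Toy illustration of variance explained (R^2) by socio-demographic features alone,
# on synthetic data. Not Hu & Collier's actual analysis.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 80, size=n)
gender = rng.choice(["f", "m", "nb"], size=n)
# Mock 1-5 Likert responses: mostly noise, weakly related to age.
y = np.clip(np.round(3 + 0.01 * (age - 50) + rng.normal(0, 1.2, size=n)), 1, 5)

X = np.hstack([
    age.reshape(-1, 1),
    OneHotEncoder(sparse_output=False).fit_transform(gender.reshape(-1, 1)),
])
r2 = LinearRegression().fit(X, y).score(X, y)
print(f"Variance in responses explained by demographics alone: {r2:.1%}")
```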

Interactive design tools leverage LLM personas for creative purposes. PersonaFlow (Liu et al. 2025) used multiple LLM-simulated expert personas to critique research ideas. In a user study, participants reported that multiple AI personas gave more relevant and creative feedback than a single voice [17]. Similarly, recent work on human–AI workflows for persona generation shows that LLMs can summarize clustered user data into believable persona profiles, and that allowing human researchers to guide the grouping leads to more representative personas [18]. These examples demonstrate how simulated users can augment design research: an LLM persona can act like a focus-group participant or a domain expert.
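The core multi-persona critique loop can be sketched as below. The persona descriptions and model name are invented for illustration; the actual PersonaFlow system adds interactive persona customization and a richer interface on top of this kind of loop.

```python
# Bare-bones sketch of multi-persona critique: the same idea is sent to several
# LLM-simulated expert personas and their feedback is collected side by side.
# Personas and model name are invented; not the PersonaFlow system itself.
from openai import OpenAI

client = OpenAI()

idea = "Use smartwatch heart-rate variability to detect early signs of burnout in shift workers."

expert_personas = [
    "an occupational health physician focused on shift-work disorders",
    "a machine learning researcher specializing in wearable sensor data",
    "a labor-union representative concerned with workplace surveillance",
]

for persona in expert_personas:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "system",
             "content": f"You are {persona}. Give three concise, critical comments "
                        "on the research idea from your disciplinary perspective."},
            {"role": "user", "content": idea},
        ],
    )
    print(f"--- {persona} ---")
    print(resp.choices[0].message.content, "\n")
```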

Agent-Based and Decision-Model Simulation

LLM-driven agents have also been integrated into agent-based models (ABMs) and game simulations. Gao et al. (2023) survey this area, noting that LLM agents can “adaptively react” and plan actions like humans, enabling richer simulations than simple rule-based agents [19]. LLM agents have also been placed in social games: a NeurIPS 2024 paper (Xie et al.) found that GPT-4 agents playing iterated Trust Games behaved very similarly to humans, with the AI agents “manifest[ing] high behavioral alignment” with human trust decisions [20]. That work concludes that LLMs can feasibly model fundamental human behaviors (such as trusting others) in economic and social settings.
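A minimal sketch of putting an LLM agent into a single Trust Game round is shown below, in the spirit of (but much simpler than) the Xie et al. setup. The amounts, prompt wording, and the scripted trustee are illustrative assumptions.

```python
# Minimal Trust Game round with an LLM trustor and a scripted trustee.
# Amounts, prompts, and payoff handling are illustrative assumptions.
import re
from openai import OpenAI

client = OpenAI()
ENDOWMENT = 10  # trustor starts with $10; any amount sent is tripled for the trustee

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder
    messages=[
        {"role": "system",
         "content": "You are playing the Trust Game as the trustor. "
                    "Decide and state only a single integer."},
        {"role": "user",
         "content": f"You have ${ENDOWMENT}. Any amount you send to the other player is "
                    "tripled; they then decide how much to return to you. "
                    "How many dollars do you send? Answer with one integer 0-10."},
    ],
)
match = re.search(r"\d+", resp.choices[0].message.content)
sent = min(int(match.group()), ENDOWMENT) if match else 0

returned = (sent * 3) // 2  # scripted trustee returns half of the tripled amount
print(f"Trustor sent ${sent}, trustee returned ${returned}, "
      f"trustor ends with ${ENDOWMENT - sent + returned}")
```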

In summary, examples of using LLMs to simulate people abound in both academia and industry. They include formal studies (e.g. survey replication, persona-generation experiments, game-theoretic simulations) and practical tools (surveys with AI respondents, persona generators, UX “digital twin” systems). Collectively, these efforts show that LLMs can generate synthetic survey answers, embody user personas, and interact with decision models as stand-ins for humans [1][5]. However, researchers also warn that without careful calibration this can introduce biases [5][21]. As the area matures, best practices (combining real data with AI prompts, extensive validation, etc.) are being developed to ensure reliability.

References

[1] Park, J. S., Zou, C. Q., Shaw, A., Hill, B. M., Cai, C. J., Morris, M. R., Willer, R., Liang, P., & Bernstein, M. S. (n.d.). Simulating Human Behavior with AI Agents [Policy brief]. Retrieved 25 September 2025, from https://hai.stanford.edu/assets/files/hai-policy-brief-simulating-human-behavior-with-ai-agents.pdf

[2] [3] [13] [14] Nielsen Norman Group. Evaluating AI-Simulated Behavior: Insights from Three Studies on Digital Twins and Synthetic Users. https://www.nngroup.com/articles/ai-simulations-studies/

[4] [5] [21] Li, A., Chen, H., Namkoong, H., & Peng, T. (2025). LLM Generated Persona is a Promise with a Catch (No. arXiv:2503.16527). arXiv. https://doi.org/10.48550/arXiv.2503.16527

[6] [7] Zhao, J., Yuan, C., Luo, W., Xie, H., Zhang, G., Quan, S. J., Yuan, Z., Wang, P., & Zhang, D. (2025). Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation (No. arXiv:2509.06337). arXiv. https://doi.org/10.48550/arXiv.2509.06337

[8] Kim, S., Jeong, J., Han, J. S., & Shin, D. (2024). LLM-Mirror: A Generated-Persona Approach for Survey Pre-Testing (No. arXiv:2412.03162). arXiv. https://doi.org/10.48550/arXiv.2412.03162

[9] [10] Ask Rally. Study Boosting: Using Real Survey Data To Create Authentic AI Personas for Extended Research. https://askrally.com/article/study-boosting-with-ai-personas

[11] [12] Ndzomga, F. S. Create and analyze LLM-based surveys using EDSL (1). Medium. https://medium.com/thoughts-on-machine-learning/create-and-analyze-llm-based-surveys-using-edsl-1-f50ede7ecaf0

[15] [16] Hu, T., & Collier, N. (2024). Quantifying the Persona Effect in LLM Simulations (No. arXiv:2402.10811). arXiv. https://arxiv.org/html/2402.10811v2

[17] Liu, Y., Sharma, P., Oswal, M. J., Xia, H., & Huang, Y. (2025). PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation. Proceedings of the 2025 ACM Designing Interactive Systems Conference, 506–534. https://arxiv.org/html/2409.12538v1

[18] Understanding Human–AI Workflows for Generating Personas. https://joongishin.github.io/perGenWorkflow/material/persona-generation-workflow.pdf

[19] Gao, C., Lan, X., Li, N., Yuan, Y., Ding, J., Zhou, Z., Xu, F., & Li, Y. (2023). Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives (No. arXiv:2312.11970). arXiv. https://arxiv.org/html/2312.11970v1

[20] Xie, C., Chen, C., Jia, F., Ye, Z., Lai, S., Shu, K., Gu, J., Bibi, A., Hu, Z., Jurgens, D., Evans, J., Torr, P. H. S., Ghanem, B., & Li, G. (2024). Can Large Language Model Agents Simulate Human Trust Behavior? In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, & C. Zhang (Eds.), Advances in Neural Information Processing Systems 37 (NeurIPS 2024). NeurIPS. https://proceedings.neurips.cc/paper_files/paper/2024/file/1cb57fcf7ff3f6d37eebae5becc9ea6d-Paper-Conference.pdf
