
An AI Qualitative Research Experiment


Battle Research Group (BRG) is a nonprofit research organization dedicated to improving collective understanding of modern war. We apply qualitative research methods, actively gathering primary-source information from battlefields, interviewing combat participants, and conducting desk research. We build results into data and evidence, write battle studies and other military analytic reports, brief, lead workshops, and run exercises.

AN AI QUALITATIVE RESEARCH EXPERIMENT
My Conversation With ‘Arlo’ About War Casualties

Ben Connable, Executive Director of the Battle Research Group // Published: March 23, 2026

This is a word-for-word transcript of the second part of a two-part, nonscientific experiment I conducted to determine how a large language model (LLM) might perform in support of qualitative desk research. I undertook this experiment to support the development of the Battle Research Group policy on artificial intelligence. Four important caveats precede the content:

1. Non-expert user. I am not an expert on artificial intelligence or LLMs. This experiment was run from the perspective of an expert researcher and analyzed with the advice of several experts on AI. Findings are rightly subject to critique by technical experts.
2. Nonscientific. This is a nonscientific experiment intended for informational purposes only. It produces neither evidence suitable for generalization nor a replicable method. It does, however, provide insight into how a top-tier LLM performs on a qualitative research task when pressed hard by a subject-matter expert.
3. Arlo, not... We have anonymized the LLM used in this experiment so the specific model does not become the focal point for analysis. It is sufficient to note that it is one of the most widely used and highest-quality publicly available models as of early 2026. When asked to come up with three options for a cover name, the model offered Kit, Sage, and Arlo; we call it Arlo here.
4. Just for qualitative research on war. This experiment and the BRG AI policy apply only to qualitative research on the complex phenomenon of war. Artificial intelligence has many positive uses, particularly when applied to scientific and technical tasks.

ARLO EXPERIMENT

In part 1 of the experiment I engaged Arlo in fast mode, and in part 2, represented here, I engaged the model in professional mode, maximizing tokens. This is an ex post, or after-the-fact, experiment: prior to the experiment I had already conducted detailed research on the subject in question and was qualified to judge Arlo’s performance in these tasks.

Experiment questions: How would an LLM perform if asked to support professional qualitative research on a highly complex phenomenon like war? In what ways might it prove useful? How might it mislead or err?

War Stats and Casualties. In late 2025 I published War Stats Do Not Measure Up: Exploring the Limits of Knowledge with Military Casualty Statistics in Ukraine and Other Wars, and What We Can Do To Manage Uncertainty. This report identifies and examines gaps in general knowledge regarding military casualty data from wars dating back to the U.S. Civil War, and specifically regarding the causes of wounding in battle. In other words, how do we know which types of weapons cause the most wounds in any and all wars? This question is immediately relevant to the debate over the impact of drones in modern warfare.

Part 1 of the Arlo Experiment: Stunted Source Lists and a Misleading Conclusion. In part 1 of the experiment (fast mode), I asked Arlo to act as an expert qualitative researcher specializing in war history. I then asked Arlo to provide detailed source lists on the subject of battle casualties and causes of wounding in war. It presented a compelling but incomplete and misleading summary of a handful of publicly available sources, and it ignored hundreds of other potentially useful sources in multiple languages. When pressed again and again to go deeper, it pretended to do so and then effectively lied about its efforts and results. In response to repeated prompts to provide an exhaustive list, it wrote:

This exhaustive itemization provides every unique source cited in the reports and monographs discussed in our

Battle Research Group Foundation, Inc. March 23, 2026


