City Research Online

AI-assisted teams outperform AI-led teams but not human-only teams in assessing research reproducibility in quantitative social science

Brodeur, A. ORCID: 0000-0003-3980-4324, Valenta, D. ORCID: 0009-0003-7179-4792, Marcoci, A. ORCID: 0000-0002-5780-0805 , Aparicio, J. P. ORCID: 0000-0001-5887-2440, Mikola, D., Barbarioli, B. ORCID: 0000-0001-7438-5634, Alexander, R. ORCID: 0000-0003-1279-0700, Deer, L. ORCID: 0000-0001-8646-6095, Stafford, T. ORCID: 0000-0002-8089-9479, Vilhuber, L. ORCID: 0000-0001-5733-8932, Bensch, G. ORCID: 0000-0001-7964-7533, Motoki, F. ORCID: 0000-0001-7464-3330, Abdelhady, M. ORCID: 0009-0006-4079-6885, Abdelmoula, Y., Baki, G. A. ORCID: 0000-0002-6791-8655, Aguirre, T., Aiyer, S., Akhtar, S. ORCID: 0000-0001-7432-9536, Akhtar, F. ORCID: 0000-0002-2254-7117, Albada, M. R. ORCID: 0009-0009-6469-5837, Altman, M., Angenendt, D. ORCID: 0000-0001-9583-6289, Arjmandi Lari, Z. ORCID: 0000-0001-9212-5959, De León Tejada, J. A., Arana, D. R. ORCID: 0009-0008-8901-5167, Asanov, I. ORCID: 0000-0002-8091-4130, Noha, A-M. ORCID: 0000-0003-3080-4213, Ashong, R. ORCID: 0009-0005-3637-4869, Auer, T., Bahamonde-Birke, F. J., Baker, B. J. ORCID: 0000-0002-1697-4198, Bartram, S. M., Bao, D., Batinovic, L., Batistoni, T. ORCID: 0009-0008-7375-0191, Beeder, M. ORCID: 0000-0001-9661-4936, Beland, L-P. ORCID: 0000-0001-7458-4641, Gero Bienz, C. ORCID: 0009-0000-3948-4995, Aryanto, C. B. ORCID: 0000-0001-6347-2505, Bolibaugh, C. ORCID: 0000-0001-7500-264X, Bonander, C. ORCID: 0000-0002-1189-9950, Bravo, R. ORCID: 0009-0000-7399-1448, Bronnikov, E., Bruns, S., Buliskeria, N. ORCID: 0000-0001-6708-8519, Caicedo-Silva, S., Calef, A. ORCID: 0000-0002-7045-2622, Sebastian Cano Arias, J., A. Castillo Alvarez, G., Caulker, S., Cepenas, S. ORCID: 0000-0002-5187-2461, Chatton, A. ORCID: 0000-0002-0018-5899, Chen, Z. ORCID: 0000-0003-1275-6860, Chioma Ewurum, N., Ciocîrlan, A-B. ORCID: 0009-0002-7409-0574, Clouth, F. J., Collins, J. ORCID: 0000-0003-3073-6070, Cook, N., Cornejo, C. ORCID: 0009-0006-1042-8838, Craveiro, J., Créchet, J., Cui, J. ORCID: 0009-0003-6350-2913, Chalil Vayalabron, N., Czymara, C. 
ORCID: 0000-0002-9535-3559, Bermúdez Jaramillo, C. D., Datta, H., Denoo, L. ORCID: 0000-0003-0044-4077, Dhaliwal, A., Dhameja, N. ORCID: 0009-0009-4585-3107, Djemai, E., Dujeancourt, E. ORCID: 0000-0001-5872-7630, Dündar, U. ORCID: 0000-0001-9246-1717, Duprey, T. ORCID: 0000-0002-0864-8149, Eissa, Y. ORCID: 0000-0001-8539-7956, El Fassi, Y., El Fassi, I. ORCID: 0009-0000-5883-9667, Ellis, K., Elminejad, A. ORCID: 0000-0003-0829-5184, Elsherif, M., Emirmahmutoglu, A., Etingin-Frati, G. ORCID: 0009-0003-8201-9169, Eze, E., Dollbaum, J. F. ORCID: 0000-0003-4516-1273, Feld, J. ORCID: 0000-0003-3816-6607, Felipe Rengifo Jaramillo, A. ORCID: 0009-0002-8355-3913, Fenig, G., Fernandes, V., Fiala, L. ORCID: 0000-0003-0216-7050, Fink, L. ORCID: 0009-0002-8786-0988, Firouzjaeiangalougah, M. ORCID: 0009-0002-9989-7884, Fish, S., Fitzgerald, J., Forshaw, R. ORCID: 0000-0001-6396-485X, Fortier-Chouinard, A. ORCID: 0000-0002-6282-9709, Fréget, L. ORCID: 0000-0002-4216-5781, Frese, J. ORCID: 0000-0002-5871-997X, Gabani, J. ORCID: 0000-0001-7461-7300, Gallegos, S. ORCID: 0000-0003-0437-3982, Gamill, M. C. ORCID: 0009-0007-3250-5299, Gáspár, A. ORCID: 0000-0001-7247-6997, Gauriot, R. ORCID: 0000-0002-7633-7086, Gavrilova, E. ORCID: 0000-0002-9349-5422, Geraldes, D. ORCID: 0000-0002-5211-673X, Cantone, G. G. ORCID: 0000-0001-7149-5213, Gibson, G. ORCID: 0000-0003-0750-3850, Goldschmitt, D. ORCID: 0000-0002-6754-9749, Gourdon-Kanhukamwe, A., Gregor de Varda, A., Grigoryeva, I. ORCID: 0000-0003-3657-0526, Gugushvili, A. ORCID: 0000-0002-3933-9111, Fletcher, A. H. A., Habermann, F. ORCID: 0000-0003-1982-6796, Hablicsek, M., Haddad, J. ORCID: 0000-0002-9829-8815, Hall, J. D., Hammar, O. ORCID: 0000-0002-2341-3751, Hassouneh, M., Hausladen, C. I. ORCID: 0000-0003-4397-2339, Hendrikse, S. C. F., Hepplewhite, M. ORCID: 0000-0003-0436-578X, Ho, A. T. Y. ORCID: 0000-0002-1772-2320, Hogan-Hennessy, S. ORCID: 0000-0003-4238-2396, Howley, E. ORCID: 0000-0002-3868-2516, Huang, G. 
ORCID: 0000-0001-6542-1656, Hulstaert, H. ORCID: 0009-0008-3055-5070, Ilchovska, Z. G., Jaimes Santamaria, P. ORCID: 0009-0005-4436-4740, Jakobsson, N. ORCID: 0000-0002-7143-8793, Jansson, J. ORCID: 0000-0003-1942-552X, Jarosz, E., Jebeli, H. ORCID: 0009-0006-9232-3158, Jiang, Y. ORCID: 0009-0000-4552-7621, Junaid, H. ORCID: 0009-0001-8930-2093, Kalluraya, R., Karim, S., Kelly, E., Kimel, E. ORCID: 0000-0002-2437-3245, Kingsuwankul, S., Klotzbücher, V. ORCID: 0000-0001-9382-6757, Krähmer, D. ORCID: 0000-0002-4100-5372, Krūminas, P. ORCID: 0000-0001-5599-0900, Kruus, N. ORCID: 0009-0006-1435-9331, Kujansuu, E., Kurz, C. F., Küster, S. ORCID: 0000-0002-5858-2221, Lee-Whiting, B. ORCID: 0000-0001-7625-3109, Lewandowski, F. ORCID: 0000-0001-6525-9791, Li, T., Li, R., Liu, D. ORCID: 0009-0009-4999-6383, Liu, J. ORCID: 0000-0001-5182-9079, Lo, H. ORCID: 0000-0003-1702-3711, Loter, K., Macedo Dias, F., Madan, C. R. ORCID: 0000-0003-3228-6501, Mäder, N. ORCID: 0000-0002-2488-8732, Mandas, M. ORCID: 0009-0007-1944-3919, Mantilla, C., Marcus, J. ORCID: 0000-0001-9407-6660, Marino Fages, D. ORCID: 0000-0002-6233-2292, Martin, X., McWay, R. ORCID: 0000-0003-4492-7804, Medina-Gaspar, D. ORCID: 0009-0004-4958-1472, Meng, S., Meng, L. ORCID: 0009-0000-8112-9110, Merz, S. ORCID: 0009-0001-9564-556X, Miller, A. P. ORCID: 0000-0003-3535-2578, Mirabel, T. ORCID: 0000-0003-3716-2591, Mishra, D. D., Mishra, S. ORCID: 0000-0003-2287-1176, Moges, B. W. ORCID: 0000-0002-2235-7321, Mohandes Mojarrad, M., Mohnen, M., Morin, L-P. ORCID: 0000-0002-9204-1277, Muehlenbachs, L., Mullin, G. ORCID: 0009-0008-9009-0754, Musulan, A. ORCID: 0009-0002-2964-6414, Muzzì, S., Myers, J. A. C., Neubauer, F. ORCID: 0000-0001-7134-3439, Nguyen, T. ORCID: 0000-0003-3495-178X, Niazi, A., Nordstrom, A., Nowak, B. ORCID: 0000-0002-2949-5452, O’Habib, D., Ölkers, T. ORCID: 0000-0002-8399-3962, Ong, J., Orozco Castiblanco, V., Özak, Ö. ORCID: 0000-0001-6421-2801, Ozkes, A. I. 
ORCID: 0000-0002-8720-2494, Paaso, M., Pandey, S. ORCID: 0000-0002-5591-8551, Papazoglou, V. ORCID: 0009-0006-6162-4644, Penheiro, R. ORCID: 0000-0001-5267-9121, Pham, L., Phieler, U. ORCID: 0009-0002-6782-148X, Pütz, P. ORCID: 0000-0002-2948-0189, Qi, Q. ORCID: 0000-0003-4392-1136, Qiu, J. ORCID: 0000-0002-9919-6079, Rein, M. T. ORCID: 0000-0002-7185-7998, Reinstein, D. A., Repo, J. ORCID: 0000-0001-8756-6936, Rudolf, N., Saha, S., Saka, O. ORCID: 0000-0002-1822-1309, Saponaro, C. ORCID: 0009-0009-6135-5001, Sator, G., Schoenmakers, M. ORCID: 0000-0003-3338-3565, Seri, R. ORCID: 0000-0003-1646-3547, Shah, M., Sibille, P., Siemroth, C., Skavysh, V. ORCID: 0000-0002-1856-2963, Slater, B., Song, W., Staubli, S., Steindl, T. ORCID: 0000-0003-0767-6970, Waongo, N. S., Stott, P. ORCID: 0000-0003-0023-984X, Strobel, S., Sudhaharan, R., Sun, P. ORCID: 0009-0009-9935-8814, Swain, S. D. ORCID: 0000-0001-5967-2109, Talavera, O., Tantiangco, H. M. ORCID: 0000-0002-8699-6579, Tarasenko, G. ORCID: 0000-0002-5677-6786, Tarlinton, B. ORCID: 0000-0002-4146-7083, Tarraf, M., Teoh, K., Thériault, R. ORCID: 0000-0003-4315-6788, Thompson, B., Tian, T., Tian, W., Tolani, E. ORCID: 0000-0001-6084-1192, Borgen, N. ORCID: 0000-0002-7638-3293, Topstad Borgen, S. ORCID: 0000-0002-4852-3332, Torralba, J., Velez-Ospina, C., Mak, M. W. ORCID: 0009-0004-3340-2871, Wallrich, L., Wang, Z., Ward, L., Webb, M. D. ORCID: 0000-0003-4682-2862, Webb, D., Weber, B. S. ORCID: 0000-0003-1806-4451, Weber, C. ORCID: 0000-0002-7998-3702, Weng, W-C. ORCID: 0009-0006-4780-4737, Westheide, C., Wilkinson, T. ORCID: 0009-0001-0857-7029, Wong, K-Y. ORCID: 0009-0008-9048-1659, Wroński, M. ORCID: 0000-0002-3146-601X, Wu, Z., Wu, Q. ORCID: 0009-0004-9728-9361, Wu, V. Y. ORCID: 0000-0003-4548-9826, Xiao, B. ORCID: 0009-0006-1951-7458, Xu, F. ORCID: 0000-0002-6211-7234, Xu, C., Yadav, P. ORCID: 0009-0007-3439-3075, Yang Chou, Y. ORCID: 0009-0002-3988-0917, Yap, L., Yazbeck, M. 
ORCID: 0000-0002-4378-6117, Yao, B., Zagrodzka, Z. ORCID: 0000-0002-8640-414X, Zahra, T. ORCID: 0009-0007-1288-5378, Zaneva, M., Zhang, X., Zhao, Z., Zhong, H., Zirgulis, A., Zou, J., Zoutman, F. & Zozoungbo, C. (2026). AI-assisted teams outperform AI-led teams but not human-only teams in assessing research reproducibility in quantitative social science. Proceedings of the National Academy of Sciences, 123(22), article number e2524747123. doi: 10.1073/pnas.2524747123

Abstract

Large Language Models (LLMs) such as ChatGPT are transforming how scientists conduct and validate research, offering promise as tools to improve scientific reproducibility. However, computational reproducibility and error detection remain expensive and labor-intensive. We experimentally test how collaboration between researchers and LLM assistants influences the reproduction of quantitative social science findings across different levels of AI autonomy. We randomly assigned 288 researchers to 103 teams working under three conditions: human-only, AI-assisted (using ChatGPT as a collaborative tool), or AI-led (ChatGPT operating with minimal human oversight). Teams reproduced published results from leading social science journals, detected coding errors, and proposed robustness checks. Human-only and AI-assisted teams achieved comparable reproduction rates (94% vs. 91%) and performed similarly on most outcomes, except human-only teams identified significantly more major coding errors. Both substantially outperformed AI-led teams, which achieved only a 37% reproduction rate, detected fewer errors across all categories, proposed weaker robustness checks, and required more time. This autonomous approach, however, likely represents only a lower bound of AI capabilities. Despite rapid model advances, expert human judgment currently remains indispensable for reliable empirical verification. While AI assistance did not degrade most outcomes, it provided no measurable advantages and was associated with reduced detection of major errors. However, the 37% autonomous reproduction rate indicates that AI could provide value in settings where scale or cost constraints preclude human review of papers, even though general-purpose LLMs offer no immediate advantages for human-supervised verification.

Publication Type: Article
Publisher Keywords: Humans, Reproducibility of Results, Cooperative Behavior, Social Sciences, Artificial Intelligence, Generative Artificial Intelligence, Large Language Models, Intelligent Systems, AI, reproducibility
Subjects: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
H Social Sciences > HM Sociology
H Social Sciences > HN Social history and conditions. Social problems. Social reform
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments: School of Policy & Global Affairs
School of Policy & Global Affairs > Department of Economics
Text - Accepted Version