Automatic language ability assessment method based on natural language processing

Nnamoko, N.; Karaminis, T.; Procter, J.; Barrowclough, J.; Korkontzelos, I.

Nnamoko, N. ORCID: 0000-0002-5064-2621, Karaminis, T. ORCID: 0000-0003-2977-5451, Procter, J., Barrowclough, J. ORCID: 0000-0003-0902-5098 & Korkontzelos, I. ORCID: 0000-0001-8052-2471 (2024). Automatic language ability assessment method based on natural language processing. Natural Language Processing Journal, 8, article number 100094. doi: 10.1016/j.nlp.2024.100094

Abstract

Background and Objectives:
The Wechsler Abbreviated Scales of Intelligence second edition (WASI-II) is a standardised assessment tool that is widely used to assess cognitive ability in clinical, research, and educational settings. In one of the components of this assessment, referred to as the Vocabulary task, the assessed individuals are presented with words (called stimulus items) and asked to explain what each word means. Their responses are hand-scored based on a list of pre-rated sample responses [0-Point (poor), 1-Point (moderate), or 2-Point (excellent)] that is provided in the accompanying manual of WASI-II. This scoring method is time-consuming, and scoring of responses that do not fully match the pre-rated ones may vary between individual scorers. In this study, we aim to use natural language processing techniques to automate the scoring procedure and make it more time-efficient and reliable (objective).

Methods:
Utilising five different word embeddings (Word2vec, Global Vectors, Bidirectional Encoder Representations from Transformers, Generative Pre-trained Transformer 2, and Embeddings from Language Models), we transformed stimulus items and pre-rated responses from the WASI-II Vocabulary task into machine-readable vectors. We measured distance with cosine similarity, evaluating each model against a rational-expectations hypothesis that vector representations for stimuli should align closely with 2-Point responses and diverge from 0-Point responses. Assessment involved frequency of consistent representation and the Pearson correlation coefficient, examining overall consistency with the manual’s ranking across all items and sample responses.
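The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up toy values rather than real embeddings, the helper names are hypothetical, and the consistency rule (mean similarity strictly ordered 2-Point > 1-Point > 0-Point) is an assumption about how "consistent representation" might be checked for a single stimulus item.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_similarity_by_score(stimulus_vec, responses):
    """Average similarity to the stimulus for each manual score (0, 1, 2).

    `responses` is a list of (vector, manual_score) pairs.
    """
    grouped = {}
    for vec, score in responses:
        grouped.setdefault(score, []).append(cosine_similarity(stimulus_vec, vec))
    return {score: sum(sims) / len(sims) for score, sims in grouped.items()}

def is_consistent(means):
    """Assumed rule: 2-Point responses closest, 0-Point responses farthest."""
    return means[2] > means[1] > means[0]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): one stimulus vector and three pre-rated responses.
stimulus = [1.0, 0.0, 0.0]
responses = [
    ([0.9, 0.1, 0.0], 2),  # stands in for a 2-Point sample response
    ([0.6, 0.6, 0.0], 1),  # stands in for a 1-Point sample response
    ([0.0, 0.2, 1.0], 0),  # stands in for a 0-Point sample response
]

means = mean_similarity_by_score(stimulus, responses)
consistent = is_consistent(means)

# Correlate manual scores with similarities across all sample responses.
manual_scores = [score for _, score in responses]
similarities = [cosine_similarity(stimulus, vec) for vec, _ in responses]
r = pearson_r(manual_scores, similarities)
```

Under these assumptions, counting the stimulus items for which `is_consistent` holds would give a frequency measure, and `r` computed over all items and responses would give the overall agreement with the manual's ranking.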

Results:
The Word2vec model showed the highest consistency with the WASI-II manual (frequency = 20 out of 27; Pearson correlation coefficient = 0.61), while Bidirectional Encoder Representations from Transformers was the worst-performing model (frequency = 5; Pearson correlation coefficient = 0.05). The consistency of these two models with the WASI-II manual differed significantly, Z = 2.282, p = 0.022.
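The abstract does not name the test behind the reported Z statistic. One reading, offered here purely as an assumption, is Fisher's r-to-z comparison of two correlations treated as independent, with n = 27 for each model (taking 27 from "out of 27"). The sketch below applies that test to the two reported coefficients; the function name and the choice of n are hypothetical.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-z test for two independent correlations."""
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Reported correlations; n = 27 per model is an assumption, not stated as such.
z_stat, p_value = fisher_z_compare(0.61, 27, 0.05, 27)
print(f"Z = {z_stat:.3f}, p = {p_value:.3f}")
```

With these inputs the computed values agree with the reported Z and p to three decimal places, which suggests, but does not confirm, that a test of this kind was used.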

Conclusions:
Our results showed that the scoring of the WASI-II Vocabulary task can be automated with moderate accuracy relying upon off-the-shelf embedding models. These results are promising, and could be improved further by considering alternative vector dimensions, similarity metrics, and data preprocessing techniques to those used in this study.

Publication Type:	Article
Publisher Keywords:	Cognitive assessment, Natural Language Processing, Language ability test, Cosine similarity, WASI-II, Word embedding
Subjects:	B Philosophy. Psychology. Religion > BF Psychology; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Departments:	School of Health & Medical Sciences; School of Health & Medical Sciences > Department of Psychology & Neuroscience
SWORD Depositor:	Symplectic Administrator

Text - Published Version
Available under License Creative Commons: Attribution International Public License 4.0.

Supplementary Materials:

Graphical Abstract - https://ars.els-cdn.com/content/image/1-...

Official URL: https://doi.org/10.1016/j.nlp.2024.100094

Creators:	Nnamoko, N. ORCID: 0000-0002-5064-2621; Karaminis, T. ORCID: 0000-0003-2977-5451; Procter, J.; Barrowclough, J. ORCID: 0000-0003-0902-5098; Korkontzelos, I. ORCID: 0000-0001-8052-2471
Status:	Published
Refereed:	Yes
Journal or Publication Title:	Natural Language Processing Journal
Publisher:	Elsevier BV
ISSN:	2949-7191
URI:	https://openaccess.city.ac.uk/id/eprint/33582
Date available in CRO:	02 Sep 2024 08:02
Date deposited:	29 August 2024
Dates:	30 September 2024 (Published); 6 August 2024 (Published Online); 25 July 2024 (Accepted)