City Research Online

Early symptoms and sensations as predictors of lung cancer: a machine learning multivariate model.

Levitsky, A., Pernemalm, M., Bernhardson, B-M. , Forshed, J., Kölbeck, K., Olin, M., Henriksson, R., Lehtiö, J., Tishelman, C. & Eriksson, L. E. ORCID: 0000-0001-5121-5325 (2019). Early symptoms and sensations as predictors of lung cancer: a machine learning multivariate model.. Scientific Reports, 9(1), article number 16504. doi: 10.1038/s41598-019-52915-x


The aim of this study was to identify a combination of early predictive symptoms/sensations attributable to primary lung cancer (LC). An interactive e-questionnaire comprised of pre-diagnostic descriptors of first symptoms/sensations was administered to patients referred for suspected LC. Respondents were included in the present analysis only if they later received a primary LC diagnosis or had no cancer; and inclusion of each descriptor required ≥4 observations. Fully-completed data from 506/670 individuals later diagnosed with primary LC (n = 311) or no cancer (n = 195) were modelled with orthogonal projections to latent structures (OPLS). After analysing 145/285 descriptors, meeting inclusion criteria, through randomised seven-fold cross-validation (six-fold training set: n = 433; test set: n = 73), 63 provided best LC prediction. The most-significant LC-positive descriptors included a cough that varied over the day, back pain/aches/discomfort, early satiety, appetite loss, and having less strength. Upon combining the descriptors with the background variables current smoking, a cold/flu or pneumonia within the past two years, female sex, older age, a history of COPD (positive LC-association); antibiotics within the past two years, and a history of pneumonia (negative LC-association); the resulting 70-variable model had accurate cross-validated test set performance: area under the ROC curve = 0.767 (descriptors only: 0.736/background predictors only: 0.652), sensitivity = 84.8% (73.9/76.1%, respectively), specificity = 55.6% (66.7/51.9%, respectively). In conclusion, accurate prediction of LC was found through 63 early symptoms/sensations and seven background factors. Further research and precision in this model may lead to a tool for referral and LC diagnostic decision-making.

Publication Type: Article
Additional Information: © The Author(s) 2019. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre-ative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per-mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit
Subjects: R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
Departments: School of Health & Psychological Sciences > Nursing
Related URLs:
SWORD Depositor:
[thumbnail of s41598-019-52915-x.pdf]
Text - Accepted Version
Available under License Creative Commons Attribution.

Download (1MB) | Preview
[thumbnail of Supplementary Information]
Text (Supplementary Information) - Accepted Version
Available under License Creative Commons Attribution.

Download (1MB) | Preview


Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email


Downloads per month over past year

View more statistics

Actions (login required)

Admin Login Admin Login