City Research Online

A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives

Gooch, P. (2012). A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives. (Unpublished Doctoral thesis, City University London)


In this thesis, a synthesis is presented of the knowledge models required by clinical informa- tion systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are novelly applied to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information residing in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evalu- ated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative. It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the- art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.

Publication Type: Thesis (Doctoral)
Additional Information: © 2012 Philip Gooch
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > RA Public aspects of medicine
Departments: Doctoral Theses
School of Science & Technology > School of Science & Technology Doctoral Theses
School of Science & Technology > Computer Science
[thumbnail of philip_gooch_phd_thesis_2012.pdf]
Download (9MB) | Preview


Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email


Downloads per month over past year

View more statistics

Actions (login required)

Admin Login Admin Login