City Research Online

Dictionary-based methods for information extraction

Baronchelli, A., Caglioti, E., Loreto, V. & Pizzi, E. (2004). Dictionary-based methods for information extraction. Physica A: Statistical Mechanics and its Applications, 342(1-2), pp. 294-300. doi: 10.1016/j.physa.2004.01.072

Abstract

In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called dictionary of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g. DNA strings. We then describe a procedure for string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.
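To make the idea of a compression-derived dictionary concrete, here is a minimal sketch in Python. It assumes an LZ78-style incremental parsing as a stand-in; the abstract does not say which compression scheme the paper uses, so the parsing rule and the names `lz78_dictionary` and `artificial_text` are illustrative assumptions, not the authors' procedure.

```python
import random


def lz78_dictionary(sequence: str) -> list[str]:
    """Return the phrases produced by an LZ78-style parsing of `sequence`.

    Assumption: each new phrase is the shortest prefix of the remaining
    input that has not been seen before. A trailing, already-seen
    fragment at the end of the input is dropped.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases


def artificial_text(words: list[str], n_words: int, seed: int = 0) -> str:
    """Build a hypothetical 'artificial text' by concatenating dictionary
    words drawn uniformly at random (with replacement)."""
    rng = random.Random(seed)
    return "".join(rng.choices(words, k=n_words))


if __name__ == "__main__":
    words = lz78_dictionary("ACGTACGTAC")
    print(words)  # ['A', 'C', 'G', 'T', 'AC', 'GT']
    print(artificial_text(words, n_words=5))
```

Under these assumptions, two sequences could be compared by building an artificial text from each dictionary and measuring how well one compresses given the other; the sketch stops short of that step because the abstract does not specify the comparison measure.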

Publication Type: Article
Subjects: Q Science > QC Physics
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Departments: School of Science & Technology > Mathematics