Dictionary-based methods for information extraction

Baronchelli, A., Caglioti, E., Loreto, V. & Pizzi, E. (2004). Dictionary-based methods for information extraction. Physica A: Statistical Mechanics and its Applications, 342(1-2), pp. 294-300. doi: 10.1016/j.physa.2004.01.072

Abstract

In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define the so-called dictionary of a sequence and focus our attention on it. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g. DNA strings. We then describe a procedure for string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. Finally, we present some results on self-consistent classification problems.
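
As a rough illustration of what a compression-derived dictionary can look like, the sketch below collects the phrases produced by an LZ78-style incremental parsing of a string and compares two strings by the overlap of their phrase sets. Both choices are assumptions made for illustration only: the abstract does not specify how the dictionary is defined or how the artificial texts are compared, and the function names `lz78_dictionary` and `dictionary_overlap` are hypothetical.

```python
def lz78_dictionary(sequence: str) -> set[str]:
    """Return the set of phrases from an LZ78-style incremental parsing.

    Assumption: this parsing is a stand-in for the paper's dictionary,
    whose exact definition is not given in the abstract.
    """
    phrases: set[str] = set()
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in phrases:
            # First time this phrase appears: store it and start a new one.
            phrases.add(current)
            current = ""
    # A non-empty remainder is a phrase that was already in the set,
    # so nothing further needs to be added.
    return phrases


def dictionary_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two phrase sets (illustrative measure only)."""
    da, db = lz78_dictionary(a), lz78_dictionary(b)
    if not da and not db:
        return 1.0
    return len(da & db) / len(da | db)


if __name__ == "__main__":
    print(sorted(lz78_dictionary("abababab")))
    print(dictionary_overlap("abababab", "abababab"))
    print(dictionary_overlap("aaaa", "bbbb"))
```

Strings that share many repeated substrings yield overlapping phrase sets, which is the intuition behind using dictionaries to compare sequences such as DNA strings.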

Item Type: Article
Subjects: Q Science > QC Physics; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Divisions: School of Engineering & Mathematical Sciences > Department of Mathematical Science
URI: http://openaccess.city.ac.uk/id/eprint/2652