Dictionary-based methods for information extraction
Baronchelli, A., Caglioti, E., Loreto, V. & Pizzi, E. (2004). Dictionary-based methods for information extraction. Physica A: Statistical Mechanics and its Applications, 342(1-2), pp. 294-300. doi: 10.1016/j.physa.2004.01.072
Abstract
In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define, and focus our attention on, the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g. DNA strings. We then describe a procedure for string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. Finally, we present some results on self-consistent classification problems.
Publication Type: | Article |
---|---|
Subjects: | Q Science > QC Physics; Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science |
Departments: | School of Science & Technology > Mathematics |