City Research Online - Artificial sequences and complexity measures

Artificial sequences and complexity measures

Baronchelli, A., Caglioti, E. & Loreto, V. (2005). Artificial sequences and complexity measures. Journal of Statistical Mechanics: Theory and Experiment, 2005(04), article number P04002. doi: 10.1088/1742-5468/2005/04/p04002

Abstract

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools for extracting, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods that rely crucially on data compression techniques to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of an artificial text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind them. As a case study we consider linguistically motivated problems, and we present results for automatic language recognition, authorship attribution and self-consistent classification.
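The abstract does not spell out how a compressor yields a measure of remoteness, so the following is only a minimal sketch of one way such a measure could be built, not the authors' procedure. It assumes a zlib (DEFLATE) compressor and measures how many extra compressed bytes a short probe costs when it is appended to a reference sequence. The function names `compressed_size`, `append_cost` and `remoteness` are hypothetical and chosen for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


def append_cost(reference: bytes, probe: bytes) -> int:
    """Extra compressed bytes needed when `probe` is appended to `reference`.

    The compressor has already seen `reference`, so a probe that resembles
    it is encoded cheaply, while an unrelated probe is encoded expensively.
    """
    return compressed_size(reference + probe) - compressed_size(reference)


def remoteness(reference: bytes, source: bytes, probe: bytes) -> float:
    """Illustrative per-byte remoteness of `reference` from `source`.

    Assumption: `probe` is a short excerpt from the same origin as `source`.
    The score is the excess cost of encoding the probe after `reference`
    compared with encoding it after its own `source`, per probe byte.
    """
    excess = append_cost(reference, probe) - append_cost(source, probe)
    return excess / len(probe)


if __name__ == "__main__":
    text_a = b"the quick brown fox jumps over the lazy dog. " * 50
    text_b = b"lorem ipsum dolor sit amet consectetur adipiscing elit. " * 50
    probe_a = text_a[:90]  # short excerpt drawn from the same origin as text_a

    print("cost after own source:  ", append_cost(text_a, probe_a))
    print("cost after other source:", append_cost(text_b, probe_a))
    print("remoteness of b from a: ", remoteness(text_b, text_a, probe_a))
```

DEFLATE only looks back over a window of at most 32 KiB, so in this sketch the reference sequences should stay below that size; a compressor with a larger window would be needed for longer texts.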

Publication Type:	Article
Subjects:	Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Departments:	School of Science & Technology > Mathematics
SWORD Depositor:	Symplectic Administrator

Official URL: https://doi.org/10.1088/1742-5468/2005/04/p04002

Creators:	Baronchelli, A.; Caglioti, E.; Loreto, V.
Status:	Published
Refereed:	Yes
Journal or Publication Title:	Journal of Statistical Mechanics: Theory and Experiment
Publisher:	IOP Publishing
ISSN:	1742-5468
e-ISSN:	1742-5468
URI:	https://openaccess.city.ac.uk/id/eprint/2643
Date available in CRO:	16 Sep 2013 13:15
Dates:	2005 (Published)