Correlated Community Estimation Models Over a Set of Names

Veluru, S., Rahulamathavan, Y., Manandhar, S. & Rajarajan, M. (2014). Correlated Community Estimation Models Over a Set of Names. Paper presented at the IEEE Technically Co-Sponsored Science and Information Conference, 27-08-2014 - 29-08-2014, London, UK.


Abstract

Surnames (family names) and forenames evolve over generations and can be used to understand population origins, migration, identity, social norms and cultural customs. These names may have hidden structure, called communities, associated with them. Within a community, several forenames and surnames may be strongly correlated, and correlation may also exist across communities of forenames and surnames. Latent Dirichlet Allocation (LDA) is a popular statistical generative model developed to find topics in a corpus of documents; it can also be applied to identify hidden communities in a names data set. This paper proposes several variants of the LDA model to capture correlation between surnames and forenames, within and across communities, over a set of names collected at different locations. First, we propose a surname correlated LDA model and a forename correlated LDA model. These models identify communities in surnames (or forenames) and extract the correlated forenames (or surnames) in each community. Next, we propose a surname community correlated LDA model and a forename community correlated LDA model. These models estimate the correlation between each surname community and the forename communities, and vice versa. We run experiments on names data sets from India and the United Kingdom and draw conclusions.
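
As a rough illustration of the starting point the abstract describes, the sketch below fits plain LDA to surnames grouped by location, treating each location as a "document" and each surname as a "word". It is not the paper's correlated variants. The collapsed Gibbs sampler, the hyperparameters alpha and beta, the function names lda_gibbs and top_names, and the toy name lists are all assumptions made for illustration; the abstract does not state how inference is performed.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative only).

    docs: list of token lists; here, the surnames recorded at one location.
    Returns (vocab, community-by-surname counts, location-by-community counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    ndk = [[0] * n_topics for _ in docs]             # per-location community counts
    nkw = [[0] * n_words for _ in range(n_topics)]   # per-community surname counts
    nk = [0] * n_topics                              # tokens assigned to each community
    z = []                                           # community assignment of every token

    # Random initial assignment of each surname occurrence to a community.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                k = z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalised conditional probability of each community.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return vocab, nkw, ndk


def top_names(vocab, counts, n):
    """Return the n names with the highest count in one community."""
    order = sorted(range(len(vocab)), key=lambda i: counts[i], reverse=True)
    return [vocab[i] for i in order[:n]]


# Toy data, invented for this sketch: surnames observed at four locations.
locations = [
    ["Patel", "Sharma", "Patel", "Reddy", "Sharma"],
    ["Reddy", "Patel", "Sharma", "Reddy"],
    ["Smith", "Jones", "Taylor", "Smith", "Jones"],
    ["Taylor", "Smith", "Jones", "Taylor"],
]

vocab, nkw, ndk = lda_gibbs(locations, n_topics=2)
for k, counts in enumerate(nkw):
    print("community", k, top_names(vocab, counts, 3))
```

Each row of nkw gives, for one community, how many occurrences of each surname were assigned to it, and each row of ndk gives the community mix at one location. The correlated models proposed in the paper would additionally tie forenames to these communities, which this sketch does not attempt.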

Item Type: Conference or Workshop Item (Paper)
Additional Information: © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Latent Dirichlet Allocation, Communities, Probabilistic Generative Models, Bayesian Statistics, Correlation
Subjects: G Geography. Anthropology. Recreation > GN Anthropology; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: School of Engineering & Mathematical Sciences > Engineering
URI: http://openaccess.city.ac.uk/id/eprint/4477
