City Research Online

Econometric Analysis with Compositional and Non-Compositional Covariates

Ben-Gad, M. ORCID: 0000-0001-8641-4199 (2022). Econometric Analysis with Compositional and Non-Compositional Covariates (22/01). London, UK: Department of Economics, City, University of London.


In this paper I consider how best to incorporate compositional data (shares of a whole which can be represented as points on a simplex) together with non-compositional data as covariates in a linear regression. The standard method for incorporating compositional data in regressions is to omit one share to overcome the problem of singularity. I demonstrate that doing so ignores the compositional nature of the data and the resulting models are not objects in a vector space, which in turn reduces their usefulness. In terms of Aitchison geometry - the only geometry that can generate a vector space on a simplex - I show how this method also grossly distorts the relationship between points in the compositional data set. Furthermore, the regression coefficients that result are not permutation invariant, so unless there is an obvious baseline category to be omitted with which the other variables in the composition ought naturally to be compared, this approach gives researchers latitude to choose the permutation of the model that supports a particular hypothesis or appears most convincing in terms of p-values. The alternatives in this paper build on work by Aitchison (1982, 1986) on additive logarithmic ratio (ALR) transformations and Egozcue et al. (2003) on isometric logarithmic ratio (ILR) transformations. Transforming the compositional data using ALRs generates regressions that are permutation invariant and hyperplanes in a vector space. However, ALRs translate the points in the simplex into coordinates relative to an oblique basis, so the angles and distances between the data points remain somewhat distorted, though this distortion is inversely related to the number of shares in the composition. By contrast, ILRs eliminate the distortion by translating the points into coordinates relative to an orthogonal basis. However, the resulting regressions are no longer permutation invariant and are difficult to interpret. To overcome these shortcomings, Hron et al. (2012) suggest using ILRs, but combining the coefficient estimates across all the different permutations to produce one statistical model. I demonstrate that estimating a separate regression for each permutation is unnecessary - estimating either a single regression using ALR coordinates or a constrained regression and then multiplying the resulting regression coefficients and standard errors associated with the compositional variables by a simple factor is sufficient. Though log-ratios incorporate more information about the nature of compositional data as coordinates in a simplex, I demonstrate that using them does not exacerbate the inherent multicollinearity present in compositional datasets. Throughout, I use economic growth regressions with compositional data on ten religious categories, similar to Barro and McCleary (2003) and McCleary and Barro (2006), to demonstrate and contrast all these different approaches.
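
As an illustration of the two transformations named in the abstract, the following is a minimal sketch using their standard textbook definitions; it is not taken from the paper. The function names (`closure`, `alr`, `clr`, `ilr_basis`, `ilr`), the choice of the last share as the ALR denominator, the particular Helmert-type orthonormal basis used for the ILR, and the two example compositions are all assumptions made for this example. It compares the Euclidean distance between two compositions in ALR and ILR coordinates with their Aitchison distance, computed here from centred log-ratios.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that they sum to one (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratio: log of each share relative to the last share (assumed baseline)."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log-ratio: log shares minus their mean (log of share over geometric mean)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_basis(D):
    """One orthonormal basis (D-1 rows, D columns) of the sum-zero hyperplane.

    This Helmert-type construction is only one of many valid choices.
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(x):
    """Isometric log-ratio: clr coordinates expressed in the orthonormal basis."""
    x = closure(x)
    return clr(x) @ ilr_basis(x.shape[-1]).T

# Two hypothetical three-part compositions.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.2, 0.6])

aitchison_dist = np.linalg.norm(clr(a) - clr(b))
ilr_dist = np.linalg.norm(ilr(a) - ilr(b))
alr_dist = np.linalg.norm(alr(a) - alr(b))

print("Aitchison distance:", aitchison_dist)
print("Distance in ILR coordinates:", ilr_dist)
print("Distance in ALR coordinates:", alr_dist)
```

In this sketch the ILR distance coincides with the Aitchison distance because the basis is orthonormal, while the ALR distance differs because its coordinates are taken relative to an oblique basis.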

Publication Type: Monograph (Discussion Paper)
Additional Information: Copyright, the authors, 2022.
Publisher Keywords: Compositional Data, Aitchison Geometry, Isometric Logarithmic Ratios, Economic Growth Regressions
Subjects: H Social Sciences > HB Economic Theory
Departments: School of Policy & Global Affairs > Economics; School of Policy & Global Affairs > Economics > Discussion Paper Series

