City Research Online

Econometric Analysis with Compositional and Non-Compositional Covariates

(2022). Econometric Analysis with Compositional and Non-Compositional Covariates (22/01). London, UK: Department of Economics, City, University of London.

Abstract

In this paper I consider how best to incorporate compositional data (shares of a whole, which can be represented as points on a simplex) together with non-compositional data as covariates in a linear regression. The standard method for incorporating compositional data in regressions is to omit one share to overcome the problem of singularity. I demonstrate that doing so ignores the compositional nature of the data and that the resulting models are not objects in a vector space, which in turn reduces their usefulness. In terms of Aitchison geometry, the only geometry that can generate a vector space on a simplex, I show that this method also grossly distorts the relationship between points in the compositional data set. Furthermore, the resulting regression coefficients are not permutation invariant, so unless there is an obvious baseline category to omit, with which the other variables in the composition ought naturally to be compared, this approach gives researchers latitude to choose the permutation of the model that supports a particular hypothesis or appears most convincing in terms of p-values. The alternatives in this paper build on work by Aitchison (1982, 1986) on additive logarithmic ratio (ALR) transformations and Egozcue et al. (2003) on isometric logarithmic ratio (ILR) transformations. Transforming the compositional data using ALRs generates regressions that are permutation invariant and are hyperplanes in a vector space. However, ALRs translate the points in the simplex into coordinates relative to an oblique basis, so the angles and distances between the data points remain somewhat distorted, though this distortion is inversely related to the number of shares in the composition. By contrast, ILRs eliminate the distortion by translating the points into coordinates relative to an orthogonal basis. However, the resulting regressions are no longer permutation invariant and are difficult to interpret. To overcome these shortcomings, Hron et al. (2012) suggest using ILRs but combining the coefficient estimates across all the different permutations to produce one statistical model. I demonstrate that estimating a separate regression for each permutation is unnecessary: it is sufficient to estimate either a single regression using ALR coordinates or a constrained regression, and then multiply the resulting regression coefficients and standard errors associated with the compositional variables by a simple factor. Though log-ratios incorporate more information about the nature of compositional data as coordinates in a simplex, I demonstrate that using them does not exacerbate the inherent multicollinearity present in compositional datasets. Throughout, I use economic growth regressions with compositional data on ten religious categories, similar to Barro and McCleary (2003) and McCleary and Barro (2006), to demonstrate and contrast all these different approaches.
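
The transformations named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's code: the function names, the three-part example shares, the simulated four-part data, and the particular orthonormal basis used for the ILR coordinates are all assumptions chosen for illustration, and the rank check assumes the regression includes an intercept. The sketch shows (i) why including every share alongside an intercept produces a singular design matrix, and (ii) that the distance between two compositions measured in ILR coordinates matches the Aitchison distance, whereas the distance measured in ALR coordinates generally does not.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that each composition sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log-ratios: log of each part relative to the last part."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log-ratios: log of each part relative to the geometric mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_basis(d):
    """One possible orthonormal basis (d-1 rows) of the sum-to-zero subspace."""
    v = np.zeros((d - 1, d))
    for i in range(1, d):
        v[i - 1, :i] = 1.0 / i
        v[i - 1, i] = -1.0
        v[i - 1] *= np.sqrt(i / (i + 1.0))  # scale each row to unit length
    return v

def ilr(x):
    """Isometric log-ratios: CLR vector expressed in the orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1]).T

# (i) Singularity: shares sum to one, so they are collinear with an intercept.
rng = np.random.default_rng(0)
comp = closure(rng.random((50, 4)))          # simulated four-part compositions
design = np.column_stack([np.ones(50), comp])
rank = np.linalg.matrix_rank(design)         # 4, although there are 5 columns

# (ii) Distances between two illustrative three-part compositions.
a = np.array([0.6, 0.3, 0.1])
b = np.array([0.2, 0.5, 0.3])
aitchison = np.linalg.norm(clr(a) - clr(b))  # Aitchison distance
ilr_dist = np.linalg.norm(ilr(a) - ilr(b))   # equals the Aitchison distance
alr_dist = np.linalg.norm(alr(a) - alr(b))   # differs: oblique basis

print(rank, aitchison, ilr_dist, alr_dist)
```

The choice of which part serves as the ALR denominator, and the ordering of parts used to build the ILR basis, are arbitrary in this sketch; the permutation-invariance results discussed in the abstract concern how regression estimates behave under such choices.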

Publication Type: Monograph (Discussion Paper)
Copyright, the authors, 2022.
Keywords: Compositional Data, Aitchison Geometry, Isometric Logarithmic Ratios, Economic Growth Regressions
Subjects: H Social Sciences > HB Economic Theory
Departments: School of Policy & Global Affairs > Economics; School of Policy & Global Affairs > Economics > Discussion Paper Series