Indexed on: 03 Apr '17
Published on: 18 Mar '17
Published in: Journal of Multivariate Analysis
Multiple correspondence analysis is a dimension reduction technique that plays a large role in the analysis of tables with categorical nominal variables, such as survey data. Though it is usually motivated and derived using geometric considerations, we prove that it can in fact be seen as a single proximal Newton step of a natural bilinear exponential family model for categorical data: the multinomial logit bilinear model. We compare and contrast the behavior of multiple correspondence analysis with that of this model on simulated data, and discuss new insights into both approaches and their cognate models. One consequence is that multiple correspondence analysis can be used to approximate the parameters of the multilogit model. Indeed, estimating the model’s parameters directly is non-trivial, whereas multiple correspondence analysis has the advantage of being easily solved by a singular value decomposition and of scaling to large data sets. We illustrate the methods on a survey of drinking habits in France, in the context of European policies against the harmful effects of alcohol on society.
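To make the remark about the singular value decomposition concrete, here is a minimal sketch of the textbook indicator-matrix formulation of multiple correspondence analysis in NumPy. It is not the authors' code and does not fit the multinomial logit bilinear model. The function name `mca_indicator`, its arguments, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def mca_indicator(codes, n_components=2):
    """Textbook MCA of an (n, Q) array of categorical codes (hypothetical helper).

    Returns row principal coordinates, column principal coordinates,
    and the leading principal inertias (squared singular values).
    """
    codes = np.asarray(codes)
    n, Q = codes.shape

    # One-hot encode each variable and stack the blocks into the indicator matrix Z.
    blocks = []
    for q in range(Q):
        levels = np.unique(codes[:, q])
        blocks.append((codes[:, q][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    # Correspondence matrix with its row and column masses.
    P = Z / Z.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)

    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

    # A single SVD gives every component at once.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_components
    row_coords = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:k].T * s[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, s[:k] ** 2

# Toy usage on made-up survey answers: 6 respondents, 3 questions.
toy = np.array([[0, 1, 0],
                [1, 0, 2],
                [0, 1, 1],
                [2, 0, 2],
                [1, 1, 0],
                [2, 0, 1]])
rows, cols, inertias = mca_indicator(toy, n_components=2)
```

The only expensive step is one SVD of a matrix with as many rows as respondents and as many columns as categories, which is why the method scales well compared with iteratively fitting a likelihood-based model.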