Indexed on: 11 Jun '15
Published on: 11 Jun '15
Published in: Knowledge and Information Systems
Cross-lingual sentiment classification is a popular research topic in natural language processing. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature spaces of the source language data and the target language data. In this article, we propose a new model that uses stacked autoencoders to learn language-independent high-level feature representations for both languages in an unsupervised fashion. The proposed framework aims to force aligned bilingual input sentences into a common latent space, and the objective function minimizes the distance between the input and output vector representations as well as the distance between the common representations in the latent space. With these language-independent high-level feature representations, sentiment classifiers trained on the source language can be adapted to predict sentiment polarity in the target language. We conduct extensive experiments on English–Chinese sentiment classification tasks over multiple data sets. Our experimental results demonstrate the efficacy of the proposed cross-lingual approach.
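To make the objective described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-layer autoencoder per language (the paper stacks several), a tanh encoder, a linear decoder, squared Euclidean distances, and a weighting factor `alpha`; none of these choices are stated in the abstract. The names `init_params`, `encode`, `decode`, and `bilingual_loss` are hypothetical.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    """Random weights for one single-layer autoencoder (illustrative only)."""
    return {
        "W_enc": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "b_dec": np.zeros(n_in),
    }

def encode(x, p):
    # Map input vectors to the shared latent space (assumed tanh activation).
    return np.tanh(x @ p["W_enc"] + p["b_enc"])

def decode(h, p):
    # Map latent vectors back to the input space (assumed linear decoder).
    return h @ p["W_dec"] + p["b_dec"]

def bilingual_loss(x_src, x_tgt, p_src, p_tgt, alpha=1.0):
    """Reconstruction error for both languages plus latent alignment distance.

    Row i of x_src and row i of x_tgt are assumed to be an aligned
    (translated) sentence pair. The two input dimensions may differ,
    but both encoders must share the same hidden size.
    """
    h_src, h_tgt = encode(x_src, p_src), encode(x_tgt, p_tgt)
    r_src, r_tgt = decode(h_src, p_src), decode(h_tgt, p_tgt)
    # Distance between input and output vector representations.
    recon = (np.mean(np.sum((r_src - x_src) ** 2, axis=1))
             + np.mean(np.sum((r_tgt - x_tgt) ** 2, axis=1)))
    # Distance between the two latent representations of each aligned pair.
    align = np.mean(np.sum((h_src - h_tgt) ** 2, axis=1))
    return recon + alpha * align

rng = np.random.default_rng(0)
p_en = init_params(n_in=8, n_hidden=3, rng=rng)   # hypothetical English side
p_zh = init_params(n_in=10, n_hidden=3, rng=rng)  # hypothetical Chinese side
x_en = rng.random((5, 8))    # 5 aligned sentences, toy feature vectors
x_zh = rng.random((5, 10))
loss = bilingual_loss(x_en, x_zh, p_en, p_zh, alpha=1.0)
```

Minimizing such a loss over the parameters of both autoencoders would pull aligned sentences toward the same latent vector, so that a classifier trained on `encode(x_en, p_en)` could in principle be applied to `encode(x_zh, p_zh)`.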