Assessing sentence similarity through lexical, syntactic and semantic analysis

Research paper by Rafael Ferreira, Rafael Dueire Lins, Steven J. Simske, Fred Freitas, Marcelo Riss

Indexed on: 06 Oct '16
Published on: 12 Feb '16
Published in: Computer Speech and Language



Abstract

Publication date: Available online 6 February 2016

Sentence similarity methods assess the degree of similarity between sentences. They play an important role in areas such as summarization, search, text categorization, and machine translation. Current methods for assessing sentence similarity are based only on the similarity between the words in the sentences: they either represent sentences as bag-of-words vectors or are restricted to the syntactic information of the sentences. Such strategies leave two important problems in language understanding unaddressed: word order and the meaning of the sentence as a whole. The new sentence similarity measure presented here substantially improves and refines a recently published method that takes into account the lexical, syntactic and semantic components of sentences. The new method was benchmarked on the Li–McLean dataset, where it outperforms state-of-the-art systems and achieves results comparable to human evaluation. In addition, the proposed method was extensively tested on the SemEval 2012 sentence similarity test set and in evaluating the degree of similarity between summaries using the CNN-corpus. In both cases, the proposed measure proved effective and useful.
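
The word-order problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper; it is a plain bag-of-words cosine similarity, written here only as an assumption about what a typical word-based baseline looks like. The function names `bag_of_words` and `cosine_similarity` and the two example sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens.

    Deliberately naive: no stemming, stop-word removal, or punctuation handling.
    """
    return Counter(sentence.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


# Two sentences with the same words but opposite meanings.
s1 = "the dog bit the man"
s2 = "the man bit the dog"
score = cosine_similarity(bag_of_words(s1), bag_of_words(s2))
print(f"bag-of-words cosine similarity: {score:.3f}")
```

Because both sentences produce identical token counts, the score is approximately 1.0 even though the sentences describe opposite events. A measure that also uses syntactic and semantic information, as the abstract describes, is meant to separate such cases.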