Amaç Herdağdelen, Marco Marelli


Corpus-based word frequencies are one of the most important predictors in language processing tasks. Frequencies based on conversational corpora (such as movie subtitles) are shown to better capture the variance in lexical decision tasks compared to traditional corpora. In this study, we show that frequencies computed from social media are currently the best frequency-based estimators of lexical decision reaction times (up to 3.6% increase in explained variance). The results are robust (observed for Twitter- and Facebook-based frequencies on American English and British English datasets) and are still substantial when we control for corpus size.