Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter

Gravano, Agustín; Pérez, Juan Manuel; Aleman, Damian E.; Kalinowski, Santiago N.


Files

Gravano_Procesamiento del Lenguaje Natural_2022.pdf (1.34 MB)

Date

2022

Authors

Gravano, Agustín

Pérez, Juan Manuel

Aleman, Damian E.

Kalinowski, Santiago N.

Publisher

Procesamiento del Lenguaje Natural, Revista

Abstract

The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on questionnaires and surveys, depending heavily on the expertise and intuition of the surveyor. The emergence of social media and microblogging services has produced an unprecedented wealth of content (mainly informal, user-generated text), opening new opportunities for linguists to extend their studies of language variation. Previous work on the automatic detection of regionalisms relied mostly on word frequencies. In this work, we present a novel metric based on Information Theory that incorporates user frequency. We tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual annotation of the relevance of the retrieved terms, and as a feature selection method for the geolocation of users. In both cases, our metric outperformed other techniques based on word frequency, suggesting that the number of users who use a word is an informative feature. This tool has helped lexicographers discover several unregistered words of Argentinian Spanish, as well as different meanings assigned to registered words.
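The abstract does not give the formula for the metric, so the following is only a rough illustration of why user counts can matter, not the authors' method. It assumes, hypothetically, that a word's regional concentration is measured as the Shannon entropy of its distribution over regions (lower entropy meaning more concentrated), and compares that entropy computed from raw word counts against the same entropy computed from the number of distinct users per region. The function names, the `(user_id, region, tokens)` tweet format, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def region_entropy(counts):
    """Shannon entropy (in bits) of a region -> count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def word_and_user_counts(tweets, word):
    """Hypothetical helper. `tweets` is an iterable of (user_id, region, tokens).

    Returns two dicts keyed by region: total occurrences of `word`,
    and the number of distinct users who used `word` at least once.
    """
    word_counts = defaultdict(int)
    users = defaultdict(set)
    for user_id, region, tokens in tweets:
        n = tokens.count(word)
        if n:
            word_counts[region] += n
            users[region].add(user_id)
    user_counts = {region: len(ids) for region, ids in users.items()}
    return dict(word_counts), user_counts


# Toy data: one prolific user in RegionA inflates the raw word count there.
tweets = [
    ("u1", "RegionA", ["someword"] * 10),
    ("u2", "RegionB", ["someword"]),
    ("u3", "RegionC", ["someword"]),
]
word_counts, user_counts = word_and_user_counts(tweets, "someword")
word_h = region_entropy(word_counts)  # low: looks regionally concentrated
user_h = region_entropy(user_counts)  # log2(3): one user per region, uniform
print(word_counts, user_counts, round(word_h, 3), round(user_h, 3))
```

In this sketch, counting words alone makes the term look like a RegionA regionalism because of a single heavy user, while counting users shows it is spread evenly across the three regions. This is the kind of distinction the abstract attributes to incorporating user frequency, although the paper's actual metric may combine these quantities differently.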

Keywords

Lexical dialectology, Social media, Spanish variants, Entropy

Citation

Pérez, J.M., Aleman, D.E., Kalinowski, S.N., Gravano, A. (2022) "Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter", Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN), Vol. 69, pp. 51-62, Sep 2022. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427/3835

URI

https://repositorio.utdt.edu/handle/20.500.13098/11442
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427

Collections

Artículos presentados, aceptados y publicados (Articles submitted, accepted and published)
