
dc.rights.license: https://creativecommons.org/licenses/by-sa/2.5/ar/ [es_AR]
dc.contributor.author: Gravano, Agustín [es_AR]
dc.contributor.author: Pérez, Juan Manuel [es_AR]
dc.contributor.author: Aleman, Damian E. [es_AR]
dc.contributor.author: Kalinowski, Santiago N. [es_AR]
dc.coverage.spatial: España [es_AR]
dc.coverage.spatial: Argentina [es_AR]
dc.date.accessioned: 2022-11-23T16:59:37Z
dc.date.available: 2022-11-23T16:59:37Z
dc.date.issued: 2022
dc.identifier.citation: Pérez, J.M., Aleman, D.E., Kalinowski, S.N., Gravano, A. (2022) "Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter", Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN), Vol. 69, pp. 51-62, Sep 2022. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427/3835
dc.identifier.issn: 1989-7553
dc.identifier.issn: 1135-5948
dc.identifier.uri: https://repositorio.utdt.edu/handle/20.500.13098/11442
dc.identifier.uri: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427
dc.description.abstract: The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on the use of questionnaires and surveys, heavily depending on the expertise and intuition of the surveyor. The emergence of social media and microblogging services has produced an unprecedented wealth of content (mainly informal text generated by users), opening new opportunities for linguists to extend their studies of language variation. Previous work on the automatic detection of regionalisms depended mostly on word frequencies. In this work, we present a novel metric based on Information Theory that incorporates user frequency. We tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual annotation of the relevance of the retrieved terms, and also as a feature selection method for geolocation of users. In both cases, our metric outperformed other techniques based on word frequency, suggesting that measuring the number of users that use a word is an informative feature. This tool has helped lexicographers discover several unregistered words of Argentinian Spanish, as well as different meanings assigned to registered words. [es_AR]
dc.format.extent: p. 51-62 [es_AR]
dc.format.medium: application/pdf [es_AR]
dc.language: spa [es_AR]
dc.publisher: Procesamiento del Lenguaje Natural, Revista [es_AR]
dc.relation.ispartof: Procesamiento del Lenguaje Natural, Revista nº 69, septiembre de 2022
dc.rights: info:eu-repo/semantics/openAccess [es_AR]
dc.subject: Lexical dialectology [es_AR]
dc.subject: Social media [es_AR]
dc.subject: Spanish variants [es_AR]
dc.subject: Entropy [es_AR]
dc.title: Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter [es_AR]
dc.type: info:eu-repo/semantics/article [es_AR]
dc.type.version: info:eu-repo/semantics/publishedVersion [es_AR]

