IFISC: Publication details

Crowdsourcing Dialect Characterization through Twitter

Gonçalves, B.; Sánchez, D.
PLoS ONE 9, e112074 (1-6) (2014)

We perform a large-scale analysis of diatopic language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
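
The pipeline summarized in the abstract (geotagged messages, a list of concepts, a cluster analysis) can be pictured with a toy sketch. This record does not describe the authors' actual procedure, so everything here is an assumption made for illustration: the made-up observations, the cell names, the single concept "car" with its variants, and the use of plain k-means as a stand-in clustering method.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): one tuple per geotagged message,
# giving the geographic cell, the concept, and the word variant used for it.
OBSERVATIONS = [
    ("cell_a", "car", "coche"), ("cell_a", "car", "coche"), ("cell_a", "car", "auto"),
    ("cell_b", "car", "coche"), ("cell_b", "car", "coche"),
    ("cell_c", "car", "carro"), ("cell_c", "car", "carro"), ("cell_c", "car", "auto"),
    ("cell_d", "car", "carro"), ("cell_d", "car", "carro"),
]


def frequency_vectors(observations):
    """Map each cell to its relative variant frequencies, per concept."""
    features = sorted({(concept, word) for _, concept, word in observations})
    counts = Counter(observations)
    totals = Counter((cell, concept) for cell, concept, _ in observations)
    cells = sorted({cell for cell, _, _ in observations})
    return {
        cell: [
            counts[(cell, concept, word)] / totals[(cell, concept)]
            if totals[(cell, concept)] else 0.0
            for concept, word in features
        ]
        for cell in cells
    }


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, initial_cells, iterations=20):
    """Plain k-means, seeded with the vectors of the given cells."""
    centroids = [list(vectors[cell]) for cell in initial_cells]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each cell joins its nearest centroid.
        labels = {
            cell: min(range(len(centroids)),
                      key=lambda k: squared_distance(vec, centroids[k]))
            for cell, vec in vectors.items()
        }
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [vectors[c] for c, lab in labels.items() if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


VECTORS = frequency_vectors(OBSERVATIONS)
LABELS = kmeans(VECTORS, initial_cells=["cell_a", "cell_c"])
print(LABELS)
```

In this toy setting the cells that favor "coche" fall into one cluster and those that favor "carro" into the other. A real analysis would use many concepts and cells, and possibly a different clustering method.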

DOI 10.1371/journal.pone.0112074 
arXiv number 1407.7094
Files journal.pone.0112074.pdf (1234993 bytes)

Consejo Superior de Investigaciones Científicas Universitat de les Illes Balears