Sociolinguistic Structure Induction
Jacob Eisenstein, Director of the Georgia Tech Computational Linguistics Lab
May 18, 2016
from 10:00 to 12:00
Where: ENS de Lyon, site Descartes, room F120
Abstract: Language interacts with a variety of social structures, from local social networks to large-scale social categories. Computational modeling of the social dimension of language offers potential benefits for both sociolinguistics and language technology. The rise of text data sources that include rich social metadata creates an opportunity to build a new generation of computational linguistic methods for social scientific analysis. I will describe two such efforts, both concerning language variation across social network ties. The first project relates the spread of neologisms to contextual variation, which can be viewed within the theoretical framework of audience design. The second project shows how formality of address can be modeled in the context of signed social network structures. Methodologically, both projects employ probabilistic factor-graph models, which provide a unified framework for treating social and linguistic data.

Conversely, computational linguistics stands to benefit from closer engagement with sociolinguistics and social network analysis. Contemporary language technology is bedeviled by sociolinguistic variation, which complicates efforts to search, mine, and translate text. I will present our recent work on using social network community detection to make document classification more robust to sociolinguistic variation, building on recent progress in task-specific word embeddings.