    Detecting Locations from Twitter Messages (Invited Talk)

    Abstract: There is a large amount of information that can be extracted automatically from social media messages. Of particular interest are the topics discussed by the users, the opinions and emotions expressed, and the events and locations mentioned. This work focuses on machine learning methods for detecting locations from Twitter messages, because the extracted locations can be useful in business, marketing and defence applications. There are two types of locations that we are interested in: location entities mentioned in the text of each message, and the physical locations of the users. For the first type of locations (task 1), we detected expressions that denote locations and classified them into names of cities, provinces/states, and countries. We approached the task in a novel way, consisting of two stages. In the first stage, we trained Conditional Random Field models with various sets of features, using a dataset that we collected and annotated ourselves for training and testing. In the second stage, we applied a set of heuristics to resolve cases in which more than one place with the same name exists. For the second type of locations (task 2), we put together all the tweets written by a user in order to predict his/her physical location. Only a few users declare their locations in their Twitter profiles, but this is sufficient to automatically produce training and test data for our classifiers. We experimented with two existing datasets collected from users located in the U.S. We propose a deep learning architecture for solving the task, because deep learning has been shown to work well for other natural language processing tasks, and because standard classifiers have already been tested on the user location task. We designed one model that predicts the U.S. region and the U.S. state of the user, and another model that predicts the longitude and latitude of the user's location. We found that stacked denoising autoencoders are well suited for this task, with results comparable to the state of the art.