5 research outputs found

    A Hierarchical Location Prediction Neural Network for Twitter User Geolocation

    Accurate estimation of user location is important for many online services. Previous neural-network-based methods largely ignore the hierarchical structure among locations. In this paper, we propose a hierarchical location prediction neural network for Twitter user geolocation. Our model first predicts the home country for a user, then uses the country result to guide the city-level prediction. In addition, we employ a character-aware word embedding layer to cope with the noisy information in tweets. With the feature fusion layer, our model can accommodate various feature combinations and achieves state-of-the-art results on three commonly used benchmarks under different feature settings. It not only improves the prediction accuracy but also greatly reduces the mean error distance. Comment: Accepted by EMNLP 201
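
The country-then-city idea in this abstract can be illustrated with a minimal sketch. It is not the paper's neural architecture: it assumes the country-level and city-level scores are already available as plain dictionaries, and it "guides" the city decision in the simplest possible way, by restricting candidates to the predicted country. The names `predict_hierarchical`, `country_scores`, `city_scores` and `city_to_country`, and all the numbers, are hypothetical.

```python
import math


def softmax(scores):
    """Turn a dict of raw scores into probabilities (numerically stable)."""
    top = max(scores.values())
    exps = {key: math.exp(value - top) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def predict_hierarchical(country_scores, city_scores, city_to_country):
    """Pick the most probable country, then the best city inside it.

    Assumption: every country has at least one city in city_to_country.
    """
    country_probs = softmax(country_scores)
    country = max(country_probs, key=country_probs.get)
    # Guide the city-level decision: only cities in the predicted country compete.
    candidates = {
        city: score
        for city, score in city_scores.items()
        if city_to_country[city] == country
    }
    city = max(candidates, key=candidates.get)
    return country, city


# Hypothetical scores for one user (illustrative values only).
country_scores = {"US": 2.0, "GB": 0.5}
city_scores = {"London": 3.0, "New York": 2.5, "Chicago": 1.0}
city_to_country = {"London": "GB", "New York": "US", "Chicago": "US"}

result = predict_hierarchical(country_scores, city_scores, city_to_country)
```

In this toy input a flat argmax over cities would pick London, while the hierarchical decision stays inside the predicted country and picks New York.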

    Predicting the age of social network users from user-generated texts with word embeddings

    © 2016 FRUCT. Many web-based applications, such as advertising or recommender systems, depend critically on demographic information, which may be unavailable for new or anonymous users. We study the problem of predicting demographic information from user-generated texts, using a Russian-language dataset from a large social network. We evaluate the efficiency of age prediction algorithms based on word2vec word embeddings and conduct a comprehensive experimental evaluation, comparing these algorithms with each other and with classical baseline approaches.

    Relevance ranking based on geographic and social information [Ranking de relevância baseado em informações geográficas e sociais]

    Geographic Information Retrieval (GIR) is a research field that develops and enables the construction of search engines for content distributed over the Internet that involves some geographic context. Geographic search engines, the artifacts produced in the GIR field, can be specified to work in many different contexts (e.g., sports, public service examinations), seeking appropriate handling for the type of document involved. Currently, the scientific community and industry are concentrating their efforts on building geographic search engines focused on finding news distributed over the Internet. However, search engines (geographic or otherwise) focused on news should consider the credibility of the information the news contains when ranking it. Unfortunately, in most cases this does not happen. Measuring news credibility is a complex and expensive task, since it requires knowledge of the reported facts. As a result, search engines end up leaving the user with the responsibility of deciding whether to trust what is being read. In this context, this dissertation proposes a relevance ranking method focused on news and based on information collected from social networks, which assigns each news item a credibility score and ranks the items accordingly. The credibility value of a news item is calculated from the affinity between the users who shared it on their social network and the locations mentioned in the item. Finally, the proposed relevance ranking is integrated into a news search and reading tool called GeoSEn News, which supports queries through various spatial operations and allows results to be viewed from different perspectives. This tool was used to evaluate the proposed method through experiments using data collected from the social network Twitter and from news media throughout Brazil. The evaluation showed promising results and confirmed the feasibility of building a relevance ranking based on information collected from social networks. Cape

    Inferring the location of authors from words in their texts

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are. We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language. Qc 20150618 SINUS (Spridning av innovationer i nutida svenska
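
The pipeline this abstract describes (per-word Gaussians, a placeness filter, a centroid per text) can be sketched roughly as follows. This is a simplified illustration under several assumptions that are not from the paper: each word gets a single Gaussian rather than several, latitude and longitude are treated as planar coordinates, and "placeness" is approximated by a low summed variance with an arbitrary threshold `max_spread`. The function names and the toy posts are hypothetical.

```python
from statistics import fmean, pvariance


def fit_word_gaussians(posts):
    """Fit one Gaussian per word from geotagged posts.

    posts: iterable of (text, (lat, lon)).
    Returns word -> (mean_lat, mean_lon, spread), where spread is the sum
    of the latitude and longitude variances (assumed stand-in for placeness).
    """
    coords = {}
    for text, (lat, lon) in posts:
        for word in set(text.lower().split()):
            coords.setdefault(word, []).append((lat, lon))
    models = {}
    for word, points in coords.items():
        if len(points) < 2:
            continue  # too few observations to estimate a variance
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        models[word] = (fmean(lats), fmean(lons), pvariance(lats) + pvariance(lons))
    return models


def locate_text(text, models, max_spread=1.0):
    """Centroid of the mean locations of low-spread words, or None."""
    hits = [
        models[word][:2]
        for word in text.lower().split()
        if word in models and models[word][2] <= max_spread
    ]
    if not hits:
        return None
    return (fmean([h[0] for h in hits]), fmean([h[1] for h in hits]))


# Toy geotagged posts (invented for illustration).
posts = [
    ("fika in stockholm", (59.3, 18.1)),
    ("stockholm morning coffee", (59.4, 18.0)),
    ("coffee in göteborg", (57.7, 12.0)),
    ("göteborg rain", (57.7, 11.9)),
]
models = fit_word_gaussians(posts)
estimate = locate_text("coffee in stockholm", models)
```

Here "coffee" and "in" occur in both cities, so their spread is large and the filter drops them; only "stockholm" contributes, and the estimate lands at roughly (59.35, 18.05).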