
    Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses

    Systems that exploit publicly available user generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009 influenza season from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mine Twitter data can increase the value of this inexpensive resource. Comment: 10 pages, 5 figures, IEEE HISB 2012 conference, Sept 27-28, 2012, La Jolla, California, U
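    The two-stage idea in this abstract (keyword filtering, then a semantic filter such as negation, followed by correlation against surveillance counts) can be illustrated with a minimal sketch. The keyword list, negation cues, three-token window, and function names below are assumptions made for illustration; the paper takes its terms from the BioCaster Ontology and its exact rules are not given here.

```python
import math
import re

# Hypothetical single-word syndrome keywords (the paper uses the BioCaster Ontology).
SYNDROME_KEYWORDS = {"flu", "fever", "cough", "chills"}
# Hypothetical negation cues checked in a short window before each keyword.
NEGATION_CUES = {"no", "not", "never", "without"}


def is_ili_related(message: str, window: int = 3) -> bool:
    """Return True if a syndrome keyword occurs without a negation cue
    among the `window` tokens immediately preceding it."""
    tokens = re.findall(r"[a-z']+", message.lower())
    for i, token in enumerate(tokens):
        if token in SYNDROME_KEYWORDS:
            preceding = tokens[max(0, i - window):i]
            if not any(word in NEGATION_CUES for word in preceding):
                return True
    return False


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

    In use, one would count the messages passing `is_ili_related` per week and pass that weekly series, together with the official ILI series for the same weeks, to `pearson`.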

    Analyzing Destination Choices of Tourists and Residents from Location Based Social Media Data

    The ubiquitous use of social media platforms on smartphones has created an opportunity to gather digital traces of individual activities at a large scale. Traditional travel surveys fall short in collecting longitudinal travel behavior data for a large number of people in a cost-effective way, especially for transient populations such as tourists. This study presents an innovative methodological framework, using machine learning and econometric approaches, to gather and analyze location-based social media (LBSM) data to understand individual destination choices. First, using Twitter's search interface, we have collected Twitter posts of nearly 156,000 users for the state of Florida. We have adopted several filtering techniques to create a reliable sample from noisy Twitter data. An ensemble classification technique is proposed to classify tourists and residents from user coordinates. The performance of the proposed classifier has been validated using manually labeled data and compared against the state-of-the-art classification methods. Second, using different clustering methods, we have analyzed the spatial distributions of destination choices of tourists and residents. The clusters from tourist destinations revealed the most popular tourist spots, including emerging tourist attractions in Florida. Third, to predict a tourist's next destination type, we have estimated a Conditional Random Field (CRF) model with reasonable accuracy. Fourth, to analyze resident destination choice behavior, this study proposes an extensive data merging operation among the collected Twitter data and different geographic databases from state-level data libraries. We have estimated a Panel Latent Segmentation Multinomial Logit (PLSMNL) model to find the characteristics affecting individual destination choices. The proposed PLSMNL model is found to better explain the effects of variables on destination choices compared to trip-specific Multinomial Logit Models.
    The findings of this study show the potential of LBSM data in future transportation and planning studies where collecting individual activity data is expensive.

    Knowledge extraction and popularity modeling using social media


    Computational Analysis of Urban Places Using Mobile Crowdsensing

    In cities, urban places provide a socio-cultural habitat for people to counterbalance the daily grind of urban life, an environment away from home and work. Places provide an environment for people to communicate, share perspectives, and in the process form new social connections. Due to the active role of places in the social fabric of city life, it is important to understand how people perceive and experience places. One fundamental construct that relates place and experience is ambiance, i.e., the impressions we ubiquitously form when we go out. Young people are key actors of urban life, especially at night, and as such play an equal role in co-creating and appropriating the urban space. Understanding how places and their youth inhabitants interact at night is a relevant urban issue. Until recently, our ability to assess the visual and perceptual qualities of urban spaces and to study the dynamics surrounding youth experiences in those spaces has been limited, partly due to the lack of quantitative data. However, the growth of computational methods and tools, including sensor-rich mobile devices, social multimedia platforms, and crowdsourcing tools, has opened ways to measure urban perception at scale, and to deepen our understanding of nightlife as experienced by young people. In this thesis, as a first contribution, we present the design, implementation and computational analysis of four mobile crowdsensing studies involving youth populations from various countries to understand and infer phenomena related to urban places and people. We gathered a variety of explicit and implicit crowdsourced data including mobile sensor data and logs, survey responses, and multimedia content (images and videos) from hundreds of crowdworkers and thousands of users of mobile social networks. Second, we showed how crowdsensed images can be used for the computational characterization and analysis of urban perception in indoor and outdoor places.
    For both place types, urban perception impressions were elicited for several physical and psychological constructs using online crowdsourcing. Using low-level and deep learning features extracted from images, we automatically inferred crowdsourced judgments of indoor ambiance with a maximum R² of 0.53 and outdoor perception with a maximum R² of 0.49. Third, we demonstrated the feasibility of collecting rich contextual data to study the physical mobility, activities, ambiance context, and social patterns of youth nightlife behavior. Fourth, using supervised machine learning techniques, we automatically classified drinking behavior of young people in an urban, real nightlife setting. Using features extracted from mobile sensor data and application logs, we obtained an overall accuracy of 76.7%. While this thesis contributes towards understanding urban perception and youth nightlife patterns in specific contexts, our research also contributes towards the computational understanding of urban places at scale with high spatial and temporal resolution, using a combination of mobile crowdsensing, social media, machine learning, multimedia analysis, and online crowdsourcing.

    User-Adaptive Travel Plan Generation Algorithm for Tourist Spot and Route Recommendation

    Waseda University diploma number: Shin 8428, Waseda Univ

    Towards evidence-based, GIS-driven national spatial health information infrastructure and surveillance services in the United Kingdom

    The term "Geographic Information Systems" (GIS) was added to MeSH in 2003, a step reflecting the importance and growing use of GIS in health and healthcare research and practices. GIS have much more to offer than the obvious digital cartography (map) functions. From a community health perspective, GIS could potentially act as powerful evidence-based practice tools for early problem detection and solving. When properly used, GIS can: inform and educate (professionals and the public); empower decision-making at all levels; help in planning and tweaking clinically and cost-effective actions, in predicting outcomes before making any financial commitments and ascribing priorities in a climate of finite resources; change practices; and continually monitor and analyse changes, as well as sentinel events. Yet despite all this potential, GIS remain under-utilised in the UK National Health Service (NHS). This paper has the following objectives: (1) to illustrate with practical, real-world scenarios and examples from the literature the different GIS methods and uses to improve community health and healthcare practices, e.g., for improving hospital bed availability, in community health and bioterrorism surveillance services, and in the latest SARS outbreak; (2) to discuss challenges and problems currently hindering the wide-scale adoption of GIS across the NHS; and (3) to identify the most important requirements and ingredients for addressing these challenges, and realising GIS potential within the NHS, guided by related initiatives worldwide. The ultimate goal is to illuminate the road towards implementing a comprehensive national, multi-agency spatio-temporal health information infrastructure functioning proactively in real time. The concepts and principles presented in this paper can also be applied in other countries, and on regional (e.g., European Union) and global levels.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
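    For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of a Granger-causality test built on linear autoregressive models, assuming NumPy is available. It is the traditional approach, not the time series classification methods the article evaluates; the function names and the default of two lags are illustrative assumptions.

```python
import numpy as np


def lagged(series, lags):
    """Matrix whose columns are series[t-1], ..., series[t-lags] for t = lags..n-1."""
    n = len(series)
    return np.column_stack([series[lags - k:n - k] for k in range(1, lags + 1)])


def residual_sum_of_squares(design, target):
    """Ordinary least squares fit, returning the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(x, y, lags=2):
    """F statistic for the hypothesis that x Granger-causes y.

    Restricted model: y[t] explained by its own past values.
    Full model: y[t] explained by its own past plus the past of x.
    A large F means the past of x improves the prediction of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(y, lags)])
    full = np.hstack([restricted, lagged(x, lags)])
    rss_restricted = residual_sum_of_squares(restricted, target)
    rss_full = residual_sum_of_squares(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)
```

    The statistic would normally be compared against an F distribution with `lags` and `dof` degrees of freedom to obtain a p-value.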

    Knowledge-based and data-driven approaches for geographical information access

    Geographical Information Access (GeoIA) can be defined as a way of retrieving information from textual collections that includes the automatic analysis and interpretation of the geographical constraints and terms present in queries and documents. This PhD thesis presents, describes and evaluates several heterogeneous approaches for the following three GeoIA tasks: Geographical Information Retrieval (GIR), Geographical Question Answering (GeoQA), and Textual Georeferencing (TG). The GIR task deals with user queries that search over documents (e.g. "vineyards in California?") and the GeoQA task treats questions that retrieve answers (e.g. "What is the capital of France?"). On the other hand, TG is the task of associating one or more georeferences (such as polygons or coordinates in a geodetic reference system) with electronic documents. Current state-of-the-art AI algorithms do not yet fully understand the semantic meaning and the geographical constraints and terms present in queries and document collections. This thesis attempts to improve the effectiveness results of GeoIA tasks by: 1) improving the detection, understanding, and use of a part of the geographical and the thematic content of queries and documents with Toponym Recognition, Toponym Disambiguation and Natural Language Processing (NLP) techniques, and 2) combining Geographical Knowledge-Based Heuristics based on common sense with Data-Driven IR algorithms. The main contributions of this thesis to the state-of-the-art of GeoIA tasks are: 1) The presentation of 10 novel approaches for GeoIA tasks: 3 approaches for GIR, 3 for GeoQA, and 4 for Textual Georeferencing (TG). 2) The evaluation of these novel approaches in these contexts: within official evaluation benchmarks, after the evaluation benchmarks using their test collections, and with other specific datasets.
    Most of these algorithms have been evaluated in international evaluations and some of them achieved top-ranked state-of-the-art results, including top-performing results in GIR (GeoCLEF 2007) and TG (MediaEval 2014) benchmarks. 3) The experiments reported in this PhD thesis show that the approaches can effectively combine Geographical Knowledge and NLP with Data-Driven techniques to improve the effectiveness measures of the three Geographical Information Access tasks investigated. 4) TALPGeoIR: a novel GIR approach that combines Geographical Knowledge ReRanking (GeoKR), NLP and Relevance Feedback (RF) that achieved state-of-the-art results in official GeoCLEF benchmarks (Ferrés and Rodríguez, 2008; Mandl et al., 2008) and subsequent experiments (Ferrés and Rodríguez, 2015a). This approach has been evaluated with the full GeoCLEF corpus (100 topics) and showed that GeoKR, NLP, and RF techniques evaluated separately or in combination improve the results in MAP and R-Precision effectiveness measures of the state-of-the-art IR algorithms TF-IDF, BM25 and InL2 and show statistical significance in most of the experiments. 5) GeoTALP-QA: a scope-based GeoQA approach for Spanish and English and its evaluation with a set of questions on Spanish geography (Ferrés and Rodríguez, 2006). 6) Four state-of-the-art Textual Georeferencing approaches for informal and formal documents that achieved state-of-the-art results in evaluation benchmarks (Ferrés and Rodríguez, 2014) and subsequent experiments (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).
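    The MAP and R-Precision effectiveness measures named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the function names and the toy document identifiers in the usage note are assumptions, not the thesis's evaluation code.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one pair per topic."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

    For a hypothetical topic with ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, the relevant documents appear at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2 and R-Precision is 1/2.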