A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points-of-interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have been devoted to
the new challenges and opportunities posed by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
the Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure.
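The evaluation metrics such surveys review commonly include median error distance and accuracy within 161 km (Acc@161, roughly 100 miles, a convention from prior Twitter-geolocation work rather than something stated in this abstract). A minimal sketch of both, using the haversine great-circle distance:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def location_metrics(pred, true):
    # pred, true: equal-length lists of (lat, lon) pairs, one per user or tweet.
    errs = sorted(haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(pred, true))
    n = len(errs)
    median = errs[n // 2] if n % 2 else 0.5 * (errs[n // 2 - 1] + errs[n // 2])
    acc161 = sum(e <= 161.0 for e in errs) / n  # fraction within ~100 miles
    return median, acc161
```

The function names and the 161 km threshold are illustrative assumptions; individual papers vary in the exact thresholds they report.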
Confounds and Consequences in Geotagged Twitter Data
Twitter is often used in quantitative studies that identify
geographically-preferred topics, writing styles, and entities. These studies
rely on either GPS coordinates attached to individual messages, or on the
user-supplied location field in each profile. In this paper, we compare these
data acquisition techniques and quantify the biases that they introduce; we
also measure their effects on linguistic analysis and text-based geolocation.
GPS-tagging and self-reported locations yield measurably different corpora, and
these linguistic differences are partially attributable to differences in
dataset composition by age and gender. Using a latent variable model to induce
age and gender, we show how these demographic variables interact with geography
to affect language use. We also show that the accuracy of text-based
geolocation varies with population demographics, giving the best results for
men above the age of 40. Comment: final version for EMNLP 201
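The two acquisition channels the paper compares can be read directly off a tweet object. A minimal sketch, assuming Twitter API v1.1-style field names (`coordinates` as a GeoJSON Point, `user.location` as the free-text profile field); the helper name is hypothetical:

```python
def acquisition_channel(tweet):
    """Classify a tweet dict by how a location could be acquired from it:
    an attached GPS point, a self-reported profile string, or neither."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order is (lon, lat)
        return ("gps", (lat, lon))
    profile = (tweet.get("user") or {}).get("location") or ""
    if profile.strip():
        return ("self_reported", profile.strip())
    return ("none", None)
```

Partitioning a corpus with a function like this is the precondition for the kind of bias comparison the paper performs, since the two channels select different user populations.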
Towards a People's Social Epidemiology: Envisioning a More Inclusive and Equitable Future for Social Epi Research and Practice in the 21st Century.
Social epidemiology has made critical contributions to understanding population health. However, translation of social epidemiology science into action remains a challenge, raising concerns about the impacts of the field beyond academia. With so much focus on issues related to social position, discrimination, racism, power, and privilege, there has been surprisingly little deliberation about the extent and value of social inclusion and equity within the field itself. Indeed, the challenge of translation/action might be more readily met through re-envisioning the role of the people within the research/practice enterprise, reimagining what "social" could, or even should, mean for the future of the field. A potential path forward rests at the nexus of social epidemiology, community-based participatory research (CBPR), and information and communication technology (ICT). Here, we draw from the social epidemiology, CBPR, and ICT literatures to introduce A People's Social Epi, a multi-tiered framework for guiding social epidemiology in becoming more inclusive, equitable, and actionable for 21st-century practice. In presenting this framework, we suggest the value of taking participatory, collaborative approaches anchored in CBPR and ICT principles and technological affordances, especially within the context of place-based and environmental research. We believe that such approaches present opportunities to create a social epidemiology that is of, with, and by the people, not simply about them. In this spirit, we suggest 10 ICT tools to "socialize" social epidemiology and outline 10 ways to move towards A People's Social Epi in practice.
Earth observations from DSCOVR EPIC instrument
The National Oceanic and Atmospheric Administration (NOAA) Deep Space Climate Observatory (DSCOVR) spacecraft was launched on 11 February 2015 and in June 2015 achieved its orbit at the first Lagrange point (L1), 1.5 million km from Earth toward the sun. There are two National Aeronautics and Space Administration (NASA) Earth-observing instruments on board: the Earth Polychromatic Imaging Camera (EPIC) and the National Institute of Standards and Technology Advanced Radiometer (NISTAR). The purpose of this paper is to describe various capabilities of the DSCOVR EPIC instrument. EPIC views the entire sunlit Earth from sunrise to sunset at the backscattering direction (scattering angles between 168.5° and 175.5°) with 10 narrowband filters: 317, 325, 340, 388, 443, 552, 680, 688, 764, and 779 nm. We discuss a number of preprocessing steps necessary for EPIC calibration, including the geolocation algorithm and the radiometric calibration for each wavelength channel in terms of EPIC counts per second for conversion to reflectance units. The principal EPIC products are total ozone (O3) amount, scene reflectivity, erythemal irradiance, ultraviolet (UV) aerosol properties, sulfur dioxide (SO2) for volcanic eruptions, surface spectral reflectance, vegetation properties, and cloud products including cloud height. Finally, we describe the observation of horizontally oriented ice crystals in clouds and the unexpected use of the O2 B-band absorption for vegetation properties. The NASA GSFC DSCOVR project is funded by the NASA Earth Science Division. We gratefully acknowledge the work by S. Taylor and B. Fisher for help with the SO2 retrievals and Marshall Sutton, Carl Hostetter, and the EPIC NISTAR project for help with EPIC data. We would also like to thank the EPIC Cloud Algorithm team, especially Dr. Gala Wind, for their contribution to the EPIC cloud products. (NASA Earth Science Division) Accepted manuscript.
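The radiometric calibration step described above is, in its simplest form, a per-channel linear map from counts per second to reflectance units. A minimal sketch; the calibration factors below are invented placeholders, not the published EPIC coefficients:

```python
# Hypothetical per-channel calibration factors K (reflectance units per count/s).
# The real EPIC factors are determined per wavelength by the DSCOVR project;
# these values are illustrative only.
K = {317: 1.2e-4, 388: 8.5e-5, 680: 3.1e-5}

def counts_to_reflectance(channel_nm, counts_per_second):
    """Convert EPIC counts/s in one narrowband channel to reflectance units
    via a linear calibration factor (sketch of the step described above)."""
    return K[channel_nm] * counts_per_second
```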
The astrometric Gaia-FUN-SSO observation campaign of 99 942 Apophis
Astrometric observations performed by the Gaia Follow-Up Network for Solar
System Objects (Gaia-FUN-SSO) play a key role in ensuring that moving objects
first detected by ESA's Gaia mission remain recoverable after their discovery.
An observation campaign on the potentially hazardous asteroid (99 942) Apophis
was conducted during the asteroid's latest period of visibility, from
21 December 2012 to 2 May 2013, to test the coordination and evaluate the overall
performance of the Gaia-FUN-SSO. The 2732 high-quality astrometric
observations acquired during the Gaia-FUN-SSO campaign were reduced with the
Platform for Reduction of Astronomical Images Automatically (PRAIA), using the
USNO CCD Astrograph Catalogue 4 (UCAC4) as a reference. The astrometric
reduction process and the precision of the newly obtained measurements are
discussed. We compare the residuals of astrometric observations that we
obtained using this reduction process to data sets that were individually
reduced by observers and accepted by the Minor Planet Center. We obtained 2103
previously unpublished astrometric positions and provide these to the
scientific community. Using these data we show that our reduction of this
astrometric campaign with a reliable stellar catalog substantially improves the
quality of the astrometric results. We present evidence that the new data will
help to reduce the orbit uncertainty of Apophis during its close approach in
2029. We show that uncertainties due to geolocations of observing stations, as
well as the rounding of astrometric data, can introduce an unnecessary degradation
in the quality of the resulting astrometric positions. Finally, we discuss the
impact of our campaign reduction on the recovery process of newly discovered
asteroids. Comment: Accepted for publication in A&A.
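The degradation from rounding that the authors quantify admits a back-of-envelope sketch. Assuming positions are rounded to 0.01 s of time in right ascension and 0.1 arcsec in declination (the traditional reporting precision of the Minor Planet Center's 80-column format; an assumption here, not a claim about this campaign's data), the worst-case on-sky error is:

```python
import math

def rounding_error_arcsec(ra_step_s, dec_step_arcsec, dec_deg):
    """Worst-case positional error (arcsec, on the sky) introduced by rounding
    RA to ra_step_s seconds of time and Dec to dec_step_arcsec arcseconds."""
    # 1 second of time in RA = 15 arcsec, foreshortened by cos(dec) on the sky.
    ra_err = 0.5 * ra_step_s * 15.0 * math.cos(math.radians(dec_deg))
    dec_err = 0.5 * dec_step_arcsec
    return math.hypot(ra_err, dec_err)
```

At the celestial equator this gives roughly 0.09 arcsec, comparable to the precision of good ground-based CCD astrometry, which is why unrounded positions matter for orbit refinement.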
Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources
When does Internet traffic cross international borders? This question has
major geopolitical, legal and social implications and is surprisingly difficult
to answer. A critical stumbling block is a dearth of tools that accurately map
routers traversed by Internet traffic to the countries in which they are
located. This paper presents Passport: a new approach for efficient, accurate
country-level router geolocation and a system that implements it. Passport
provides location predictions with limited active measurements, using machine
learning to combine information from IP geolocation databases, router
hostnames, whois records, and ping measurements. We show that Passport
substantially outperforms existing techniques, and identify cases where paths
traverse countries with implications for security, privacy, and performance.
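One of the inputs the abstract names, ping measurements, constrains location through a speed-of-light bound: a round-trip time limits how far the router can be from the vantage point. A minimal sketch of that single constraint (the constant and helper are illustrative, not Passport's implementation):

```python
# Light in optical fibre travels at roughly 2/3 c, i.e. about 200 km per ms
# one-way, so a round-trip time of rtt_ms bounds the distance at ~100*rtt_ms km.
KM_PER_RTT_MS = 100.0

def feasible_countries(rtt_ms, candidate_distances_km):
    """Keep only candidate countries reachable within the measured RTT.
    candidate_distances_km maps a country to the distance (km) from the
    vantage point to that country's nearest point."""
    max_km = rtt_ms * KM_PER_RTT_MS
    return {c for c, d in candidate_distances_km.items() if d <= max_km}
```

A system in Passport's style would intersect constraints like this across several vantage points, then let a learned model arbitrate among the remaining candidates using the database, hostname, and whois features.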
Investigating Full-Waveform Lidar Data for Detection and Recognition of Vertical Objects
A recent innovation in commercially-available topographic lidar systems is the ability to record return waveforms at high sampling frequencies. These “full-waveform” systems provide up to two orders of magnitude more data than “discrete-return” systems. However, due to the relatively limited capabilities of current processing and analysis software, more data does not always translate into more or better information for object extraction applications. In this paper, we describe a new approach for exploiting full waveform data to improve detection and recognition of vertical objects, such as trees, poles, buildings, towers, and antennas. Each waveform is first deconvolved using an expectation-maximization (EM) algorithm to obtain a train of spikes in time, where each spike corresponds to an individual laser reflection. The output is then georeferenced to create extremely dense, detailed X,Y,Z,I point clouds, where I denotes intensity. A tunable parameter is used to control the number of spikes in the deconvolved waveform, and, hence, the point density of the output point cloud. Preliminary results indicate that the average number of points on vertical objects using this method is several times higher than using discrete-return lidar data. The next steps in this ongoing research will involve voxelizing the lidar point cloud to obtain a high-resolution volume of intensity values and computing a 3D wavelet representation. The final step will entail performing vertical object detection/recognition in the wavelet domain using a multiresolution template matching approach
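The deconvolution step can be sketched with the Richardson-Lucy iteration, the classic EM algorithm for deconvolution under Poisson noise, which is one concrete instance of the EM approach the abstract describes (the authors' exact formulation may differ; the pulse and waveform below are synthetic, not lidar data):

```python
import numpy as np

def richardson_lucy(observed, psf, n_iter=200):
    """EM (Richardson-Lucy) deconvolution: recover a spike train whose
    convolution with the system pulse `psf` best explains `observed`
    under a Poisson noise model."""
    psf = psf / psf.sum()
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_flip = psf[::-1]
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)
        estimate *= np.convolve(ratio, psf_flip, mode="same")
    return estimate

# Synthetic waveform: two reflections 6 samples apart, blurred by a Gaussian pulse.
t = np.arange(-5, 6)
pulse = np.exp(-0.5 * (t / 1.5) ** 2)
truth = np.zeros(40)
truth[14] = 5.0
truth[20] = 3.0
waveform = np.convolve(truth, pulse / pulse.sum(), mode="same")
spikes = richardson_lucy(waveform, pulse)
```

Each recovered spike corresponds to one laser reflection; capping `n_iter` (or thresholding the result) plays the role of the tunable parameter that controls how many spikes, and hence how many output points, survive.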
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
Anonymized electronic medical records are an increasingly popular source of
research data. However, these datasets often lack race and ethnicity
information. This creates problems for researchers modeling human disease, as
race and ethnicity are powerful confounders for many health exposures and
treatment outcomes; race and ethnicity are closely linked to
population-specific genetic variation. We showed that deep neural networks
generate more accurate estimates for missing racial and ethnic information than
competing methods (e.g., logistic regression, random forest). RIDDLE yielded
significantly better classification performance across all metrics that were
considered: accuracy, cross-entropy loss (error), and area under the
receiver operating characteristic (ROC) curve. We made specific
efforts to interpret the trained neural network models to identify, quantify,
and visualize medical features which are predictive of race and ethnicity. We
used these characterizations of informative features to perform a systematic
comparison of differential disease patterns by race and ethnicity. The fact
that clinical histories are informative for imputing race and ethnicity could
reflect (1) a skewed distribution of blue- and white-collar professions across
racial and ethnic groups, (2) uneven accessibility and subjective importance of
prophylactic health, (3) possible variation in lifestyle, such as dietary
habits, and (4) differences in background genetic variation which predispose to
diseases.
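The three metrics the abstract names can each be computed in a few lines. A minimal sketch for the binary case (RIDDLE itself is multi-class and these helpers are not its implementation):

```python
import math

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)

def cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy (the 'error' metric named above)."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(y_true, p_pred)) / len(y_true)

def auc(y_true, p_pred):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive outscores a random negative."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```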