4,011 research outputs found

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure

    Confounds and Consequences in Geotagged Twitter Data

    Twitter is often used in quantitative studies that identify geographically-preferred topics, writing styles, and entities. These studies rely on either GPS coordinates attached to individual messages, or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40. Comment: final version for EMNLP 201

    Earth observations from DSCOVR EPIC instrument

    The National Oceanic and Atmospheric Administration (NOAA) Deep Space Climate Observatory (DSCOVR) spacecraft was launched on 11 February 2015 and in June 2015 achieved its orbit at the first Lagrange point (L1), 1.5 million km from Earth toward the sun. There are two National Aeronautics and Space Administration (NASA) Earth-observing instruments on board: the Earth Polychromatic Imaging Camera (EPIC) and the National Institute of Standards and Technology Advanced Radiometer (NISTAR). The purpose of this paper is to describe various capabilities of the DSCOVR EPIC instrument. EPIC views the entire sunlit Earth from sunrise to sunset at the backscattering direction (scattering angles between 168.5° and 175.5°) with 10 narrowband filters: 317, 325, 340, 388, 443, 552, 680, 688, 764, and 779 nm. We discuss a number of preprocessing steps necessary for EPIC calibration including the geolocation algorithm and the radiometric calibration for each wavelength channel in terms of EPIC counts per second for conversion to reflectance units. The principal EPIC products are total ozone (O3) amount, scene reflectivity, erythemal irradiance, ultraviolet (UV) aerosol properties, sulfur dioxide (SO2) for volcanic eruptions, surface spectral reflectance, vegetation properties, and cloud products including cloud height. Finally, we describe the observation of horizontally oriented ice crystals in clouds and the unexpected use of the O2 B-band absorption for vegetation properties. The NASA GSFC DSCOVR project is funded by NASA Earth Science Division. We gratefully acknowledge the work by S. Taylor and B. Fisher for help with the SO2 retrievals and Marshall Sutton, Carl Hostetter, and the EPIC NISTAR project for help with EPIC data. We also would like to thank the EPIC Cloud Algorithm team, especially Dr. Gala Wind, for the contribution to the EPIC cloud products. (NASA Earth Science Division) Accepted manuscript

    The astrometric Gaia-FUN-SSO observation campaign of 99 942 Apophis

    Astrometric observations performed by the Gaia Follow-Up Network for Solar System Objects (Gaia-FUN-SSO) play a key role in ensuring that moving objects first detected by ESA's Gaia mission remain recoverable after their discovery. An observation campaign on the potentially hazardous asteroid (99 942) Apophis was conducted during the asteroid's latest period of visibility, from 12/21/2012 to 5/2/2013, to test the coordination and evaluate the overall performance of the Gaia-FUN-SSO. The 2732 high quality astrometric observations acquired during the Gaia-FUN-SSO campaign were reduced with the Platform for Reduction of Astronomical Images Automatically (PRAIA), using the USNO CCD Astrograph Catalogue 4 (UCAC4) as a reference. The astrometric reduction process and the precision of the newly obtained measurements are discussed. We compare the residuals of astrometric observations that we obtained using this reduction process to data sets that were individually reduced by observers and accepted by the Minor Planet Center. We obtained 2103 previously unpublished astrometric positions and provide these to the scientific community. Using these data we show that our reduction of this astrometric campaign with a reliable stellar catalog substantially improves the quality of the astrometric results. We present evidence that the new data will help to reduce the orbit uncertainty of Apophis during its close approach in 2029. We show that uncertainties due to geolocations of observing stations, as well as rounding of astrometric data can introduce an unnecessary degradation in the quality of the resulting astrometric positions. Finally, we discuss the impact of our campaign reduction on the recovery process of newly discovered asteroids. Comment: Accepted for publication in A&

    Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources

    When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.

    Investigating Full-Waveform Lidar Data for Detection and Recognition of Vertical Objects

    A recent innovation in commercially-available topographic lidar systems is the ability to record return waveforms at high sampling frequencies. These “full-waveform” systems provide up to two orders of magnitude more data than “discrete-return” systems. However, due to the relatively limited capabilities of current processing and analysis software, more data does not always translate into more or better information for object extraction applications. In this paper, we describe a new approach for exploiting full waveform data to improve detection and recognition of vertical objects, such as trees, poles, buildings, towers, and antennas. Each waveform is first deconvolved using an expectation-maximization (EM) algorithm to obtain a train of spikes in time, where each spike corresponds to an individual laser reflection. The output is then georeferenced to create extremely dense, detailed X,Y,Z,I point clouds, where I denotes intensity. A tunable parameter is used to control the number of spikes in the deconvolved waveform, and, hence, the point density of the output point cloud. Preliminary results indicate that the average number of points on vertical objects using this method is several times higher than using discrete-return lidar data. The next steps in this ongoing research will involve voxelizing the lidar point cloud to obtain a high-resolution volume of intensity values and computing a 3D wavelet representation. The final step will entail performing vertical object detection/recognition in the wavelet domain using a multiresolution template matching approach.

    RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all $p < 10^{-6}$). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.