
    Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

    Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset which consists of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.
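The low-rank plus sparse separation described in this abstract can be illustrated with a minimal sketch of principal component pursuit, which splits a matrix M into a low-rank part L and a sparse part S by minimizing the nuclear norm of L plus a weighted L1 norm of S. This is not the authors' implementation: the function names (`soft_threshold`, `svd_threshold`, `pca_pursuit`), the default parameters, and the synthetic matrix standing in for a word-frequency table are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(X, tau):
    """Shrink every entry of X toward zero by tau (proximal step for the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Shrink the singular values of X by tau (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pca_pursuit(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S with a fixed-penalty augmented Lagrangian loop.

    The defaults for lam and mu are commonly used heuristics, assumed here
    rather than taken from the paper above.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multipliers for the constraint M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Synthetic stand-in for a (region x word) matrix: rank-2 structure plus sparse outliers.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
S0[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))
M = L0 + S0

L_hat, S_hat = pca_pursuit(M)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
print(f"relative error of recovered low-rank part: {rel_err:.2e}")
```

In this toy setting the sparse part plays the role of outliers and highly localized topics, while the low-rank part plays the role of the smooth regional structure; on real data the choice of `lam` would need tuning.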

    The “dirty dozen” of freshwater science: Detecting then reconciling hydrological data biases and errors

    Sound water policy and management rests on sound hydrometeorological and ecological data. Conversely, unrepresentative, poorly collected or erroneously archived data introduces uncertainty regarding the magnitude, rate and direction of environmental change, in addition to undermining confidence in decision-making processes. Unfortunately, data biases and errors can enter the information flow at various stages, starting with site selection, instrumentation, sampling/measurement procedures, post-processing and ending with archiving systems. Techniques such as visual inspection of raw data, graphical representation and comparison between sites, outlier and trend detection, and referral to metadata can all help uncover spurious data. Tell-tale signs of ambiguous and/or anomalous data are highlighted using 12 carefully chosen cases drawn mainly from hydrology (‘the dirty dozen’). These include evidence of changes in site or local conditions (due to land management, river regulation or urbanisation); modifications to instrumentation or inconsistent observer behaviour; mismatched or misrepresentative sampling in space and time; treatment of missing values, post-processing and data storage errors. As well as raising awareness of pitfalls, recommendations are provided for uncovering lapses in data quality after the information has been gathered. It is noted that error detection and attribution are more problematic for very large data sets, where observation networks are automated, or when various information sources have been combined. In these cases, more holistic indicators of data integrity are needed that reflect the overall information life-cycle and application(s) of the hydrological data.

    Validation of ash/dust detections from SEVIRI data using ACTRIS/EARLINET ground-based LIDAR measurements

    Two tailored configurations of the Robust Satellite Technique (RST) multi-temporal approach, for airborne volcanic ash and desert dust detection, have been tested in the framework of the European Natural Airborne Disaster Information and Coordination System for Aviation (EUNADICS-AV) project. The two algorithms, running on Spinning Enhanced Visible Infra-Red Imager (SEVIRI) data, were previously assessed over wide areas by comparison with independent satellite-based aerosol products. In this study, we present results of a first validation analysis of the above-mentioned satellite-based ash/dust products using independent, ground-based observations coming from the European Aerosol Research Lidar Network (EARLINET). The aim is to assess the capabilities of RST-based ash/dust products in providing useful information even at local scale and to verify their applicability as a "trigger" to timely activate EARLINET measurements during airborne hazards. The intense Saharan dust event of May 18-23, 2008, which affected both the Mediterranean Basin and Continental Europe, and the strong explosive eruptions of the Eyjafjallajökull (Iceland) volcano of April-May 2010 were analyzed as test cases. Our results show that both RST-based algorithms were capable of providing reliable information about the investigated phenomena at specific sites of interest, successfully detecting airborne ash/dust in different geographic regions using both nighttime and daytime SEVIRI data. However, the validation analysis also demonstrates that ash/dust layers remain undetected by satellite in the presence of overlying meteorological clouds and when they are tenuous (i.e., with an integrated backscatter coefficient less than ~0.001 sr⁻¹ and with aerosol backscatter coefficient less than ~1 × 10⁻⁶ m⁻¹ sr⁻¹). This preliminary analysis confirms that the continuity of satellite-based observations can be used to timely "trigger" ground-based LIDAR measurements in case of airborne hazard events.
    Finally, this work confirms that advanced satellite-based detection schemes may provide a relevant contribution to the monitoring of ash/dust phenomena and that the synergistic use of (satellite-based) large scale, continuous and timely records with (ground-based) accurate and quantitative measurements may represent an added value, especially in operational scenarios.