
    OpenTED Browser: Insights into European Public Spendings

    We present the OpenTED browser, a Web application that allows users to interactively browse public spending data related to public procurement in the European Union. The application relies on Open Data recently published by the European Commission and the Publications Office of the European Union, from which we imported a curated dataset of 4.2 million contract award notices spanning the period 2006-2015. The application is designed to make it easy to filter notices and to visualise relationships between public contracting authorities and private contractors. The simple design makes it possible, for example, to quickly find out who the biggest suppliers of local governments are and what kinds of goods and services were contracted. We believe the tool, which we release as Open Source, is a valuable source of information for journalists, NGOs, analysts and citizens seeking insight into public procurement, from large-scale trends to local municipal developments.
    Comment: ECML, PKDD, SoGood workshop 201
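    The kind of query described above boils down to filtering and aggregating contract award notices. The following minimal sketch assumes a hypothetical CSV export with columns contracting_authority, contractor and value_eur (the real TED data uses different field names) and shows how such a supplier ranking could be computed; it is not the OpenTED implementation.

```python
# Minimal sketch (not the OpenTED code): aggregate contract award notices
# to find the biggest suppliers of a given contracting authority.
# Assumes a hypothetical CSV with columns: contracting_authority,
# contractor, value_eur -- the real TED export uses different field names.
import pandas as pd

notices = pd.read_csv("ted_award_notices.csv")  # hypothetical file name

top_suppliers = (
    notices[notices["contracting_authority"] == "City of Brussels"]  # hypothetical authority
    .groupby("contractor")["value_eur"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
)
print(top_suppliers)
```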

    What do they know about me? Contents and Concerns of Online Behavioral Profiles

    Data aggregators collect large amounts of information about individual users and create detailed online behavioral profiles of individuals. Behavioral profiles benefit users by improving products and services. However, they have also raised concerns regarding user privacy, the transparency of collection practices and the accuracy of data in the profiles. To improve transparency, some companies allow users to access their behavioral profiles. In this work, we investigated users' behavioral profiles by utilizing these access mechanisms. Using in-person interviews (n=8), we analyzed the data shown in the profiles, elicited user concerns, and estimated the accuracy of the profiles. We confirmed our interview findings via an online survey (n=100). To assess the claim of improving transparency, we compared the data shown in profiles with the data that companies have about users. More than 70% of the participants expressed concerns about the collection of sensitive data such as credit and health information, the level of detail, and how their data may be used. We found a large gap between the data shown in profiles and the data possessed by companies. A large number of profiles were inaccurate, with as much as 80% inaccuracy. We discuss implications for public policy management.
    Comment: in Ashwini Rao, Florian Schaub, and Norman Sadeh, What do they know about me? Contents and Concerns of Online Behavioral Profiles (2014), ASE BigData/SocialInformatics/PASSAT/BioMedCom Conference

    Automatic Geotagging of Russian Web Sites

    The poster describes a fast, simple, yet accurate method for associating large numbers of web resources stored in a search engine database with geographic locations. The method uses location-by-IP data, domain names, and content-related features: ZIP and area codes. The novelty of the approach lies in building the location-by-IP database using a continuous IP blocks method. Another contribution is the domain name analysis. The method uses the search engine infrastructure and makes it possible to effectively associate large amounts of search engine data with geography on a regular basis. Experiments were run on the Yandex search engine index; the evaluation confirmed the efficacy of the approach.
    ACM Special Interest Group on Hypertext, Hypermedia, and the Web
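    The poster does not spell out the lookup itself, but a location-by-IP database built from continuous IP blocks is typically queried by binary search over block boundaries. The sketch below is a minimal illustration of that idea with hypothetical blocks and locations; it is not the authors' implementation.

```python
# Minimal sketch of location-by-IP lookup over continuous IP blocks
# (illustrative only; the database construction in the poster is more involved).
import bisect
import ipaddress

# Each block is (start_ip_as_int, end_ip_as_int, location); hypothetical data.
blocks = [
    (int(ipaddress.ip_address("5.45.192.0")), int(ipaddress.ip_address("5.45.255.255")), "Moscow"),
    (int(ipaddress.ip_address("77.88.0.0")),  int(ipaddress.ip_address("77.88.63.255")), "Saint Petersburg"),
]
blocks.sort()
starts = [b[0] for b in blocks]

def locate(ip: str):
    """Return the location of the continuous block containing ip, or None."""
    addr = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and blocks[i][0] <= addr <= blocks[i][1]:
        return blocks[i][2]
    return None

print(locate("77.88.10.1"))  # -> "Saint Petersburg"
```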

    National Working Conditions Surveys in Europe: A Compilation

    [Excerpt] Eurofound’s European Working Conditions Survey (EWCS) has been measuring working conditions across the European Union for the past 20 years. It is a unique instrument for better understanding the quality of work and employment and the factors influencing it. Eurofound is committed to further improving the quality of the EWCS and strengthening its relevance for Eurofound’s tripartite stakeholders. Some of the most important sources of information for the development of the EWCS questionnaire are the national surveys on working conditions. This compilation is a follow-up to a study of working conditions surveys commissioned by Eurofound in 2006, which covered both national and transnational working conditions surveys (Eurofound, 2007). The main goals of this inventory are to: update the background information on existing national working conditions surveys; create a source of basic information from national working conditions surveys related to methodologies, quality control procedures, fieldwork and findings; and provide a practical resource for researchers, policymakers, social partners and others with a professional interest in working conditions.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, since the class boundary must be defined with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by proposing a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude the paper by discussing some open research problems in the field of OCC and presenting our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures
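    As a concrete illustration of the setting the survey addresses (this is not an algorithm from the paper), the following minimal sketch trains a one-class classifier on positive examples only and uses it to flag points outside the learned boundary, using scikit-learn's OneClassSVM.

```python
# Minimal illustration of the OCC setting: a boundary is learned from
# positive examples only, then used to flag outliers/novelties.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # positive class only
X_test = np.array([[0.1, -0.2],    # looks like the positive class
                   [6.0, 6.0]])    # far outside the learned boundary

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = inlier (positive class), -1 = outlier
```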

    Initiating informatics and GIS support for a field investigation of Bioterrorism: The New Jersey anthrax experience

    BACKGROUND: The investigation of potential exposure to anthrax spores in a Trenton, New Jersey, mail-processing facility required rapid assessment of informatics needs and adaptation of existing informatics tools to new physical and information-processing environments. Because the affected building and its computers were closed down, the data needed to list potentially exposed persons and to map building floor plans were unavailable from the primary source. RESULTS: Controlling the effects of anthrax contamination required identification and follow-up of potentially exposed persons. Risk of exposure had to be estimated from the geographic relationship between work history and environmental sample sites within the contaminated facility. To assist in establishing geographic relationships, floor plan maps of the postal facility were constructed in ArcView Geographic Information System (GIS) software and linked to a database of personnel and visitors using Epi Info and Epi Map 2000. A repository for maintaining the latest versions of various documents was set up using Web page hyperlinks. CONCLUSIONS: During public health emergencies, such as bioterrorist attacks and disease epidemics, computerized information systems for data management, analysis, and communication may be needed within hours of beginning the investigation. Available sources of data and output requirements of the system may change frequently during the course of the investigation. Integrating data from a variety of sources may require entering or importing data in a variety of digital and paper formats. Spatial representation of data is particularly valuable for assessing environmental exposure. Written documents, guidelines, and memos important to the epidemic were frequently revised. In this investigation, a database was operational on the second day and the GIS component during the second week of the investigation.
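    The core of the GIS step is relating workers' locations in the facility to environmental sample sites. The sketch below, with entirely hypothetical floor-plan coordinates and an assumed exposure radius, illustrates that kind of proximity check in plain Python; the actual investigation used ArcView GIS, Epi Info and Epi Map 2000 rather than custom code.

```python
# Illustrative sketch (not the original Epi Info/ArcView workflow): flag
# workers whose work station on the floor plan lies within a given distance
# of an environmental sample site that tested positive.
# All coordinates and names are hypothetical floor-plan units.
from math import hypot

positive_sites = [(12.0, 40.5), (87.2, 15.0)]            # sample sites testing positive
workers = {"Worker A": (14.1, 39.0), "Worker B": (60.0, 60.0)}

RADIUS = 5.0  # assumed exposure radius in floor-plan units

def possibly_exposed(work_xy, sites, radius=RADIUS):
    """True if the work location is within the radius of any positive site."""
    return any(hypot(work_xy[0] - sx, work_xy[1] - sy) <= radius for sx, sy in sites)

for name, xy in workers.items():
    print(name, "follow up" if possibly_exposed(xy, positive_sites) else "lower priority")
```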

    Historical collaborative geocoding

    The latest developments in digital technologies have provided large data sets that can increasingly easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming such indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with the temporal aspect and are based on a strict hierarchy (..., city, street, house number) that is hard or impossible to use with historical data. Indeed, historical data are full of uncertainties (temporal aspect, semantic aspect, spatial precision, confidence in the historical source, ...) that cannot be resolved, as there is no way to go back in time to check. We propose an open source, open data, extensible solution for geocoding that is based on building gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address. The matching criteria are customisable and include several dimensions (fuzzy semantic, fuzzy temporal, scale, spatial precision, ...). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode (single address or batch mode) and display the results over current or historical topographical maps, so that they can be checked and collaboratively edited. The system is tested on the city of Paris for the 19th-20th centuries; it shows a high return rate and is fast enough to be used interactively.
    Comment: WORKING PAPER
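    To make the matching step concrete: a candidate geohistorical object can be scored against a query address by combining per-dimension scores. The following minimal sketch combines only a fuzzy semantic score and a binary temporal score, with hypothetical weights and gazetteer entries; the paper's criteria are customisable and cover more dimensions (scale, spatial precision, ...).

```python
# Minimal sketch of gazetteer matching for historical geocoding: each
# geohistorical object is scored on a fuzzy semantic dimension and a
# temporal dimension, and the best-scoring object wins.
# Names, weights and data are illustrative, not the paper's implementation.
from difflib import SequenceMatcher

gazetteer = [
    {"name": "rue de la Paix", "valid": (1806, 1950), "xy": (2.3312, 48.8691)},
    {"name": "rue Napoleon",   "valid": (1806, 1814), "xy": (2.3320, 48.8685)},
]

def temporal_score(query_year, valid):
    """1.0 if the object was valid in the query year, else 0.0 (simplified)."""
    start, end = valid
    return 1.0 if start <= query_year <= end else 0.0

def score(address, year, obj, w_sem=0.7, w_temp=0.3):
    sem = SequenceMatcher(None, address.lower(), obj["name"].lower()).ratio()
    return w_sem * sem + w_temp * temporal_score(year, obj["valid"])

address, year = "Rue de la paix", 1870
best = max(gazetteer, key=lambda o: score(address, year, o))
print(best["name"], best["xy"])
```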

    Email Analysis and Information Extraction for Enterprise Benefit

    In spite of rapid advances in multimedia and interactive technologies, enterprise users prefer to battle with email spam and overload rather than lose the benefits of communicating, collaborating and solving business tasks over email. Many aspects of email have significantly improved over time, but its overall integration with the enterprise environment has remained practically the same. In this paper we describe and evaluate a light-weight approach to enterprise email communication analysis and information extraction. We present several use cases that exploit the extracted information, such as enriching emails with relevant contextual information, extracting a social network and searching over it, and creating semantic objects, and we discuss the relationship between email analysis and information extraction on the one hand, and email protocols and email servers on the other. The proposed approach was partially tested in several small and medium enterprises (SMEs) and seems promising for enterprise interoperability and collaboration in SMEs that depend on email to accomplish their daily business tasks.
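    One of the listed use cases, social network extraction, can be sketched with nothing more than standard email header parsing. The example below, using Python's standard library on two hypothetical raw messages, builds a weighted who-emails-whom graph; the paper's system goes further (content analysis, semantic objects, integration with email servers).

```python
# Minimal sketch of the social-network-extraction use case: parse raw
# messages and count who emails whom from the From/To headers.
# Illustrative only; messages and addresses are hypothetical.
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

raw_messages = [
    "From: alice@example.com\nTo: bob@example.com\nSubject: offer\n\nHi Bob, ...",
    "From: bob@example.com\nTo: alice@example.com, carol@example.com\nSubject: Re: offer\n\n...",
]

edges = Counter()
for raw in raw_messages:
    msg = message_from_string(raw)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    for s in senders:
        for r in recipients:
            edges[(s, r)] += 1

print(edges.most_common())  # weighted edges of the communication graph
```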