    Distributed Holistic Clustering on Linked Data

    Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. Here we propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our clustering approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support a distributed execution of the clustering approach to achieve faster execution times and scalability for large real-world data sets. We provide a novel gold standard for multi-source clustering, and evaluate our methods with respect to effectiveness and efficiency for large data sets from the geographic and music domains.

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.

    Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization

    Geographically annotated social media is extremely valuable for modern information retrieval. However, researchers who can access only publicly visible data quickly find that social media users rarely publish location information. In this work, we provide a method which can geolocate the overwhelming majority of active Twitter users, independent of their location sharing preferences, using only publicly visible Twitter data. Our method infers an unknown user's location by examining their friends' locations. We frame the geotagging problem as an optimization over a social network with a total variation-based objective and provide a scalable and distributed algorithm for its solution. Furthermore, we show how a robust estimate of the geographic dispersion of each user's ego network can be used as a per-user accuracy measure which is effective at removing outlying errors. Leave-many-out evaluation shows that our method is able to infer location for 101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag over 80% of public tweets. Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan, David Jurgens, and David Allen. "Geotagging one hundred million twitter accounts with total variation minimization." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
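    The idea of inferring a user's location from their friends' locations can be illustrated with a small sketch. This is not the paper's total variation solver: it is a simplified stand-in that assigns each unlocated user the medoid of their located friends (the friend location with the smallest summed great-circle distance to the others) and repeats for a few rounds. The function names, the user IDs, and the coordinates are all hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def medoid(points):
    """Return the point with the smallest summed distance to all the others."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))

def propagate(known, friends, iterations=3):
    """Spread locations from users with known positions to the rest.

    known:   dict user -> (lat, lon) for users with a reported location
    friends: dict user -> list of friend user IDs
    """
    located = dict(known)
    for _ in range(iterations):
        updates = {}
        for user, nbrs in friends.items():
            if user in known:
                continue  # never overwrite ground truth
            pts = [located[n] for n in nbrs if n in located]
            if pts:
                updates[user] = medoid(pts)
        located.update(updates)  # apply all updates after the sweep
    return located

# Hypothetical data: two friends near Los Angeles, one outlier in New York.
known = {"a": (34.05, -118.24), "b": (34.10, -118.30), "c": (40.71, -74.00)}
friends = {"u": ["a", "b", "c"], "v": ["u"]}
located = propagate(known, friends)
```

    Using a medoid rather than a mean keeps the single distant friend from dragging the estimate toward the middle of the country, which is the kind of robustness to outliers that the abstract emphasizes.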

    MSUO Information Technology and Geographical Information Systems: Common Protocols & Procedures. Report to the Marine Safety Umbrella Operation

    The Marine Safety Umbrella Operation (MSUO) facilitates the cooperation between Interreg funded Marine Safety Projects and maritime stakeholders. The main aim of MSUO is to permit efficient operation of new projects through Project Cooperation Initiatives. These include the review of the common protocols and procedures for Information Technology (IT) and Geographical Information Systems (GIS). This study, carried out by CSA Group and the National Centre for Geocomputation (NCG), reviews current spatial information standards in Europe and the data management methodologies associated with different marine safety projects. International best practice was reviewed based on the combined experience of spatial data research at NCG and initiatives in the US, Canada and the UK relating to marine security service information and acquisition and integration of large marine datasets for ocean management purposes. This report identifies the most appropriate international data management practices that could be adopted for future MSUO projects.

    BCAS: A Web-enabled and GIS-based Decision Support System for the Diagnosis and Treatment of Breast Cancer

    For decades, geographical variations in cancer rates have been observed, but the precise determinants of such geographic differences in breast cancer development are unclear. Various statistical models have been proposed. Applications of these models, however, require that the data be assembled from a variety of sources, converted into the statistical models’ parameters and delivered effectively to researchers and policy makers. A web-enabled and GIS-based system can be developed to provide the needed functionality. This article gives an overview of the conceptual web-enabled and GIS-based system (BCAS), illustrates the system’s use in diagnosing and treating breast cancer and examines the potential benefits and implications for breast cancer research and practice.

    Gossip Algorithms for Distributed Signal Processing

    Gossip algorithms are attractive for in-network processing in sensor networks because they do not require any specialized routing, there is no bottleneck or single point of failure, and they are robust to unreliable wireless network conditions. Recently, there has been a surge of activity in the computer science, control, signal processing, and information theory communities, developing faster and more robust gossip algorithms and deriving theoretical performance guarantees. This article presents an overview of recent work in the area. We describe convergence rate results, which are related to the number of transmitted messages and thus the amount of energy consumed in the network for gossiping. We discuss issues related to gossiping over wireless links, including the effects of quantization and noise, and we illustrate the use of gossip algorithms for canonical signal processing tasks including distributed estimation, source localization, and compression. Comment: Submitted to Proceedings of the IEEE, 29 pages.
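    As a concrete illustration of gossip-based distributed estimation, the following minimal sketch implements randomized pairwise averaging: in each round one link is activated at random and its two endpoints replace their values with the pair's mean. The function name, the four-node ring topology, and the sensor readings are assumptions made for this example and do not come from the article.

```python
import random

def gossip_average(values, edges, rounds, seed=0):
    """Randomized pairwise gossip averaging.

    values: initial node measurements (list of floats)
    edges:  list of (i, j) index pairs, the links nodes may gossip over
    rounds: number of pairwise exchanges, i.e. messages spent
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)       # activate one random link
        mean = (x[i] + x[j]) / 2.0     # the two endpoints average
        x[i] = x[j] = mean
    return x

# Hypothetical four-node ring with one reading per sensor.
readings = [10.0, 20.0, 30.0, 40.0]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
estimates = gossip_average(readings, ring, rounds=200)
```

    Each exchange preserves the sum of the values, so on a connected graph every node's estimate drifts toward the global average (25.0 here) without any routing or central coordinator. The number of rounds needed is a proxy for the message and energy cost that the convergence rate results in the abstract refer to.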