339 research outputs found

    WikiSensing: A collaborative sensor management system with trust assessment for big data

    Big Data for sensor networks and collaborative systems has become ever more important in the digital economy and is a focal point of technological interest, while posing many noteworthy challenges. This research addresses some of these challenges in the areas of online collaboration and Big Data for sensor networks. It demonstrates WikiSensing (www.wikisensing.org), a high-performance, heterogeneous, collaborative data cloud for managing and analysing real-time sensor data. The system is based on a Big Data architecture with comprehensive functionality for smart-city sensor data integration and analysis. It is fully functional and served as the main data management platform for the 2013 UPLondon Hackathon. The system is unique in that it introduces a novel methodology that incorporates online collaboration with sensor data. While other platforms for sensor data management exist, WikiSensing is one of the first to enable online collaboration by providing services to store and query dynamic sensor information without restricting the type or format of sensor data. An emerging challenge of collaborative sensor systems is modelling and assessing the trustworthiness of sensors and their measurements. This is directly relevant to WikiSensing as an open, collaborative sensor data management system: if the trustworthiness of the sensor data can be accurately assessed, WikiSensing becomes more than a collaborative data management system for sensors, serving also as a platform that informs users about the validity of its data. Hence this research presents a new generic framework for capturing and analysing sensor trustworthiness, considering the different forms of evidence available to the user. It uses an extensible set of metrics to represent such evidence and applies Bayesian analysis to develop a trust classification model. Several publications are based on this work, with others at the final stage of submission. Further improvement is planned to make the platform available as a cloud service accessible to any online user, building a community of collaborators for smart-city research.
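
    To illustrate the flavour of Bayesian trust classification described above, the following is a minimal sketch only: a Beta-Bernoulli trust estimator over binary evidence. The metric names, threshold, and class labels are assumptions for illustration, not WikiSensing's actual model.

        # Illustrative sketch: Beta-Bernoulli trust score for one sensor.
        from dataclasses import dataclass

        @dataclass
        class SensorTrust:
            alpha: float = 1.0  # prior pseudo-count of "consistent" evidence
            beta: float = 1.0   # prior pseudo-count of "inconsistent" evidence

            def observe(self, evidence_positive: bool) -> None:
                """Update the posterior with one piece of evidence
                (e.g. agreement with neighbouring sensors, a passed range check)."""
                if evidence_positive:
                    self.alpha += 1.0
                else:
                    self.beta += 1.0

            @property
            def score(self) -> float:
                """Posterior mean probability that the sensor reports correctly."""
                return self.alpha / (self.alpha + self.beta)

            def classify(self, threshold: float = 0.8) -> str:
                """Map the continuous trust score to a coarse label (assumed labels)."""
                return "trusted" if self.score >= threshold else "untrusted"

        # Usage: fold in evidence from several checks for one sensor.
        trust = SensorTrust()
        for ok in [True, True, False, True, True]:
            trust.observe(ok)
        print(trust.score, trust.classify())  # about 0.714 -> "untrusted" at this threshold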

    DTKI: a new formalized PKI with no trusted parties

    The security of public key validation protocols for web-based applications has recently attracted attention because of weaknesses in the certificate authority model and consequent attacks. Recent proposals using public logs have succeeded in making certificate management more transparent and verifiable. However, those proposals involve a fixed set of authorities, which creates an oligopoly. Another problem with current log-based systems is their heavy reliance on trusted parties that monitor the logs. We propose a distributed transparent key infrastructure (DTKI), which greatly reduces the oligopoly of service providers and allows verification of the behaviour of trusted parties. In addition, this paper formalises the public log data structure and provides a formal analysis of the security that DTKI guarantees.
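
    Log-based proposals of this kind rest on verifiable append-only data structures. The sketch below shows the generic primitive such designs build on, checking a Merkle inclusion proof for a log entry; the hashing scheme and proof layout are generic assumptions, not DTKI's concrete log format.

        # Illustrative sketch: verifying that a certificate entry is included in a Merkle log.
        import hashlib

        def _h(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
            """Recompute the root from a leaf and its audit path.

            `proof` is a list of (sibling_hash, side) pairs, where side is "L" or "R"
            depending on whether the sibling sits left or right of the running hash.
            """
            node = _h(b"\x00" + leaf)  # leaf hash, domain-separated from interior nodes
            for sibling, side in proof:
                if side == "L":
                    node = _h(b"\x01" + sibling + node)
                else:
                    node = _h(b"\x01" + node + sibling)
            return node == root

        # Toy usage: a two-leaf log whose root commits to both entries.
        leaf_a, leaf_b = b"cert-for-example.org", b"cert-for-example.net"
        ha, hb = _h(b"\x00" + leaf_a), _h(b"\x00" + leaf_b)
        root = _h(b"\x01" + ha + hb)
        assert verify_inclusion(leaf_a, [(hb, "R")], root)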

    Semi-automatic matching of semi-structured data updates

    Data matching, also referred to as data linkage or field matching, is a technique used to combine multiple data sources into one data set. It is used for data integration in a number of sectors and industries, from politics and health care to scientific applications. The motivation for this study was the observation of the day-to-day struggles of a large non-governmental organisation (NGO) in managing its membership database. With a membership base of close to 2.4 million, the challenges it faces in capturing and processing semi-structured membership updates are monumental. Updates arrive from the field in a multitude of formats, often incomplete and unstructured, and expert knowledge is geographically localised. These issues are compounded by an extremely complex organisational hierarchy and a general lack of data validation processes. An online system was proposed for pre-processing input and then matching it against the membership database. Termed the Data Pre-Processing and Matching System (DPPMS), it allows for single or bulk updates. Based on the success of the DPPMS with the NGO's membership database, it was subsequently used for pre-processing and data matching of semi-structured patient and financial customer data. Using the semi-automated DPPMS rather than a clerical data matching system, true positive matches increased by 21% while false negative matches decreased by 20%. The recall, precision and F-measure values all improved, and the risk of false positives diminished. The DPPMS was unable to match approximately 8% of the provided records, largely due to human error during initial data capture. While the DPPMS greatly diminished the reliance on experts, their role remained pivotal during the final stage of the process.
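
    The following is a minimal sketch of the semi-automatic matching idea described above: normalise an incoming semi-structured update, score it against an existing member record, and flag low-confidence cases for expert review. The field names, weights, and thresholds are illustrative assumptions, not the DPPMS's actual configuration.

        # Illustrative sketch: weighted fuzzy matching of an update against a record.
        from difflib import SequenceMatcher

        def normalise(value: str) -> str:
            return " ".join(value.strip().lower().split())

        def field_similarity(a: str, b: str) -> float:
            return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

        def match_score(update: dict, record: dict, weights: dict) -> float:
            """Weighted similarity across the fields shared by update and record."""
            total, weight_sum = 0.0, 0.0
            for field, w in weights.items():
                if field in update and field in record:
                    total += w * field_similarity(update[field], record[field])
                    weight_sum += w
            return total / weight_sum if weight_sum else 0.0

        WEIGHTS = {"name": 0.5, "branch": 0.2, "id_number": 0.3}  # assumed fields

        def classify(update: dict, record: dict, accept: float = 0.9, review: float = 0.7) -> str:
            score = match_score(update, record, WEIGHTS)
            if score >= accept:
                return "match"
            return "expert-review" if score >= review else "no-match"

        # Usage: one incoming update against one candidate record.
        update = {"name": "S. Nkosi", "branch": "gauteng east"}
        record = {"name": "Sipho Nkosi", "branch": "Gauteng East"}
        print(classify(update, record))  # likely "expert-review" given the abbreviated name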

    Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

    Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to pitfalls that hinder realistic performance and impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding: not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Following a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code, identifying 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. The resulting classification scheme dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and use of LM4Code in reliable and trustworthy ways.
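
    As one concrete instance of the "data collection and labeling" pitfalls surveyed above, the sketch below flags exact-duplicate code snippets shared between training and test splits (train/test contamination). It is an illustrative assumption of how such a check might look; published studies typically use stronger near-duplicate detection.

        # Illustrative sketch: measuring exact-duplicate contamination between splits.
        import hashlib

        def _fingerprint(code: str) -> str:
            canonical = "".join(code.split())  # crude whitespace-insensitive form
            return hashlib.sha256(canonical.encode()).hexdigest()

        def contamination(train: list[str], test: list[str]) -> float:
            """Fraction of test snippets whose fingerprint also appears in the training data."""
            seen = {_fingerprint(s) for s in train}
            leaked = sum(1 for s in test if _fingerprint(s) in seen)
            return leaked / len(test) if test else 0.0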

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While many techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data has become ubiquitous and techniques for structured graph data have recently come into focus. Because objects in graphs exhibit long-range correlations, a suite of novel techniques has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
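
    As a toy illustration of the unsupervised, plain, static-graph setting in the survey's framework, the sketch below scores nodes whose 1-hop ego-network edge count deviates from what their degree suggests (near-stars or near-cliques stand out). The scoring rule is an assumption for illustration, not a method from the survey.

        # Illustrative sketch: ego-network-based anomaly scores on a plain static graph.
        import networkx as nx
        from statistics import mean, pstdev

        def ego_anomaly_scores(G: nx.Graph) -> dict:
            # Feature per node: edges inside its 1-hop ego network divided by its degree.
            feats = {}
            for n in G.nodes:
                deg = G.degree(n)
                if deg == 0:
                    continue
                ego_edges = nx.ego_graph(G, n).number_of_edges()
                feats[n] = ego_edges / deg
            mu, sigma = mean(feats.values()), pstdev(feats.values())
            # Anomaly score: absolute z-score of the feature.
            return {n: abs(v - mu) / sigma if sigma else 0.0 for n, v in feats.items()}

        # Usage: a star hub with a small triangle attached to it.
        G = nx.star_graph(8)  # node 0 is the hub
        G.add_edges_from([(10, 11), (11, 12), (10, 12), (0, 10)])
        scores = ego_anomaly_scores(G)
        print(max(scores, key=scores.get))  # a node in the dense triangle scores highest here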

    Impact and key challenges of insider threats on organizations and critical businesses

    The insider threat has consistently been identified as a key threat to organizations and governments. Understanding the nature of insider threats and the related threat landscape can help in forming mitigation strategies, including non-technical means. In this paper, we survey and highlight challenges associated with the identification and detection of insider threats in both public and private sector organizations, especially those that form part of a nation’s critical infrastructure. We explore the utility of the cyber kill chain for understanding insider threats, as well as the underpinning human behavior and psychological factors. Existing defense techniques are discussed and critically analyzed, and improvements are suggested in line with current state-of-the-art cyber security requirements. Finally, open problems related to the insider threat are identified and future research directions are discussed.

    Data management and use: case studies of technologies and governance


    Development of a supervisory internet of things (IoT) system for factories of the future

    Big data is of great importance to stakeholders, including manufacturers, business partners, consumers, and governments. It brings many benefits, such as improving productivity and reducing product costs through digitalised automation equipment and manufacturing information systems. Other benefits include using social media to build agile cooperation between suppliers and retailers and between product designers and production engineers, tracking customer feedback in a timely manner, and reducing environmental impact by using Internet of Things (IoT) sensors to monitor energy consumption and noise levels. However, the integration of manufacturing big data has been neglected. Many open-source big data software packages provide sophisticated capabilities for managing big data in various data-driven manufacturing applications, but they do not address integration directly. In this research, a manufacturing big data integration system, named the Data Control Module (DCM), has been designed and developed. The system can securely integrate data silos from various manufacturing systems and control the data for different manufacturing applications. Firstly, an architecture for a manufacturing big data system is proposed, comprising three parts: manufacturing data sources, a manufacturing big data ecosystem, and manufacturing applications. Secondly, nine essential components of the big data ecosystem are identified for building various manufacturing big data solutions. Thirdly, a conceptual framework for the DCM is proposed based on this ecosystem. The DCM has then been designed and developed with selected big data software to integrate all three varieties of manufacturing data: unstructured, semi-structured, and structured. The DCM has been validated in three general manufacturing domains: product design and development, production, and business. It can not only be used with legacy manufacturing software but may also be applied in emerging areas such as the digital twin and digital thread. The limitations of the DCM are analysed, and further research directions are discussed.
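
    To make the three-variety integration concrete, the following is a minimal sketch of how a DCM-like layer could normalise structured, semi-structured, and unstructured inputs into one uniform record shape before routing them to manufacturing applications. The envelope fields and source names are assumptions for illustration, not the paper's design.

        # Illustrative sketch: wrapping three data varieties in a common envelope.
        import csv, io, json, time
        from typing import Any

        def envelope(source: str, variety: str, payload: Any) -> dict:
            """Common wrapper so downstream consumers see one uniform record shape."""
            return {"source": source, "variety": variety,
                    "ingested_at": time.time(), "payload": payload}

        def ingest_structured(csv_text: str, source: str) -> list[dict]:
            return [envelope(source, "structured", row)
                    for row in csv.DictReader(io.StringIO(csv_text))]

        def ingest_semi_structured(json_text: str, source: str) -> list[dict]:
            return [envelope(source, "semi-structured", obj) for obj in json.loads(json_text)]

        def ingest_unstructured(log_text: str, source: str) -> list[dict]:
            return [envelope(source, "unstructured", line)
                    for line in log_text.splitlines() if line.strip()]

        # Usage: three silos, one uniform stream of records.
        records = (
            ingest_structured("machine,temp\nM1,71.5\n", "mes_export")
            + ingest_semi_structured('[{"sensor": "S3", "rpm": 1450}]', "iot_gateway")
            + ingest_unstructured("2024-01-01 spindle vibration high\n", "operator_log")
        )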