
    Peer-to-Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption, and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction, and freedom.

    BlogForever D3.2: Interoperability Prospects

    This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed; a Delphi study is conducted to identify the aspects crucial to interoperability between web archives and digital libraries; technical interoperability standards and protocols are reviewed for their relevance to BlogForever; a simple approach for considering interoperability in specific usage scenarios is proposed; and a tangible approach is presented for developing a succession plan that would allow reliable transfer of content from the current digital archive to other digital repositories.

    Empowering Patient Similarity Networks through Innovative Data-Quality-Aware Federated Profiling

    Continuous monitoring of patients involves collecting and analyzing sensory data from a multitude of sources. To reduce communication overhead, ensure data privacy and security, limit data loss, and maintain efficient resource usage, processing and analytics are moved close to where the data are located (e.g., the edge). However, data quality (DQ) can be degraded by imprecise or malfunctioning sensors, dynamic changes in the environment, transmission failures, or delays. It is therefore crucial to monitor data quality and detect problems as quickly as possible, so that they do not mislead clinical judgements and lead to the wrong course of action. In this article, a novel approach called federated data quality profiling (FDQP) is proposed to assess the quality of data at the edge. FDQP is inspired by federated learning (FL) and serves as a condensed document, or guide, for node data quality assurance. A formal FDQP model is developed to capture the quality dimensions specified in the data quality profile (DQP). The proposed approach uses federated feature selection to improve classifier precision and to rank features based on criteria such as feature value, outlier percentage, and missing-data percentage. Extensive experiments were run on a fetal dataset split across different edge nodes, with a set of scenarios carefully chosen to evaluate the proposed FDQP model. The results demonstrate that the proposed data-quality-aware federated PSN architecture, leveraging the FDQP model with data collected from edge nodes, effectively improves data quality and, in turn, the accuracy of federated patient similarity network (FPSN)-based machine learning models. Our profiling algorithm exchanges lightweight profiles instead of processing full data at the edge, achieving good data quality while improving efficiency. Overall, FDQP is an effective method for assessing data quality in edge computing environments, and we believe the proposed approach can be applied to scenarios beyond patient monitoring.
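    To make the profiling idea more concrete, the following is a minimal sketch of how an edge node might compute a lightweight data quality profile (per-feature missing-value and outlier percentages) and share only that profile with the server instead of raw readings. The function names, the z-score outlier rule, and the additive ranking penalty are illustrative assumptions, not the FDQP implementation described in the article.

        # Hypothetical sketch of edge-side data quality profiling (not the article's code).
        import numpy as np

        def edge_profile(data: np.ndarray, z_thresh: float = 3.0) -> dict:
            """Build a lightweight data-quality profile for one edge node.

            data: 2-D array (rows = readings, columns = features); NaN marks missing values.
            Returns per-feature missing and outlier percentages -- the only thing
            exchanged with the server, never the raw readings.
            """
            n_rows, n_cols = data.shape
            missing_pct = np.isnan(data).mean(axis=0) * 100.0

            outlier_pct = np.zeros(n_cols)
            for j in range(n_cols):
                col = data[:, j]
                col = col[~np.isnan(col)]
                if col.size == 0 or col.std() == 0:
                    continue
                z = np.abs((col - col.mean()) / col.std())
                outlier_pct[j] = (z > z_thresh).mean() * 100.0

            return {"missing_pct": missing_pct, "outlier_pct": outlier_pct, "n_rows": n_rows}

        def rank_features(profiles: list[dict]) -> np.ndarray:
            """Server side: average the exchanged profiles and rank features so that
            those with the least missing/outlier burden come first."""
            missing = np.mean([p["missing_pct"] for p in profiles], axis=0)
            outliers = np.mean([p["outlier_pct"] for p in profiles], axis=0)
            penalty = missing + outliers          # simple additive quality penalty
            return np.argsort(penalty)            # best-quality features first

        # Example: two edge nodes with noisy sensor-style readings.
        rng = np.random.default_rng(0)
        node_a = rng.normal(size=(500, 4)); node_a[rng.random((500, 4)) < 0.05] = np.nan
        node_b = rng.normal(size=(300, 4)); node_b[:10, 2] += 50   # injected outliers
        print(rank_features([edge_profile(node_a), edge_profile(node_b)]))

    Only the small profile dictionaries cross the network in this sketch, which mirrors the efficiency argument made in the abstract.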

    Next Generation Cloud Computing: New Trends and Research Directions

    The landscape of cloud computing has changed significantly over the last decade. Not only have more providers and service offerings crowded the space, but cloud infrastructure, traditionally limited to single-provider data centers, is also evolving. In this paper, we first discuss the changing cloud infrastructure and consider the use of infrastructure from multiple providers and the benefits of decentralising computing away from data centers. These trends have resulted in the need for a variety of new computing architectures that will be offered by future cloud infrastructure. These architectures are anticipated to impact areas such as connecting people and devices, data-intensive computing, the service space, and self-learning systems. Finally, we lay out a roadmap of challenges that will need to be addressed to realise the potential of next-generation cloud systems.

    Computationally intensive, distributed and decentralised machine learning: from theory to applications

    Machine learning (ML) is currently one of the most important research fields, spanning computer science, statistics, pattern recognition, data mining, and predictive analytics. It plays a central role in automatic data processing and analysis across numerous research domains owing to widely distributed and geographically scattered data sources, powerful computing clouds, and high digitisation requirements. However, aspects such as the accuracy of methods, data privacy, and model explainability remain challenging and require additional research. It is therefore necessary to analyse centralised and distributed data processing architectures, to create novel computationally intensive, explainable, and privacy-preserving ML methods, to investigate their properties, to propose distributed versions of prospective ML baseline methods, and to evaluate and apply these in various applications. This thesis addresses the theoretical and practical aspects of state-of-the-art ML methods. Its contributions are threefold. In Chapter 2, novel non-distributed, centralised, computationally intensive ML methods are proposed, their properties are investigated, and state-of-the-art ML methods are applied to real-world data from two domains, transportation and bioinformatics. Moreover, algorithms for ‘black-box’ model interpretability are presented. Decentralised ML methods are considered in Chapter 3. First, we investigate data processing as a preliminary step in data-driven, agent-based decision-making. We then propose novel decentralised ML algorithms based on the collaboration of the agents' local models, considering various regression models in this context. Finally, the explainability of multi-agent decision-making is addressed. In Chapter 4, we investigate distributed centralised ML methods. We propose a distributed parallelisation algorithm for semi-parametric and non-parametric regression and implement it in the computational environment and data structures of Apache Spark. Scalability, speed-up, and goodness-of-fit experiments using real-world data demonstrate the excellent performance of the proposed methods. Moreover, a federated deep-learning approach enables us to address the data privacy challenges that arise when processing distributed private data sources to solve the travel-time prediction problem. Finally, we propose an explainability strategy to interpret the influence of the input variables in this federated deep-learning application. This thesis is based on the contributions of 11 papers to the theoretical and practical aspects of state-of-the-art and proposed ML methods. We successfully address the stated challenges with various data processing architectures, validate the proposed approaches in diverse scenarios from the transportation and bioinformatics domains, and demonstrate their effectiveness in scalability, speed-up, and goodness-of-fit experiments with real-world data. Nevertheless, substantial future research is required to address the stated challenges further and to identify novel issues in ML. It is thus necessary to advance the theoretical side by creating novel ML methods and investigating their properties, and to contribute to the application side by using state-of-the-art ML methods and their combinations, and by interpreting their results for different problem settings.
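    As a rough illustration of the kind of collaboration between agents' local models mentioned above, the sketch below fits ordinary least squares on each agent's own data partition and combines the local coefficient vectors by sample-size-weighted averaging, so no raw observations are pooled. This is a deliberately simple stand-in under assumed names and an assumed combination rule, not one of the thesis's algorithms.

        # Hypothetical illustration of agents collaborating via their local models
        # (weighted coefficient averaging); not the thesis's actual algorithms.
        import numpy as np

        def fit_local(X: np.ndarray, y: np.ndarray) -> np.ndarray:
            """Each agent fits ordinary least squares on its own data partition."""
            Xb = np.hstack([np.ones((X.shape[0], 1)), X])     # add intercept column
            coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
            return coef

        def combine(models: list) -> np.ndarray:
            """Combine local models by sample-size-weighted averaging of coefficients,
            so no agent has to reveal its raw observations."""
            total = sum(n for _, n in models)
            return sum(coef * (n / total) for coef, n in models)

        # Three agents holding partitions of the same underlying linear relationship.
        rng = np.random.default_rng(1)
        true_w = np.array([2.0, -1.0, 0.5])                   # intercept and two slopes
        partitions = []
        for n in (200, 350, 150):
            X = rng.normal(size=(n, 2))
            y = true_w[0] + X @ true_w[1:] + rng.normal(scale=0.1, size=n)
            partitions.append((fit_local(X, y), n))

        print(combine(partitions))   # close to [2.0, -1.0, 0.5] without pooling the data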

    FedCSD: A Federated Learning Based Approach for Code-Smell Detection

    This paper proposes a Federated Learning Code Smell Detection (FedCSD) approach that allows organizations to collaboratively train federated ML models while preserving their data privacy. These claims are supported by three experiments that leverage three manually validated datasets aimed at detecting and examining different code smell scenarios. Experiment 1, a centralized training experiment, found that dataset two, which contains fewer smells, achieved the lowest accuracy (92.30%), while datasets one and three achieved the highest accuracies with only a slight difference between them (98.90% and 99.5%, respectively). Experiment 2 was a cross-evaluation in which each ML model was trained on one dataset and then evaluated on the other two. Its results show a significant drop in accuracy (lowest: 63.80%) when fewer smells exist in the training dataset, which is noticeably reflected, as technical debt, in the model's performance. Finally, the third experiment evaluates our approach by splitting the dataset across 10 companies: the ML model is trained at each company's site, and all updated model weights are then transferred to the server. Ultimately, the global model, trained across the 10 companies for 100 training rounds, achieved an accuracy of 98.34%. The results reveal only a slight difference between the global model's accuracy and the highest accuracy of the centralized model, which can be disregarded in favour of the global model's comprehensive knowledge, lower training cost, preservation of data privacy, and avoidance of the technical debt problem.
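    The training loop described above (local training at each company, only weight updates sent to the server, many aggregation rounds) follows the familiar federated averaging pattern. Below is a minimal sketch of that pattern with a placeholder logistic-regression model and synthetic data; it is not the FedCSD implementation, and all names, data, and hyperparameters are assumptions.

        # Minimal federated-averaging sketch (placeholder model and data, not FedCSD itself).
        import numpy as np

        def local_update(weights, X, y, lr=0.1, epochs=5):
            """One company trains locally: a few epochs of gradient descent on
            logistic regression, starting from the current global weights."""
            w = weights.copy()
            for _ in range(epochs):
                p = 1.0 / (1.0 + np.exp(-(X @ w)))        # predicted smell probability
                w -= lr * X.T @ (p - y) / len(y)           # gradient step
            return w

        def federated_round(global_w, companies):
            """Server step: collect each company's updated weights (never its code
            metrics) and average them, weighted by local dataset size."""
            total = sum(len(y) for _, y in companies)
            return sum(local_update(global_w, X, y) * (len(y) / total) for X, y in companies)

        # Ten synthetic "companies", each with its own labelled code-metric matrix.
        rng = np.random.default_rng(2)
        true_w = rng.normal(size=5)
        companies = []
        for _ in range(10):
            X = rng.normal(size=(rng.integers(80, 200), 5))
            y = (X @ true_w + rng.normal(scale=0.3, size=len(X)) > 0).astype(float)
            companies.append((X, y))

        global_w = np.zeros(5)
        for _ in range(100):                               # 100 training rounds, as in the abstract
            global_w = federated_round(global_w, companies)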