
    Investigating and developing efficient federated learning for air pollution monitoring

    Location-based data may be considered highly private; as such, it must be handled in a way that prevents it from being used to track a user. In a network of multiple edge devices that each collect data, training a machine learning model would typically involve transmitting the data securely to a central server, which requires strict privacy rules. Federated learning addresses the privacy problem by not requiring data to be shared; instead, the machine learning model is trained on the device that gathered the data. Using federated learning with the Federated Stochastic Gradient Descent (FedSGD) algorithm, training performance is expected to be similar to that of training a model on a single server with the data transmitted to it. Overall, less bandwidth may be used for communication between the edge devices and the server. However, training on the edge device incurs a higher computational cost, which lowers the number of data points that can be processed each day given the lower computational performance of an edge device compared with a high-power server. Whilst only a single edge device may train the model at a time, a different federated learning algorithm may be used on the server to enable multiple devices to train simultaneously
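
The FedSGD idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear model, the function names `local_gradient` and `fedsgd_round`, the learning rate, and the toy readings are all assumptions chosen for illustration. Each device computes a gradient on its own data, and the server averages those gradients, weighted by sample count, before taking one step. The weighted average equals the gradient on the pooled data, which is why training is expected to behave like the centralised case.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held on one edge device


def local_gradient(w: float, b: float, data: Sequence[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient of y ~ w*x + b on one device's data."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def fedsgd_round(w: float, b: float, clients: List[Sequence[Sample]],
                 lr: float) -> Tuple[float, float]:
    """One round: every client sends a gradient, the server averages and steps."""
    total = sum(len(c) for c in clients)
    gw = gb = 0.0
    for data in clients:
        cw, cb = local_gradient(w, b, data)  # raw data never leaves the client
        weight = len(data) / total           # weight by local sample count
        gw += weight * cw
        gb += weight * cb
    return w - lr * gw, b - lr * gb


# Hypothetical readings split across three devices, generated from y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)],
]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = fedsgd_round(w, b, clients, lr=0.02)
```

Only two numbers per client cross the network in each round here, which is the bandwidth saving the abstract refers to; the cost is that every gradient is computed on the device.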

    Strategies for Replica Placement in Tree Networks

    In this paper, we discuss and compare several policies to place replicas in tree networks, subject to server capacity and QoS constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The standard approach in the literature is to enforce that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. In the first policy, all requests from a given client are still processed by the same server, but this server can be located anywhere on the path from the client to the root. In the second policy, the requests of a given client can be processed by multiple servers. One major contribution of this paper is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity, both from a theoretical and a practical perspective. We establish several new complexity results and provide several efficient polynomial-time heuristics for NP-complete instances of the problem. These heuristics are compared to an absolute lower bound provided by the formulation of the problem in terms of the solution of an integer linear program.
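
To make the difference between the closest-server policy and the first new policy concrete, here is a small sketch under stated assumptions. The tree encoding (a `parent` dictionary), the function names, and the toy instance are hypothetical, servers are taken to be homogeneous, and QoS constraints are ignored. It only checks whether a given placement is valid; it does not search for an optimal one.

```python
from typing import Dict, Iterator, Optional, Set


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield the ancestors of `node`, nearest first, ending at the root."""
    cur = parent[node]
    while cur is not None:
        yield cur
        cur = parent[cur]


def closest_loads(parent: Dict[str, Optional[str]], requests: Dict[str, int],
                  replicas: Set[str], capacity: int) -> Optional[Dict[str, int]]:
    """Server loads under the closest policy, or None if the placement is invalid."""
    load = {s: 0 for s in replicas}
    for client, r in requests.items():
        # The first replica met on the way to the root must take all requests.
        server = next((a for a in path_to_root(client, parent) if a in replicas), None)
        if server is None:
            return None  # no replica on this client's path
        load[server] += r
        if load[server] > capacity:
            return None
    return load


def upwards_valid(parent: Dict[str, Optional[str]], requests: Dict[str, int],
                  assignment: Dict[str, str], capacity: int) -> bool:
    """Check an explicit client -> server assignment where the single server
    may sit anywhere on the client's path to the root."""
    if assignment.keys() != requests.keys():
        return False
    load: Dict[str, int] = {}
    for client, server in assignment.items():
        if server not in path_to_root(client, parent):
            return False
        load[server] = load.get(server, 0) + requests[client]
    return all(v <= capacity for v in load.values())


# Toy instance: two clients below node "a", which sits below the root.
parent = {"root": None, "a": "root", "c1": "a", "c2": "a"}
requests = {"c1": 3, "c2": 3}
capacity = 5

closest = closest_loads(parent, requests, {"a", "root"}, capacity)
upwards = upwards_valid(parent, requests, {"c1": "a", "c2": "root"}, capacity)
```

In this instance the closest policy overloads `a` because both clients are forced onto it, while the relaxed policy can send one client past `a` to the root and stay within capacity.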

    Optimal algorithms and approximation algorithms for replica placement with distance constraints in tree networks

    In this paper, we study the problem of replica placement in tree networks subject to server capacity and distance constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The Single policy enforces that all requests of a client are served by a single server in the tree, while in the Multiple policy, the requests of a given client can be processed by multiple servers, thus distributing the processing of requests over the platform. For the Single policy, we prove that all instances of the problem are NP-hard, and we propose approximation algorithms. The problem with the Multiple policy was known to be NP-hard with distance constraints, but we provide a polynomial time optimal algorithm to solve the problem in the particular case of binary trees when no request exceeds the server capacity.

    A Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites

    Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the “multi-label scale” and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-negative bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 each have only one subcellular location and the other 64 each have two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in the same subset (subcellular location). It was observed that the overall success rate of iLoc-Gneg by jackknife test on such a stringent benchmark dataset was over 91%, which is about 6% higher than that of Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. Meanwhile, a step-by-step guide is provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server also accepts batch job submissions, which is not available in the existing version of the Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high-throughput tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development.
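
The jackknife test mentioned above is leave-one-out validation: each protein is held out in turn, the predictor is built from the rest, and the held-out protein is scored. The sketch below shows only that procedure. The 1-nearest-neighbour classifier, the two-dimensional feature vectors, and the single-label setting are stand-in assumptions and bear no relation to the actual iLoc-Gneg features or its multi-label scoring.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, location label)


def nearest_label(query: Sequence[float], training: List[Example]) -> str:
    """Label of the training example closest to `query` (squared Euclidean)."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda ex: dist2(ex[0], query))[1]


def jackknife_success_rate(data: List[Example]) -> float:
    """Hold out each example once, predict it from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every example except the i-th
        if nearest_label(features, rest) == label:
            hits += 1
    return hits / len(data)


# Hypothetical toy data; the last point lies between the two clusters.
toy = [
    ((0.0, 0.1), "cytoplasm"),
    ((0.2, 0.0), "cytoplasm"),
    ((0.1, 0.3), "cytoplasm"),
    ((5.0, 5.1), "periplasm"),
    ((5.2, 4.9), "periplasm"),
    ((2.4, 2.6), "periplasm"),
]
rate = jackknife_success_rate(toy)
```

Because every example is tested exactly once and never appears in its own training set, the result for a given dataset is unique, unlike a random train/test split.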

    Federated-Learning-Assisted Failure-Cause Identification in Microwave Networks

    Machine Learning (ML) adoption for automated failure management is becoming pervasive in today's communication networks. However, ML-based failure management typically requires that monitoring data be exchanged between network devices, where data is collected, and centralized locations, e.g., servers in data centers, where data is processed. ML algorithms in this centralized location are then trained to learn mappings between collected data and desired outputs, e.g., whether a failure exists, its cause, location, etc. This paradigm poses several challenges to network operators in terms of privacy as well as in terms of computational and communication resource usage, as a massive amount of sensitive failure data is transmitted over the network. To overcome such limitations, Federated Learning (FL) can be adopted, which consists of training multiple distributed ML models at multiple decentralized locations (called 'clients') using a limited amount of locally-collected data, and of sharing these trained models with a centralized location (called 'server'), where these models are aggregated and shared again with clients. FL reduces data exchange between clients and a server and improves algorithms' performance thanks to sharing knowledge among different domains (i.e., clients), leveraging different sources of local information in a collaborative environment. In this paper, we focus on applying FL to perform failure-cause identification in microwave networks. The problem is modeled as a multi-class ML classification problem with six pre-defined failure causes. Specifically, using real failure data from an operational microwave network composed of more than 10000 microwave links, we emulate a multi-operator scenario in which one operator has partial knowledge of failure causes during the training phase. Thanks to knowledge sharing, numerical results show that FL achieves up to 72% precision in identifying an unknown particular class with respect to traditional ML (non-FL) approaches where training is performed without knowledge sharing.

    Power-aware replica placement in tree networks with multiple servers per client

    In this paper, we revisit the well-studied problem of replica placement in tree networks. Rather than minimizing the number of servers needed to serve all client requests, we aim at minimizing the total power consumed by these servers. In addition, we use the most general (and powerful) server assignment policy, where the requests of a client can be served by multiple servers located in the (unique) path from this client to the root of the tree. We consider multi-modal servers that can operate at a set of discrete speeds, using the dynamic voltage and frequency scaling (DVFS) technique. The optimization problem is to determine an optimal location of the servers in the tree, as well as the speed at which each server is operated. A major result is the NP-completeness of this problem, to be contrasted with the minimization of the number of servers, which has polynomial complexity. Another important contribution is the formulation of a Mixed Integer Linear Program (MILP) for the problem, together with the design of several polynomial-time heuristics. We assess the efficiency of these heuristics by simulation. For mid-size instances (up to 30 nodes in the tree), we evaluate their absolute performance by comparison with the optimal solution (obtained via the MILP). The most efficient heuristics provide satisfactory results, within 20% of the optimal solution.

    A distributed hybrid index for processing continuous range queries over moving objects

    Central to many location-based services is the problem of processing concurrent continuous range queries over a large number of moving objects. Most existing work on this problem investigates centralized search algorithms that handle range queries on a single server. However, due to the limited resources of a single server, these algorithms can hardly cope with a very large number of objects and extensive concurrent queries. Moreover, these approaches usually assume that either the objects or the queries are static, and seldom consider the scenario in which objects and queries move simultaneously, which restricts their practicability. To resolve the above issues, we propose a distributed hybrid index (DHI) that consists of a global grid index and extensive local VR-tree indexes. DHI is suited to deployment on a cluster of servers and scales well when maintaining numerous moving objects and concurrent range queries. Based on DHI, we further design a distributed incremental search approach, which organizes multiple servers with a publish/subscribe mechanism to calculate and monitor the results of continuous range queries in a distributed pattern. Finally, we conduct extensive experiments to fully evaluate the performance of our approach. Peer Reviewed. Postprint (author's final draft)
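
As a rough illustration of the grid half of such an index, the sketch below keeps moving objects in uniform grid cells and answers a rectangular range query by visiting only the cells the rectangle overlaps. It is a single-process simplification under stated assumptions: the class name `GridIndex`, the cell size, and the object identifiers are hypothetical, and the local VR-tree indexes, the server cluster, and the publish/subscribe mechanism described in the abstract are all left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Cell = Tuple[int, int]


class GridIndex:
    """Uniform grid over the plane; each cell stores the ids of objects inside it."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[Hashable]] = defaultdict(set)
        self.positions: Dict[Hashable, Tuple[float, float]] = {}

    def _cell(self, x: float, y: float) -> Cell:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def move(self, obj_id: Hashable, x: float, y: float) -> None:
        """Insert an object or update its position."""
        old = self.positions.get(obj_id)
        if old is not None:
            self.cells[self._cell(*old)].discard(obj_id)
        self.positions[obj_id] = (x, y)
        self.cells[self._cell(x, y)].add(obj_id)

    def range_query(self, x_min: float, y_min: float,
                    x_max: float, y_max: float) -> Set[Hashable]:
        """Ids of all objects inside the closed rectangle."""
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        result: Set[Hashable] = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id in self.cells.get((cx, cy), ()):
                    x, y = self.positions[obj_id]
                    # Boundary cells may hold objects outside the rectangle.
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        result.add(obj_id)
        return result


index = GridIndex(cell_size=10.0)
index.move("taxi", 3.0, 4.0)
index.move("bus", 25.0, 25.0)
before = index.range_query(0.0, 0.0, 12.0, 12.0)
index.move("bus", 11.0, 9.0)  # the bus drives into the query rectangle
after = index.range_query(0.0, 0.0, 12.0, 12.0)
```

This sketch re-runs the whole query after each move. An incremental approach like the one the abstract describes would instead update the stored result only when an object crosses the boundary of a query.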