169,996 research outputs found

    Approaches to address the Data Skew Problem in Federated Learning

    Get PDF
    A Federated Learning approach consists of creating an AI model from multiple data sources, without moving large amounts of data across to a central environment. Federated learning can be very useful in a tactical coalition environment, where data can be collected individually by each of the coalition partners, but network connectivity is inadequate to move the data to a central environment. However, such data collected is often dirty and imperfect. The data can be imbalanced, and in some cases, some classes can be completely missing from some coalition partners. Under these conditions, traditional approaches for federated learning can result in models that are highly inaccurate. In this paper, we propose approaches that can result in good machine learning models even in the environments where the data may be highly skewed, and study their performance under different environments

    Passport: enabling accurate country-level router geolocation using inaccurate sources

    Full text link
    When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.First author draf

    Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources

    Full text link
    When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance

    Transfer Learning for Improving Model Predictions in Highly Configurable Software

    Full text link
    Modern software systems are built to be used in dynamic environments using configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost. We define a cost model that transform the traditional view of model learning into a multi-objective problem that not only takes into account model accuracy but also measurements effort as well. We evaluate our cost-aware transfer learning solution using real-world configurable software including (i) a robotic system, (ii) 3 different stream processing applications, and (iii) a NoSQL database system. The experimental results demonstrate that our approach can achieve (a) a high prediction accuracy, as well as (b) a high model reliability.Comment: To be published in the proceedings of the 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS'17

    Pushing towards the Limit of Sampling Rate: Adaptive Chasing Sampling

    Full text link
    Measurement samples are often taken in various monitoring applications. To reduce the sensing cost, it is desirable to achieve better sensing quality while using fewer samples. Compressive Sensing (CS) technique finds its role when the signal to be sampled meets certain sparsity requirements. In this paper we investigate the possibility and basic techniques that could further reduce the number of samples involved in conventional CS theory by exploiting learning-based non-uniform adaptive sampling. Based on a typical signal sensing application, we illustrate and evaluate the performance of two of our algorithms, Individual Chasing and Centroid Chasing, for signals of different distribution features. Our proposed learning-based adaptive sampling schemes complement existing efforts in CS fields and do not depend on any specific signal reconstruction technique. Compared to conventional sparse sampling methods, the simulation results demonstrate that our algorithms allow 46%46\% less number of samples for accurate signal reconstruction and achieve up to 57%57\% smaller signal reconstruction error under the same noise condition.Comment: 9 pages, IEEE MASS 201
    • …
    corecore