1,462 research outputs found

    In Search of netUnicorn: A Data-Collection Platform to Develop Generalizable ML Models for Network Security Problems

    Full text link
    The remarkable success of the use of machine learning-based solutions for network security problems has been impeded by the developed ML models' inability to maintain efficacy when used in different network environments exhibiting different network behaviors. This issue is commonly referred to as the generalizability problem of ML models. The community has recognized the critical role that training datasets play in this context and has developed various techniques to improve dataset curation to overcome this problem. Unfortunately, these methods are generally ill-suited or even counterproductive in the network security domain, where they often result in unrealistic or poor-quality datasets. To address this issue, we propose an augmented ML pipeline that leverages explainable ML tools to guide the network data collection in an iterative fashion. To ensure the data's realism and quality, we require that the new datasets should be endogenously collected in this iterative process, thus advocating for a gradual removal of data-related problems to improve model generalizability. To realize this capability, we develop a data-collection platform, netUnicorn, that takes inspiration from the classic "hourglass" model and is implemented as its "thin waist" to simplify data collection for different learning problems from diverse network environments. The proposed system decouples data-collection intents from the deployment mechanisms and disaggregates these high-level intents into smaller reusable, self-contained tasks. We demonstrate how netUnicorn simplifies collecting data for different learning problems from multiple network environments and how the proposed iterative data collection improves a model's generalizability

    Leveraging Resources on Anonymous Mobile Edge Nodes

    Get PDF
    Smart devices have become an essential component in the life of mankind. The quick rise of smartphones, IoTs, and wearable devices enabled applications that were not possible few years ago, e.g., health monitoring and online banking. Meanwhile, smart sensing laid the infrastructure for smart homes and smart cities. The intrusive nature of smart devices granted access to huge amounts of raw data. Researchers seized the moment with complex algorithms and data models to process the data over the cloud and extract as much information as possible. However, the pace and amount of data generation, in addition to, networking protocols transmitting data to cloud servers failed short in touching more than 20% of what was generated on the edge of the network. On the other hand, smart devices carry a large set of resources, e.g., CPU, memory, and camera, that sit idle most of the time. Studies showed that for plenty of the time resources are either idle, e.g., sleeping and eating, or underutilized, e.g. inertial sensors during phone calls. These findings articulate a problem in processing large data sets, while having idle resources in the close proximity. In this dissertation, we propose harvesting underutilized edge resources then use them in processing the huge data generated, and currently wasted, through applications running at the edge of the network. We propose flipping the concept of cloud computing, instead of sending massive amounts of data for processing over the cloud, we distribute lightweight applications to process data on users\u27 smart devices. We envision this approach to enhance the network\u27s bandwidth, grant access to larger datasets, provide low latency responses, and more importantly involve up-to-date user\u27s contextual information in processing. However, such benefits come with a set of challenges: How to locate suitable resources? How to match resources with data providers? How to inform resources what to do? and When? How to orchestrate applications\u27 execution on multiple devices? and How to communicate between devices on the edge? Communication between devices at the edge has different parameters in terms of device mobility, topology, and data rate. Standard protocols, e.g., Wi-Fi or Bluetooth, were not designed for edge computing, hence, does not offer a perfect match. Edge computing requires a lightweight protocol that provides quick device discovery, decent data rate, and multicasting to devices in the proximity. Bluetooth features wide acceptance within the IoT community, however, the low data rate and unicast communication limits its use on the edge. Despite being the most suitable communication protocol for edge computing and unlike other protocols, Bluetooth has a closed source code that blocks lower layer in front of all forms of research study, enhancement, and customization. Hence, we offer an open source version of Bluetooth and then customize it for edge computing applications. In this dissertation, we propose Leveraging Resources on Anonymous Mobile Edge Nodes (LAMEN), a three-tier framework where edge devices are clustered by proximities. On having an application to execute, LAMEN clusters discover and allocate resources, share application\u27s executable with resources, and estimate incentives for each participating resource. In a cluster, a single head node, i.e., mediator, is responsible for resource discovery and allocation. Mediators orchestrate cluster resources and present them as a virtually large homogeneous resource. For example, two devices each offering either a camera or a speaker are presented outside the cluster as a single device with both camera and speaker, this can be extended to any combination of resources. Then, mediator handles applications\u27 distribution within a cluster as needed. Also, we provide a communication protocol that is customizable to the edge environment and application\u27s need. Pushing lightweight applications that end devices can execute over their locally generated data have the following benefits: First, avoid sharing user data with cloud server, which is a privacy concern for many of them; Second, introduce mediators as a local cloud controller closer to the edge; Third, hide the user\u27s identity behind mediators; and Finally, enhance bandwidth utilization by keeping raw data at the edge and transmitting processed information. Our evaluation shows an optimized resource lookup and application assignment schemes. In addition to, scalability in handling networks with large number of devices. In order to overcome the communication challenges, we provide an open source communication protocol that we customize for edge computing applications, however, it can be used beyond the scope of LAMEN. Finally, we present three applications to show how LAMEN enables various application domains on the edge of the network. In summary, we propose a framework to orchestrate underutilized resources at the edge of the network towards processing data that are generated in their proximity. Using the approaches explained later in the dissertation, we show how LAMEN enhances the performance of applications and enables a new set of applications that were not feasible

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System

    Wireless and Mobile Networks: Security and Privacy Issues

    Get PDF

    Process-Driven and Flow-Based Processing of Industrial Sensor Data

    Get PDF
    For machine manufacturing companies, besides the production of high quality and reliable machines, requirements have emerged to maintain machine-related aspects through digital services. The development of such services in the field of the Industrial Internet of Things (IIoT) is dealing with solutions such as effective condition monitoring and predictive maintenance. However, appropriate data sources are needed on which digital services can be technically based. As many powerful and cheap sensors have been introduced over the last years, their integration into complex machines is promising for developing digital services for various scenarios. It is apparent that for components handling recorded data of these sensors they must usually deal with large amounts of data. In particular, the labeling of raw sensor data must be furthered by a technical solution. To deal with these data handling challenges in a generic way, a sensor processing pipeline (SPP) was developed, which provides effective methods to capture, process, store, and visualize raw sensor data based on a processing chain. Based on the example of a machine manufacturing company, the SPP approach is presented in this work. For the company involved, the approach has revealed promising results

    Edge/Fog Computing Technologies for IoT Infrastructure

    Get PDF
    The prevalence of smart devices and cloud computing has led to an explosion in the amount of data generated by IoT devices. Moreover, emerging IoT applications, such as augmented and virtual reality (AR/VR), intelligent transportation systems, and smart factories require ultra-low latency for data communication and processing. Fog/edge computing is a new computing paradigm where fully distributed fog/edge nodes located nearby end devices provide computing resources. By analyzing, filtering, and processing at local fog/edge resources instead of transferring tremendous data to the centralized cloud servers, fog/edge computing can reduce the processing delay and network traffic significantly. With these advantages, fog/edge computing is expected to be one of the key enabling technologies for building the IoT infrastructure. Aiming to explore the recent research and development on fog/edge computing technologies for building an IoT infrastructure, this book collected 10 articles. The selected articles cover diverse topics such as resource management, service provisioning, task offloading and scheduling, container orchestration, and security on edge/fog computing infrastructure, which can help to grasp recent trends, as well as state-of-the-art algorithms of fog/edge computing technologies

    brainlife.io: A decentralized and open source cloud platform to support neuroscience research

    Full text link
    Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR data analysis to portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research

    Empowering Patient Similarity Networks through Innovative Data-Quality-Aware Federated Profiling

    Get PDF
    Continuous monitoring of patients involves collecting and analyzing sensory data from a multitude of sources. To overcome communication overhead, ensure data privacy and security, reduce data loss, and maintain efficient resource usage, the processing and analytics are moved close to where the data are located (e.g., the edge). However, data quality (DQ) can be degraded because of imprecise or malfunctioning sensors, dynamic changes in the environment, transmission failures, or delays. Therefore, it is crucial to keep an eye on data quality and spot problems as quickly as possible, so that they do not mislead clinical judgments and lead to the wrong course of action. In this article, a novel approach called federated data quality profiling (FDQP) is proposed to assess the quality of the data at the edge. FDQP is inspired by federated learning (FL) and serves as a condensed document or a guide for node data quality assurance. The FDQP formal model is developed to capture the quality dimensions specified in the data quality profile (DQP). The proposed approach uses federated feature selection to improve classifier precision and rank features based on criteria such as feature value, outlier percentage, and missing data percentage. Extensive experimentation using a fetal dataset split into different edge nodes and a set of scenarios were carefully chosen to evaluate the proposed FDQP model. The results of the experiments demonstrated that the proposed FDQP approach positively improved the DQ, and thus, impacted the accuracy of the federated patient similarity network (FPSN)-based machine learning models. The proposed data-quality-aware federated PSN architecture leveraging FDQP model with data collected from edge nodes can effectively improve the data quality and accuracy of the federated patient similarity network (FPSN)-based machine learning models. Our profiling algorithm used lightweight profile exchange instead of full data processing at the edge, which resulted in optimal data quality achievement, thus improving efficiency. Overall, FDQP is an effective method for assessing data quality in the edge computing environment, and we believe that the proposed approach can be applied to other scenarios beyond patient monitoring
    corecore