4,895 research outputs found

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Full text link
    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
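    The minimization method Pyramid builds on, count featurization, replaces a raw high-cardinality feature with label statistics conditioned on its value, so models can train on compact counts rather than the raw records. A minimal sketch with hypothetical ad-click data (the function and field names are illustrative, not Pyramid's API):

```python
from collections import defaultdict

def count_featurize(rows, smoothing=1.0):
    """Replace each categorical value with its smoothed positive-label rate.

    rows: list of (category, label) pairs with binary labels.
    Returns a dict mapping category -> smoothed P(label=1 | category).
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for cat, label in rows:
        counts[cat][0] += label
        counts[cat][1] += 1
    # Laplace smoothing keeps rare categories away from the 0/1 extremes.
    return {cat: (pos + smoothing) / (tot + 2 * smoothing)
            for cat, (pos, tot) in counts.items()}

# Hypothetical ad-click log: (advertiser_id, clicked)
log = [("a1", 1), ("a1", 0), ("a1", 1), ("a2", 0)]
rates = count_featurize(log)
```

    A downstream model then consumes the small `rates` table instead of the full log, which is what limits the data exposed at training time.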

    Cloud-based digital twinning for structural health monitoring using deep learning

    Get PDF
    Digital Twin technology has recently gathered pace in the engineering communities, as it allows for the convergence of the real structure and its digital counterpart throughout their entire life-cycle. With the rapid development of supporting technologies, including machine learning, 5G/6G, cloud computing, and the Internet of Things, Digital Twin has been moving progressively from concept to practice. In this paper, a Digital Twin framework based on cloud computing and deep learning for structural health monitoring is proposed to efficiently perform real-time monitoring and proactive maintenance. The framework consists of structural components, device measurements, and digital models formed by combining different sub-models, including mathematical, finite element, and machine learning ones. The data interaction among the physical structure, digital model, and human interventions is enhanced by using cloud computing infrastructure and a user-friendly web application. The feasibility of the proposed framework is demonstrated via case studies of damage detection on a model bridge and real bridge structures using deep learning algorithms, achieving a high accuracy of 92%.

    Industry 4.0 for SME

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. Industry 4.0 has been growing within companies and impacting the economy and society, but it has been a more complex challenge for some types of companies. Due to the costs and complexity associated with Industry 4.0 technologies, small and medium enterprises face difficulties in adopting them. This thesis proposes a model that guides and simplifies the implementation of Industry 4.0 in SMEs from a low-cost perspective. The model is intended to serve as a blueprint for designing and implementing an Industry 4.0 project within a manufacturing SME. To create the model, a literature review of the different fields regarding Industry 4.0 was conducted to understand the technologies best suited to leverage within the manufacturing industry and the different use cases where they would be applicable. After the model was built, expert interviews were conducted, and based on the received feedback, the model was tweaked, improved, and validated.

    Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

    Full text link
    Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention. Comment: To appear as a Spotlight presentation at NIPS 201
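    The core idea, attending from the current state to a few relevant past states and back-propagating only through those skip connections, can be sketched as follows. The dot-product relevance and hard top-k selection here are deliberate simplifications of the paper's learned attention mechanism, and the toy memory contents are hypothetical:

```python
import numpy as np

def sparse_attend(current, memory, k=2):
    """Attend from the current state to the k most relevant past states.

    current: (d,) vector; memory: (T, d) matrix of stored past states.
    Returns (indices, weights): in sparse attentive backtracking, only these
    k skip connections would carry gradient, instead of all T time steps.
    """
    scores = memory @ current                    # dot-product relevance
    top = np.argsort(scores)[-k:][::-1]          # indices of the k best scores
    w = np.exp(scores[top] - scores[top].max())  # softmax over the top-k only
    return top, w / w.sum()

# Toy memory of 10 past hidden states of dimension 4.
mem = np.zeros((10, 4))
mem[7] = [1.0, 1.0, 1.0, 1.0]   # a past state strongly resembling the present
mem[3] = [1.0, 0.0, 0.0, 0.0]   # a weakly related past state
cur = np.array([1.0, 1.0, 1.0, 1.0])
idx, wts = sparse_attend(cur, mem, k=2)
```

    Here step 7 dominates the attention weights, so almost all of the credit assigned to the current state would flow to that single remembered state rather than through every intermediate step.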

    Visual Tracking by Sampling in Part Space

    Get PDF
    In this paper, we present a novel part-based visual tracking method from the perspective of probability sampling. Specifically, we represent the target by a part space with two online learned probabilities to capture the structure of the target. The proposal distribution memorizes the historical performance of different parts, and it is used for the first round of part selection. The acceptance probability validates the specific tracking stability of each part in a frame, and it determines whether to accept its vote or to reject it. By doing this, we transform the complex online part selection problem into a probability learning one, which is easier to tackle. The observation model of each part is constructed by an improved supervised descent method and is learned in an incremental manner. Experimental results on two benchmarks demonstrate the competitive performance of our tracker against state-of-the-art methods.
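    The two-round selection described above — sample candidate parts from the proposal distribution, then accept or reject each sampled part's vote — can be sketched as follows (the part names and probabilities are hypothetical, and the tracker's learned observation models are omitted):

```python
import random

def select_parts(proposal, acceptance, n_draws, rng):
    """Two-stage part selection: draw candidate parts from the proposal
    distribution (historical reliability), then accept or reject each
    sampled part's vote via its per-frame acceptance probability.

    proposal: dict part -> sampling weight; acceptance: dict part -> prob.
    Returns the list of accepted part votes (repeats are possible).
    """
    parts = list(proposal)
    weights = [proposal[p] for p in parts]
    accepted = []
    for _ in range(n_draws):
        p = rng.choices(parts, weights=weights, k=1)[0]  # round 1: proposal
        if rng.random() < acceptance[p]:                 # round 2: acceptance
            accepted.append(p)
    return accepted

rng = random.Random(0)
votes = select_parts({"head": 3.0, "torso": 1.0},   # head has tracked well before
                     {"head": 1.0, "torso": 0.0},   # torso is unstable this frame
                     n_draws=5, rng=rng)
```

    With these toy probabilities the unstable part never contributes a vote, which is the mechanism the tracker uses to down-weight occluded or drifting parts online.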

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Get PDF

    CILP: Co-simulation based imitation learner for dynamic resource provisioning in cloud computing environments

    Get PDF
    Intelligent Virtual Machine (VM) provisioning is central to cost- and resource-efficient computation in cloud computing environments. As bootstrapping VMs is time-consuming, a key challenge for latency-critical tasks is to predict future workload demands to provision VMs proactively. However, existing AI-based solutions tend not to holistically consider all crucial aspects, such as provisioning overheads, heterogeneous VM costs, and Quality of Service (QoS) of the cloud system. To address this, we propose a novel method, called CILP, that formulates the VM provisioning problem as two sub-problems of prediction and optimization, where the provisioning plan is optimized based on predicted workload demands. CILP leverages a neural network as a surrogate model to predict future workload demands, with a co-simulated digital twin of the infrastructure to compute QoS scores. We extend the neural network to also act as an imitation learner that dynamically decides the optimal VM provisioning plan. A transformer-based neural model reduces training and inference overheads, while our novel two-phase decision-making loop facilitates informed provisioning decisions. Crucially, we address limitations of prior work by including resource utilization, deployment costs, and provisioning overheads to inform the provisioning decisions in our imitation learning framework. Experiments with three public benchmarks demonstrate that CILP gives up to 22% higher resource utilization, 14% higher QoS scores, and 44% lower execution costs compared to the current online and offline optimization-based state-of-the-art methods.
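    The predict-then-optimize split can be illustrated with a deliberately simplified optimization step: cover a predicted workload with the cheapest VM type per capacity unit. CILP itself replaces this greedy rule with a learned imitation policy evaluated against a co-simulated digital twin; the VM names, capacities, and prices below are hypothetical:

```python
import math

def provision(predicted_demand, vm_types):
    """Greedy stand-in for the optimization step of predict-then-provision.

    predicted_demand: workload in abstract capacity units (from the predictor).
    vm_types: list of (name, capacity, hourly_cost) tuples.
    Returns (plan, total_cost), covering demand with the VM type that is
    cheapest per capacity unit.
    """
    name, cap, price = min(vm_types, key=lambda v: v[2] / v[1])
    n = math.ceil(predicted_demand / cap)   # enough instances to cover demand
    return {name: n}, n * price

plan, cost = provision(predicted_demand=10,
                       vm_types=[("small", 2, 1.0), ("large", 8, 3.0)])
```

    Even this toy version shows why the provisioning plan must be recomputed whenever the workload forecast changes, which is what makes an accurate (and fast) demand predictor central to the loop.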

    The AlfaCrux CubeSat mission description and early results

    Get PDF
    On 1 April 2022, the AlfaCrux CubeSat was launched by the Falcon 9 Transporter-4 mission, the fourth SpaceX dedicated smallsat rideshare program mission, from Space Launch Complex 40 at Cape Canaveral Space Force Station in Florida into a Sun-synchronous orbit at 500 km. AlfaCrux is an amateur radio and educational mission to provide learning and scientific benefits in the context of small satellite missions. It is an opportunity for theoretical and practical learning about the technical management, systems design, communication, orbital mechanics, development, integration, and operation of small satellites. The AlfaCrux payload, a software-defined radio, provides two main services: a digital packet repeater and a store-and-forward system. In the ground segment, a cloud-computing-based command and control station has been developed, together with an open-access online platform to access and visualize the main AlfaCrux telemetry, user data, and experiments. The platform also serves as an in-orbit reference database for studies of, for instance, radio propagation, attitude reconstruction, and data-driven calibration algorithms for satellite sensors, among others. In this context, this paper describes the AlfaCrux mission, its main subsystems, and the achievements obtained in the early orbit phase. Scientific and engineering assessments conducted with the spacecraft operations to tackle unexpected behaviors in the ground station and to better understand the space environment are also presented and discussed. Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF), Brasil | Ref. N/