3 research outputs found

    Designing for Deployable, Secure, and Generic Machine Learning Systems

    Machine learning systems have catalyzed numerous image-centric applications owing to the significant achievements of machine learning algorithms and models. While these systems have showcased the efficacy of machine learning models, certain challenges persist, such as machine learning system design and the security vulnerabilities inherent in deep neural networks. Moreover, the deployment of deep neural network models remains a significant hurdle. This dissertation introduces a multimedia prototyping framework tailored for visual analytical applications, improving the reusability of video analysis software tools with minimal performance overhead. Furthermore, we present novel image-processing techniques designed to bolster the robustness of deep neural networks and propose an innovative compression technique to address deployment challenges. First, to tackle the reusability issue in existing video analysis tools, we propose a new software prototyping framework called Video as Text (vText), which lets video data be analyzed and manipulated as easily as text data is handled on most Unix and Linux systems. The vText paradigm seeks to mimic the text-processing programs of those systems. We demonstrate the design and implementation of vText, which links video codecs with computer vision and image processing algorithms, and the performance evaluation shows that the vText framework achieves comparable running time and is easy to use for prototyping visual analytical programs. Second, to reduce the vulnerability of deep neural networks to adversaries, we propose three color-reduction image processing approaches: Gaussian smoothing plus PNM color reduction (GPCR), Gaussian smoothing plus K-means (GK-means), and fast GK-means. These approaches make deep convolutional neural networks more robust to adversarial perturbation. We evaluate the approaches on a subset of the ImageNet dataset. Our evaluation reveals that our GK-means-based algorithms have the best top-1 classification accuracy.
The final contribution of the dissertation is a novel deep neural network compression framework for class specialization problems, which addresses the limited utilization of deep neural network-based functionalities. We propose a novel knowledge distillation framework with two proposed losses, Renormalized Knowledge Distillation (RKD) and Intra-Class Variance (ICV), to render computationally efficient, specialized neural network models. Our quantitative empirical evaluation demonstrates that our proposed framework achieves significant classification accuracy improvements for tasks where the number of subclasses or instances in the datasets is relatively small.
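
The color-reduction idea named in this abstract (Gaussian smoothing followed by K-means) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the 3x3 kernel, the fixed iteration count, the seeded random initialization, and the function names `gaussian_smooth`, `kmeans_palette`, and `gk_means_reduce` are all assumptions made for illustration, and images are modeled as plain lists of rows of `(r, g, b)` tuples.

```python
import random

# Assumed 3x3 binomial approximation of a Gaussian kernel (weights sum to 16).
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def gaussian_smooth(img):
    """Blur an image given as rows of (r, g, b) tuples; borders are clamped."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    wgt = KERNEL[dy + 1][dx + 1]
                    for c in range(3):
                        acc[c] += wgt * img[yy][xx][c]
            row.append(tuple(a / 16.0 for a in acc))
        out.append(row)
    return out


def dist2(p, q):
    """Squared Euclidean distance between two colors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_palette(pixels, k, iters=10, seed=0):
    """Plain Lloyd's K-means over pixel colors; returns k centroid colors."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)
    for _ in range(iters):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]
        counts = [0] * k
        for p in pixels:
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            counts[j] += 1
            for c in range(3):
                sums[j][c] += p[c]
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
            for j in range(k)
        ]
    return centroids


def gk_means_reduce(img, k):
    """Smooth the image, then map every pixel to its nearest palette color."""
    smooth = gaussian_smooth(img)
    pixels = [p for row in smooth for p in row]
    palette = kmeans_palette(pixels, k)
    return [[min(palette, key=lambda c: dist2(p, c)) for p in row] for row in smooth]
```

In a defense pipeline of this kind, the reduced image would be passed to the classifier in place of the raw input; how the dissertation chooses the number of colors and the smoothing strength is not stated in the abstract.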

    Timely processing of big data in collaborative large-scale distributed systems

    Today’s Big Data phenomenon, characterized by huge volumes of data produced at very high rates by heterogeneous and geographically dispersed sources, is fostering the use of large-scale distributed systems that leverage parallelism, fault tolerance and locality awareness to deliver suitable performance. Among the several areas where Big Data is gaining significance, the protection of Critical Infrastructure is one of the most strategic, since it affects the stability and safety of entire countries. Intrusion detection mechanisms can benefit greatly from novel Big Data technologies, because these make it possible to exploit much more information to sharpen the accuracy of threat discovery. A key way to further increase the amount of data available for detection is collaboration (meant as information sharing) among distinct actors that share the common goal of maximizing the chances of recognizing malicious activities earlier. Indeed, if they can agree to share their data, they can all improve their cyber defenses. The Semantic Room (SR) abstraction allows interested parties to form trusted and contractually regulated federations, the Semantic Rooms, for secure information sharing and processing. Another crucial point for the effectiveness of cyber protection mechanisms is the timeliness of detection: the sooner a threat is identified, the faster proper countermeasures can be put in place to confine any damage. Within this context, the contributions reported in this thesis are threefold:
    * As a case study to show how collaboration can enhance the efficacy of security tools, we developed a novel algorithm for the detection of stealthy port scans, named R-SYN (Ranked SYN port scan detection). We implemented it in three distinct technologies, all of them integrated within an SR-compliant architecture that allows for collaboration through information sharing: (i) a centralized Complex Event Processing (CEP) engine (Esper), (ii) a framework for distributed event processing (Storm) and (iii) Agilis, a novel platform for batch-oriented processing which leverages the Hadoop framework and RAM-based storage for fast data access. Regardless of the technology employed, all the evaluations showed that increasing the number of participants (that is, increasing the amount of input data available) improves detection accuracy. The experiments made clear that, compared with a centralized approach, a distributed one achieves lower detection latency and keeps up with higher input throughput.
    * Distributing the computation over a set of physical nodes raises the issue of how to assign the available resources to the processing tasks so as to minimize the time the computation takes to complete. We investigated this aspect in Storm by developing two distinct scheduling algorithms, both aimed at decreasing the average processing time of a single input event by decreasing the inter-node traffic. Experimental evaluations showed that these two algorithms can improve performance by up to 30%.
    * Computations in online processing platforms (like Esper and Storm) run continuously. The need to refine running computations or add new ones, together with the need to cope with the variability of the input, requires the ability to adapt the resource allocation at runtime, which entails a set of additional problems. The most relevant of these are how to handle incoming data and processing state while the topology is being reconfigured, and the temporarily reduced performance during reconfiguration. To this end, we also explored the alternative approach of running the computation periodically on batches of input data: although it incurs a penalty in processing latency, it eliminates the great complexity of dynamic reconfigurations. We chose Hadoop as the batch-oriented processing framework and developed strategies specific to computations based on time windows, which are very likely to be used for pattern recognition purposes, as in the case of intrusion detection. Our evaluations compared these strategies and made evident the kind of performance that this approach can provide.
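
To make the idea of a time-window computation over a batch of input concrete, the sketch below ranks sources by how many distinct targets they probed with incomplete TCP handshakes in each window. It is not R-SYN, whose ranking is not described in this abstract, and it does not use Esper, Storm, or Hadoop. The event tuple layout, the function name `rank_scanners`, and the use of the incomplete-handshake count as a score are assumptions made purely for illustration.

```python
from collections import defaultdict


def rank_scanners(events, window, slide):
    """Rank sources per time window over a batch of connection events.

    events: iterable of (timestamp, src, dst, port, completed) tuples, where
            `completed` says whether the TCP handshake finished (hypothetical
            layout, not taken from the thesis).
    window: window length; slide: step between window starts (same time unit).
    Returns {window_start: [(src, distinct_incomplete_targets), ...]} with each
    list sorted by descending score, ties broken by source name.
    """
    events = sorted(events)
    if not events:
        return {}
    first, last = events[0][0], events[-1][0]
    result = {}
    start = first
    while start <= last:
        targets = defaultdict(set)
        for ts, src, dst, port, completed in events:
            # Only half-open attempts inside [start, start + window) count.
            if start <= ts < start + window and not completed:
                targets[src].add((dst, port))
        result[start] = sorted(
            ((src, len(seen)) for src, seen in targets.items()),
            key=lambda item: (-item[1], item[0]),
        )
        start += slide
    return result


if __name__ == "__main__":
    batch = [
        (0, "10.0.0.9", "10.0.0.1", 22, False),
        (1, "10.0.0.9", "10.0.0.1", 23, False),
        (2, "10.0.0.9", "10.0.0.2", 80, False),
        (3, "10.0.0.5", "10.0.0.1", 443, True),
        (12, "10.0.0.9", "10.0.0.3", 25, False),
    ]
    for start, ranking in rank_scanners(batch, window=10, slide=5).items():
        print(start, ranking)
```

Because every window is recomputed from the stored batch, no processing state has to be migrated between runs, which is the trade-off the abstract describes: higher latency in exchange for avoiding dynamic reconfiguration.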

    Can Infopipes Facilitate Reuse in a Traffic Application?

    Infopipes are presented as reusable building blocks for streaming applications. To evaluate this claim, we have built a significant traffic application in Smalltalk using Infopipes. This poster presents a traffic problem and solution, a short introduction to Infopipes, and the types of reuse Infopipes facilitate in our implementation.