2,150 research outputs found

    Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

    Huge volumes of georeferenced data streams arrive daily at data stream management systems deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates, as well as skewness. These are the two predominant factors that greatly impact the overall quality of service. Data stream management systems must therefore be attuned to those factors, in addition to the spatial shape of the data, which may exaggerate their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage, and stream processing. In this thesis, we have designed a quality-of-service-aware system, SpatialDSMS, comprising several subsystems that cover those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of georeferenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving users in the presentation layer of having to reason about those services. Instead, users express their queries with quality goals, and our system optimizers compile them into query plans with an embedded quality guarantee, leaving logistic handling to the underlying layers.
We have developed standard-compliant prototypes for all the subsystems that constitute SpatialDSMS.

    Streaming Euclidean Max-Cut: Dimension vs Data Reduction

    Max-Cut is a fundamental problem that has been studied extensively in various settings. We design an algorithm for Euclidean Max-Cut, where the input is a set of points in $\mathbb{R}^d$, in the model of dynamic geometric streams, where the input $X\subseteq [\Delta]^d$ is presented as a sequence of point insertions and deletions. Previously, Frahling and Sohler [STOC 2005] designed a $(1+\epsilon)$-approximation algorithm for the low-dimensional regime, i.e., it uses space $\exp(d)$. To tackle this problem in the high-dimensional regime, which is of growing interest, one must improve the dependence on the dimension $d$, ideally to space complexity $\mathrm{poly}(\epsilon^{-1} d \log\Delta)$. Lammersen, Sidiropoulos, and Sohler [WADS 2009] proved that Euclidean Max-Cut admits dimension reduction with target dimension $d' = \mathrm{poly}(\epsilon^{-1})$. Combining this with the aforementioned algorithm that uses space $\exp(d')$, they obtain an algorithm whose overall space complexity is indeed polynomial in $d$, but unfortunately exponential in $\epsilon^{-1}$. We devise an alternative approach of data reduction, based on importance sampling, and achieve space bound $\mathrm{poly}(\epsilon^{-1} d \log\Delta)$, which is exponentially better (in $\epsilon$) than the dimension-reduction approach. To implement this scheme in the streaming model, we employ a randomly-shifted quadtree to construct a tree embedding. While this is a well-known method, a key feature of our algorithm is that the embedding's distortion $O(d\log\Delta)$ affects only the space complexity, and the approximation ratio remains $1+\epsilon$.
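The randomly-shifted quadtree mentioned in this abstract is a standard construction, and a minimal sketch of it may help make the idea concrete. The code below is not the paper's algorithm: it only shows how grid cells at each level are assigned after a random shift, and how the lowest level at which two points share a cell can be found. All function names are hypothetical, points are assumed to have integer coordinates in [0, Δ), and the shift is assumed to be an integer vector drawn uniformly from the same range.

```python
import random


def random_shift(delta, d, rng):
    """Draw one integer shift per coordinate, uniformly from [0, delta)."""
    return [rng.randrange(delta) for _ in range(d)]


def cell_id(point, shift, level):
    """Cell of the shifted point in the grid whose cells have side 2**level."""
    side = 1 << level
    return tuple((x + s) // side for x, s in zip(point, shift))


def top_level(delta):
    """Smallest level L with 2**L >= 2 * delta, so one cell covers all shifted points."""
    return (2 * delta - 1).bit_length()


def lowest_common_level(p, q, shift, delta):
    """Lowest quadtree level at which p and q fall into the same cell."""
    for level in range(top_level(delta) + 1):
        if cell_id(p, shift, level) == cell_id(q, shift, level):
            return level
    raise ValueError("points must have coordinates in [0, delta)")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    delta, d = 16, 3
    shift = random_shift(delta, d, rng)
    print(lowest_common_level((1, 2, 3), (2, 2, 3), shift, delta))
```

In a tree embedding built this way, the level returned by `lowest_common_level` determines the tree distance between two points: nearby points tend to separate only at low levels, while distant points are separated up to high levels. How that distance is scaled, and how the sampling is built on top of it, is left to the paper.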

    The model of an anomaly detector for HiLumi LHC magnets based on Recurrent Neural Networks and adaptive quantization

    This paper examines the applicability of Recurrent Neural Network models for detecting anomalous behavior of the CERN superconducting magnets. To conduct the experiments, the authors designed and implemented an adaptive signal quantization algorithm and a custom GRU-based detector, and developed a method for selecting the detector parameters. Three different datasets were used for testing the detector. Two artificially generated datasets were used to assess the raw performance of the system, whereas the 231 MB dataset composed of signals acquired from HiLumi magnets was intended for real-life experiments and model training. Several different setups of the developed anomaly detection system were evaluated and compared with a state-of-the-art OC-SVM reference model operating on the same data. The OC-SVM model was equipped with a rich set of feature extractors accounting for a range of input signal properties. The experiments showed that the detector, along with its supporting design methodology, reaches an F1 score equal to or very close to 1 for almost all test sets. Due to the profile of the data, the best_length setup of the detector performed best among all five tested configuration schemes of the detection system. The quantization parameters have the biggest impact on the overall performance of the detector, with the best values of the input/output grid equal to 16 and 8, respectively. The proposed detection solution significantly outperformed the OC-SVM-based detector in most cases, with much more stable performance across all the datasets. Comment: Related to arXiv:1702.0083

    Distributed and Communication-Efficient Continuous Data Processing in Vehicular Cyber-Physical Systems

    Processing the data produced by modern connected vehicles is of increasing interest for vehicle manufacturers to gain knowledge and develop novel functions and applications for the future of mobility. Connected vehicles form Vehicular Cyber-Physical Systems (VCPSs) that continuously sense increasingly large data volumes from high-bandwidth sensors such as LiDARs (an array of laser-based distance sensors that create a 3D map of the surroundings). The straightforward attempt of gathering all raw data from a VCPS to a central location for analysis often fails due to limits imposed by the infrastructure on the communication and storage capacities. In this Licentiate thesis, I present the results from my research that investigates techniques aiming at reducing the data volumes that need to be transmitted from vehicles through online compression and adaptive selection of participating vehicles. As explained in this work, the key to reducing the communication volume is in pushing parts of the necessary processing onto the vehicles' on-board computers, thereby favorably leveraging the available distributed processing infrastructure in a VCPS. The findings highlight that existing analysis workflows can be sped up significantly while reducing their data volume footprint and incurring only modest accuracy decreases. At the same time, the adaptive selection of vehicles for analyses proves to provide a sufficiently large subset of vehicles that have compliant data for further analyses, while balancing the time needed for selection and the induced computational load.