12,280 research outputs found

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    Full text link
    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URLs, click-through prediction, and social networks. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph on the webspam dataset by brute force (n²D operations) in less than 10 seconds would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
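    To make the pipeline above concrete, the following is a minimal Python sketch of an LSH index combining the ingredients the abstract names: minwise hashing, per-bucket reservoir sampling, and count-based candidate ranking with no explicit similarity computation. It uses plain K*L minwise hashes rather than the densified one-pass variant, and all parameters, names, and sizes are illustrative assumptions, not the authors' implementation.

    # FLASH-style index sketch: minhash tables + reservoir-sampled buckets.
    import random
    from collections import Counter, defaultdict

    K, L = 4, 16          # hashes per table, number of tables (assumed values)
    RESERVOIR = 32        # per-bucket reservoir size (assumed value)
    PRIME = (1 << 61) - 1
    SEEDS = [(random.randrange(1, PRIME), random.randrange(PRIME))
             for _ in range(K * L)]

    def minhash(features, a, b):
        # Minimum of a universal hash over the point's nonzero feature ids.
        return min((a * f + b) % PRIME for f in features)

    def signature(features, table):
        # Concatenate K minhashes to form the bucket key for one table.
        return tuple(minhash(features, *SEEDS[table * K + k]) for k in range(K))

    tables = [defaultdict(list) for _ in range(L)]
    seen = [Counter() for _ in range(L)]  # how many points hit each bucket

    def insert(point_id, features):
        for t in range(L):
            key = signature(features, t)
            seen[t][key] += 1
            bucket = tables[t][key]
            if len(bucket) < RESERVOIR:
                bucket.append(point_id)
            else:
                # Reservoir sampling keeps an unbiased sample of a full bucket.
                j = random.randrange(seen[t][key])
                if j < RESERVOIR:
                    bucket[j] = point_id

    def query(features, k=10):
        # Count-based estimation: rank candidates by how many tables they
        # collide in, instead of computing any pairwise similarity.
        votes = Counter()
        for t in range(L):
            votes.update(tables[t].get(signature(features, t), ()))
        return [pid for pid, _ in votes.most_common(k)]

    For example, inserting sparse points as lists of nonzero feature ids and then calling query() returns the ids with the most table collisions, which is the count-based stand-in for the nearest neighbors.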

    Impliance: A Next Generation Information Management Appliance

    Full text link
    "…ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA.

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    Get PDF
    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts in space exploration, with particular reference to NASA Langley Research Center's (NASA/LaRC) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions, as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document, are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to its high cost and complexity. The MAX concept appears to be a promising candidate, except that more detailed information is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for the architectural design of computing systems meant for planetary missions were also identified.

    An Overview on Application of Machine Learning Techniques in Optical Networks

    Get PDF
    Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and to make decisions pertaining to the proper functioning of the networks. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth in network complexity faced by optical networks in the last few years. This increase in complexity is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing, and compensation of nonlinear effects in optical fiber propagation. In this paper, we provide an overview of the application of ML to optical communications and networking. We classify and survey the relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy; to stimulate further work in this area, we conclude the paper by proposing possible new research directions.
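    As a minimal illustration of the kind of network-data analysis surveyed above, the sketch below trains a supervised classifier to flag degraded lightpaths from signal-quality indicators. The feature set, thresholds, and synthetic data are assumptions made for the sketch; they are not drawn from the paper.

    # Toy fault-management example: classify lightpath health from monitors.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    # Illustrative monitoring features: OSNR (dB), log10 of pre-FEC BER,
    # residual chromatic dispersion (ps/nm), received power (dBm).
    X = np.column_stack([
        rng.normal(20, 3, n),    # OSNR
        rng.normal(-9, 2, n),    # log10(pre-FEC BER)
        rng.normal(0, 50, n),    # residual dispersion
        rng.normal(-12, 2, n),   # received power
    ])
    # Toy ground truth: "degraded" when OSNR is low or the BER is high.
    y = ((X[:, 0] < 17) | (X[:, 1] > -7)).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")

    In a real deployment the labels would come from network alarms or operator tickets rather than a synthetic rule, but the workflow (collect indicators, train, score new lightpaths) is the same.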

    Design and Service Provisioning Methods for Optical Networks in 5G and Beyond Scenarios

    Get PDF
    Network operators are deploying 5G while also considering the evolution towards 6G. They consider different enablers and address various challenges. One trend in 5G deployment is network densification, i.e., deploying many small cell sites close to the users, which requires a well-designed transport network (TN). The choice of the TN technology and of the location for processing the 5G protocol stack functions is critical to contain capital and operational expenditures. Furthermore, it is crucial to ensure the resiliency of the TN infrastructure in case of a failure in nodes and/or links while resource efficiency is maximized.

    Operators are also interested in 5G networks with flexibility and scalability features. In this context, one main question is where to deploy network functions so that the connectivity and compute resources are utilized efficiently while meeting strict service latency and availability requirements. Off-loading compute resources to large, central data centers (DCs) has some advantages, i.e., better utilization of compute resources at a lower cost. A backup path can be added to address service availability requirements when using compute off-loading strategies. This might impact the service blocking ratio and limit operators' profit. The importance of this trade-off becomes more critical with the emergence of new 6G verticals.

    This thesis proposes novel methods to address the issues outlined above. To address the challenge of cost-efficient TN deployment, the thesis introduces a framework to study the total cost of ownership (TCO), latency, and reliability performance of a set of TN architectures for high-layer and low-layer functional split options. The architectural options are fiber- or microwave-based. To address the strict availability requirement, the thesis proposes a resource-efficient protection strategy against single node/link failures in the midhaul segment. The method selects primary and backup DCs for each aggregation node (i.e., the nodes to which cell sites are connected) while maximizing the sharing of backup resources. Finally, to address the challenge of resource efficiency while provisioning services, the thesis proposes a backup-enhanced compute off-loading strategy, i.e., resource-efficient provisioning (REP). REP selects a DC, a connectivity path, and (optionally) a backup path for each service request with the aim of minimizing resource usage while the service latency and availability requirements are met.

    Our results from the techno-economic assessment of the TN options reveal that, in some cases, microwave can be a good substitute for fiber technology. Several factors, including the geo-type, the functional split option, and the cost of fiber trenching and microwave equipment, influence the effectiveness of the microwave option. The considered architectures show similar latency and reliability performance and meet the 5G service requirements. The thesis also shows that a protection strategy based on shared connectivity and compute resources can lead to significant cost savings compared to benchmarks based on dedicated backup resources. Finally, the thesis shows that the proposed backup-enhanced compute off-loading strategy offers advantages in service blocking ratio and profit gain compared to a conventional off-loading approach that does not add a backup path. The benefits are even more evident for next-generation services, e.g., those expected on the market in 3 to 5 years, as the demand for services with stringent latency and availability requirements will increase.
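    The following is a minimal sketch of a REP-style provisioning decision under stated assumptions: candidate DCs with known path latency, availability, and resource cost are given up front, path availabilities are treated as independent, and the cheapest feasible option wins, with a backup path added only when a single path cannot meet the availability target. All names, numbers, and the availability model are illustrative, not the thesis's actual algorithm.

    # Greedy backup-enhanced off-loading decision for one service request.
    def provision(request, candidates):
        # candidates: dicts with keys dc, latency_ms, availability, cost,
        # and an optional "backup" dict with path, latency_ms, availability,
        # cost. Returns the cheapest feasible option, or None (blocked).
        options = []
        for c in candidates:
            if c["latency_ms"] > request["max_latency_ms"]:
                continue
            if c["availability"] >= request["min_availability"]:
                options.append({"dc": c["dc"], "backup": None,
                                "cost": c["cost"]})
            b = c.get("backup")
            if b and b["latency_ms"] <= request["max_latency_ms"]:
                # 1 - (1-a1)(1-a2): both disjoint paths must fail together.
                joint = 1 - (1 - c["availability"]) * (1 - b["availability"])
                if joint >= request["min_availability"]:
                    options.append({"dc": c["dc"], "backup": b["path"],
                                    "cost": c["cost"] + b["cost"]})
        return min(options, key=lambda o: o["cost"]) if options else None

    req = {"max_latency_ms": 5.0, "min_availability": 0.9999}
    dcs = [
        {"dc": "edge-1", "latency_ms": 1.0, "availability": 0.999, "cost": 3,
         "backup": {"path": "edge-1-alt", "latency_ms": 2.0,
                    "availability": 0.999, "cost": 1}},
        {"dc": "central", "latency_ms": 4.0, "availability": 0.9999, "cost": 1,
         "backup": None},
    ]
    print(provision(req, dcs))  # picks the central DC, no backup needed

    The example shows the trade-off the thesis studies: the edge DC needs a backup path (extra resources) to reach the availability target, while the central DC meets it with a single, cheaper path at higher latency.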

    Is There Light at the Ends of the Tunnel? Wireless Sensor Networks for Adaptive Lighting in Road Tunnels

    Get PDF
    Existing deployments of wireless sensor networks (WSNs) are often conceived as stand-alone monitoring tools. In this paper, we report instead on a deployment where the WSN is a key component of a closed-loop control system for adaptive lighting in operational road tunnels. WSN nodes along the tunnel walls report light readings to a control station, which closes the loop by setting the intensity of the lamps to match a legislated curve. The ability to dynamically match the lighting levels to the actual environmental conditions improves the tunnel's safety and reduces its power consumption. The use of WSNs in a closed-loop system, combined with the real-world, harsh setting of operational road tunnels, induces tighter requirements on the quality and timeliness of sensed data, as well as on the reliability and lifetime of the network. In this work, we test to what extent mainstream WSN technology meets these challenges, using a dedicated design that nevertheless relies on well-established techniques. The paper describes the hardware/software architecture we devised, focusing on the WSN component, and analyzes its performance through experiments in a real, operational tunnel.
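    A minimal sketch of the closed loop described above, under stated assumptions: nodes periodically report light readings, the control station aggregates them robustly, and a simple proportional controller drives lamp intensity toward a target derived from the legislated curve. The curve, gain, and aggregation rule are illustrative placeholders, not the deployment's actual control law.

    # One iteration of the adaptive-lighting control loop.
    def legislated_target(exterior_lux):
        # Toy stand-in for the legislated curve: entrance zones track a
        # fraction of the exterior luminance, with a floor for night-time.
        return max(50.0, 0.05 * exterior_lux)

    def control_step(readings_lux, exterior_lux, lamp_level, gain=0.02):
        # Aggregate node reports robustly: the median limits the effect of
        # outliers from faulty or occluded sensors.
        readings = sorted(readings_lux)
        measured = readings[len(readings) // 2]
        error = legislated_target(exterior_lux) - measured
        # Proportional update, clamped to the lamp's dimming range [0, 1].
        return min(1.0, max(0.0, lamp_level + gain * error))

    level = 0.5
    for exterior, readings in [(4000, [180, 190, 400]),
                               (4000, [205, 210, 200])]:
        level = control_step(readings, exterior, level)
        print(f"lamp level -> {level:.2f}")

    The tight requirements the abstract mentions show up directly here: stale or missing readings distort the median, and a lossy network slows the loop's convergence to the legislated target.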

    Real-Time Fault Detection and Diagnosis System for Analog and Mixed-Signal Circuits of Acousto-Magnetic EAS Devices

    Get PDF
    © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    The paper discusses fault diagnosis of the electronic circuit board that is part of acousto-magnetic electronic article surveillance (EAS) detection devices. The aim is to enable the end-user to run the fault diagnosis in real time on a portable FPGA-based platform, so as to gain insight into the failures that have occurred.
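    As a minimal illustration of signature-based diagnosis in the spirit of the paper, the sketch below compares measured test-point voltages of the board's analog stages against nominal values and flags the stages whose deviation exceeds a tolerance. The stage names, nominal values, and tolerances are invented for illustration; the paper's FPGA implementation is not reproduced here.

    # Toy signature-based fault diagnosis for an analog/mixed-signal board.
    NOMINAL = {            # stage -> (expected test-point voltage, tolerance)
        "oscillator":  (2.50, 0.10),
        "amplifier":   (1.20, 0.15),
        "demodulator": (0.80, 0.08),
    }

    def diagnose(measurements):
        # measurements: stage -> measured voltage; returns faulty stages.
        faults = []
        for stage, (nominal, tol) in NOMINAL.items():
            if abs(measurements[stage] - nominal) > tol:
                faults.append(stage)
        return faults or ["no fault detected"]

    print(diagnose({"oscillator": 2.52, "amplifier": 0.90,
                    "demodulator": 0.81}))  # -> ['amplifier']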