4,624 research outputs found

    Scalable and Sustainable Deep Learning via Randomized Hashing

    Full text link
    Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations, associated with both training and testing of deep networks, are very expensive from a computational and energy standpoint. We present a novel hashing based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropouts and randomized hashing for maximum inner product search to select the nodes with the highest activation efficiently. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while keeping on average within 1% of the accuracy of the original model. A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets

    Dynamic Time-Dependent Route Planning in Road Networks with User Preferences

    Full text link
    There has been tremendous progress in algorithmic methods for computing driving directions on road networks. Most of that work focuses on time-independent route planning, where it is assumed that the cost on each arc is constant per query. In practice, the current traffic situation significantly influences the travel time on large parts of the road network, and it changes over the day. One can distinguish between traffic congestion that can be predicted using historical traffic data, and congestion due to unpredictable events, e.g., accidents. In this work, we study the \emph{dynamic and time-dependent} route planning problem, which takes both prediction (based on historical data) and live traffic into account. To this end, we propose a practical algorithm that, while robust to user preferences, is able to integrate global changes of the time-dependent metric~(e.g., due to traffic updates or user restrictions) faster than previous approaches, while allowing subsequent queries that enable interactive applications

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Full text link
    Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor

    On Longest Repeat Queries Using GPU

    Full text link
    Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by \.{I}leri et al., using a total of the optimal O(n)O(n) time and space, where nn is the string size. However, their solution can only find the \emph{leftmost} longest repeat for each of the nn string position. It is also not known how to parallelize their solution. In this paper, we propose a new solution for longest repeat finding, which although is theoretically suboptimal in time but is conceptually simpler and works faster and uses less memory space in practice than the optimal solution. Further, our solution can find \emph{all} longest repeats of every string position, while still maintaining a faster processing speed and less memory space usage. Moreover, our solution is \emph{parallelizable} in the shared memory architecture (SMA), enabling it to take advantage of the modern multi-processor computing platforms such as the general-purpose graphics processing units (GPU). We have implemented both the sequential and parallel versions of our solution. Experiments with both biological and non-biological data show that our sequential and parallel solutions are faster than the optimal solution by a factor of 2--3.5 and 6--14, respectively, and use less memory space.Comment: 14 page

    When private set intersection meets big data : an efficient and scalable protocol

    Get PDF
    Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode

    Social Collaborative Retrieval

    Full text link
    Socially-based recommendation systems have recently attracted significant interest, and a number of studies have shown that social information can dramatically improve a system's predictions of user interests. Meanwhile, there are now many potential applications that involve aspects of both recommendation and information retrieval, and the task of collaborative retrieval---a combination of these two traditional problems---has recently been introduced. Successful collaborative retrieval requires overcoming severe data sparsity, making additional sources of information, such as social graphs, particularly valuable. In this paper we propose a new model for collaborative retrieval, and show that our algorithm outperforms current state-of-the-art approaches by incorporating information from social networks. We also provide empirical analyses of the ways in which cultural interests propagate along a social graph using a real-world music dataset.Comment: 10 page
    • ā€¦
    corecore