3,765 research outputs found

    The Case for Learned Index Structures

    Full text link
    Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible

    Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

    Full text link
    In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions), and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations on many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it does not assume arbitrary-way forks nor synchronous operations, both of which are hard to implement in modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an Ω(logn)\Omega(\log n) overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first algorithms with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized

    Predicting Autism over Large-Scale Child Dataset

    Get PDF
    Data Analytics and Machine learning in healthcare are one of the most emerging and needed fields in current time. Also, a lot of research has been performed and is still being done in this field. In healthcare, gone are those days when only doctor examines and patient listens. Now doctor has a lot of technologies which can assist him and help in accurately diagnosing the disease with which his patient is suffering. The backbone of such technologies is data analytics and machine learning where we can make out a lot of inferences from tons of patients‟ data already available. This project aims at performing research and implementation of big data and machine learning techniques on the data related to the patients suffering from the disease called Autism. Autism is a neural disorder disease characterized by impaired social communication, verbal and non-verbal interaction, restrictive and repetitive behavior [4]. Autism is majorly noticed in children under or about the age of two years. One very important thing to be observed here is that autism is highly heritable and the cause includes both environmental factors and genetic susceptibility. Hence it is very important to have such data which contains details of patients including their symptoms, lab test data, history, vaccination details etc. which gives specific details of patients and their history. The project ultimately aims at training the data model with the set of training data and then testing and evaluating the data model using the test data. In this way, it should be a research and solution for implementing machine learning to detect and diagnose autism
    corecore