
    Tupleware: Redefining Modern Analytics

    There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.

    Bayesian Classifiers Programmed In SQL Using PCA

    The Bayesian classifier is a fundamental classification technique. We also consider different concepts regarding dimensionality reduction techniques for retrieving lossless data. In this paper we propose a new architecture for pre-processing the data. We improve our Bayesian classifier to produce more accurate models on data sets with skewed distributions, data sets with missing information, and subsets of points having significant overlap with each other, which are known issues for clustering algorithms. We are therefore interested in combining a dimensionality reduction technique such as PCA with Bayesian classifiers to accelerate computations and evaluate complex mathematical equations. The proposed architecture in this project contains the following stages: pre-processing of input data, Naïve Bayesian classifier, Bayesian classifier, principal component analysis, and database. Principal Component Analysis (PCA) is the process of reducing components by calculating eigenvalues and eigenvectors. We consider two algorithms in this paper: the Bayesian Classifier based on KMeans (BKM) and the Naïve Bayesian Classifier Algorithm N

    Type Ahead Search in Database using SQL

    A type-ahead search system computes answers on the fly as a user types in a keyword query character by character. We study how to support type-ahead search on data in a relational DBMS, focusing on how to support this type of search using SQL. A prominent challenge is how to leverage existing database functionalities to meet the high performance required to achieve interactive speed. We extend the efficient approach to the case of fuzzy queries and suggest various techniques to improve query performance. We suggest an incremental computation method to answer multi-keyword queries, and study how to support first-N queries and incremental updates. Our experimental results on large, real data sets show that the proposed techniques enable DBMS systems to support search-as-you-type on large tables. DOI: 10.17762/ijritcc2321-8169.15024
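
The idea of answering a keyword query as it is typed, using only SQL, can be illustrated with a minimal sketch. This is not the method from the paper above: the `keyword` table, its `word` and `rid` columns, and the functions `build_index` and `type_ahead` are hypothetical names chosen for illustration, and SQLite stands in for the relational DBMS. Each keyword typed so far is treated as a prefix and matched with an indexed range scan; fuzzy matching and incremental computation are not covered.

```python
import sqlite3


def build_index(conn, docs):
    """Populate a hypothetical inverted-index table: one row per (word, record id)."""
    conn.execute("CREATE TABLE keyword (word TEXT NOT NULL, rid INTEGER NOT NULL)")
    conn.execute("CREATE INDEX keyword_word ON keyword (word)")
    rows = [(w.lower(), rid) for rid, text in docs.items() for w in text.split()]
    conn.executemany("INSERT INTO keyword (word, rid) VALUES (?, ?)", rows)


def type_ahead(conn, query, limit=10):
    """Return up to `limit` record ids that match every keyword prefix in `query`."""
    result = None
    for kw in query.lower().split():
        # A prefix match expressed as a range [kw, kw + highest code point),
        # which lets the index on `word` serve the lookup.
        cur = conn.execute(
            "SELECT DISTINCT rid FROM keyword WHERE word >= ? AND word < ?",
            (kw, kw + "\U0010ffff"),
        )
        ids = {row[0] for row in cur}
        # A record must match all keywords, so intersect the per-keyword sets.
        result = ids if result is None else result & ids
    return sorted(result or [])[:limit]
```

Calling `type_ahead` again after each keystroke (for example with `"s"`, then `"se"`, then `"sea"`) re-runs the query from scratch; an incremental variant would instead reuse the result of the previous prefix.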