Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world: petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders-of-magnitude performance improvements over alternative systems.
Bayesian Classifiers Programmed In SQL Using PCA
The Bayesian classifier is a fundamental classification technique. We also consider different concepts regarding dimensionality reduction techniques for retrieving lossless data. In this paper, we propose a new architecture for pre-processing the data. We improve our Bayesian classifier to produce more accurate models on data sets with skewed distributions, data sets with missing information, and subsets of points that overlap significantly with each other, which are known issues for clustering algorithms. We are therefore interested in combining a dimensionality reduction technique such as PCA with Bayesian classifiers to accelerate computations and evaluate complex mathematical equations. The proposed architecture contains the following stages: pre-processing of input data, Naïve Bayesian classifier, Bayesian classifier, principal component analysis, and database. Principal Component Analysis (PCA) reduces the dimensionality of the data by calculating eigenvalues and eigenvectors. We consider two algorithms in this paper: the Bayesian Classifier based on K-Means (BKM) and the Naïve Bayesian Classifier Algorithm N
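As a rough illustration of combining PCA with a Bayesian classifier whose statistics come from SQL, the sketch below is a minimal example under stated assumptions, not the paper's architecture. It computes the leading principal component by power iteration on the covariance matrix (in Python), reduces each point to one PCA score, and then trains a Gaussian Naïve Bayes model whose per-class prior, mean, and variance are computed with SQL aggregates in an in-memory SQLite database. The function names `top_component`, `project`, `train_nb_sql`, `classify`, the table `pts`, and the toy data are all hypothetical.

```python
import math
import sqlite3


def top_component(rows, iters=200):
    """Return column means and the leading principal component (unit vector),
    found by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Score of one point on the principal component (a 1-D reduction)."""
    return sum((row[j] - mean[j]) * v[j] for j in range(len(v)))


def train_nb_sql(labels, scores):
    """Per-class prior, mean and variance computed with SQL aggregates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pts (label TEXT, x REAL)")
    con.executemany("INSERT INTO pts VALUES (?, ?)", zip(labels, scores))
    stats = con.execute(
        """SELECT p.label, COUNT(*), s.mu, AVG((p.x - s.mu) * (p.x - s.mu))
           FROM pts AS p
           JOIN (SELECT label, AVG(x) AS mu FROM pts GROUP BY label) AS s
             ON p.label = s.label
           GROUP BY p.label, s.mu"""
    ).fetchall()
    con.close()
    total = sum(cnt for _, cnt, _, _ in stats)
    # A small constant keeps every variance strictly positive.
    return {lbl: (cnt / total, mu, var + 1e-6) for lbl, cnt, mu, var in stats}


def classify(model, x):
    """Pick the class with the largest Gaussian log-posterior (up to a constant)."""
    def log_post(prior, mu, var):
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var))
    return max(model, key=lambda lbl: log_post(*model[lbl]))


# Hypothetical toy data: two well-separated 2-D clusters.
rows = [[0.0, 0.2], [0.3, 0.0], [0.1, 0.4], [5.0, 5.1], [5.4, 4.9], [4.6, 5.2]]
labels = ["a", "a", "a", "b", "b", "b"]
mean, v = top_component(rows)
model = train_nb_sql(labels, [project(r, mean, v) for r in rows])
print(classify(model, project([4.9, 5.0], mean, v)))  # b
print(classify(model, project([0.2, 0.1], mean, v)))  # a
```

Keeping only the first component makes the Naïve Bayes step a one-feature Gaussian model; with more components, each score column would get its own mean and variance per class, and the independence assumption would multiply their likelihoods.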
Type Ahead Search in Database using SQL
A type-ahead search system computes answers on the fly as a user types a keyword query character by character. We study how to support type-ahead search on data in a relational DBMS, focusing on how to support this type of search using SQL. A key challenge is how to leverage existing database functionalities to meet the high-performance requirement of interactive speed. We extended the efficient approach to fuzzy queries and proposed various techniques to improve query performance. We proposed an incremental-computation method to answer multi-keyword queries, and studied how to support first-N queries and incremental updates. Our experimental results on large, real data sets show that the proposed techniques can enable DBMS systems to support search-as-you-type on large tables.
DOI: 10.17762/ijritcc2321-8169.15024
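To make the type-ahead idea concrete, here is a minimal sketch using SQLite from Python; it is not the paper's method. It assumes a hypothetical keyword table `kw(word, rid)` with a B-tree index on `word`, answers each keyword prefix with a range predicate (`word >= prefix AND word < upper`), and handles multi-keyword queries incrementally by intersecting the new keyword's matches with the cached answer for the preceding keywords. The helpers `build_index`, `prefix_ids`, and `TypeAhead` are hypothetical, ranking by record id is an assumption, and fuzzy matching and data updates are not covered.

```python
import sqlite3


def build_index(records):
    """Store each record's lowercase keywords in an indexed (word, rid) table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE kw (word TEXT, rid INTEGER)")
    con.execute("CREATE INDEX kw_word ON kw (word)")
    con.executemany(
        "INSERT INTO kw VALUES (?, ?)",
        [(w, rid) for rid, text in records.items() for w in set(text.lower().split())],
    )
    return con


def prefix_ids(con, prefix):
    """Ids of records with a keyword starting with `prefix`, expressed as a range
    predicate that the index on `word` can serve."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # first string past the prefix range
    rows = con.execute(
        "SELECT DISTINCT rid FROM kw WHERE word >= ? AND word < ?", (prefix, upper)
    )
    return {rid for (rid,) in rows}


class TypeAhead:
    """Multi-keyword prefix search that reuses the answer for the first k-1 keywords."""

    def __init__(self, con):
        self.con = con
        self.cache = {}  # keyword tuple -> set of matching record ids

    def _ids(self, keywords):
        if keywords not in self.cache:
            cur = prefix_ids(self.con, keywords[-1])
            if len(keywords) > 1:
                # Incremental step: intersect with the cached result of the shorter query.
                cur &= self._ids(keywords[:-1])
            self.cache[keywords] = cur
        return self.cache[keywords]

    def search(self, query, n=10):
        """First n matching record ids (ranked here simply by id)."""
        keywords = tuple(query.lower().split())
        if not keywords:
            return []
        return sorted(self._ids(keywords))[:n]


records = {
    1: "Database systems concepts",
    2: "Data mining techniques",
    3: "Search engines",
    4: "Database search as you type",
}
ta = TypeAhead(build_index(records))
print(ta.search("data"))      # [1, 2, 4]
print(ta.search("data sea"))  # [4]
```

The cache trades memory for latency: each new keystroke on the last keyword costs one indexed range query plus a set intersection. If the underlying table is updated, the cache would need to be cleared or patched.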