602,220 research outputs found

Recommended from our members

AGM, a dataflow database machine

Author: Bic Lubomir
Hartmann Robert L.
Publication venue: eScholarship, University of California
Publication date: 01/01/1984

In recent years, a number of database machines consisting of large numbers of parallel processing elements have been proposed. Unfortunately, one of the main limitations to parallelism in database processing is the I/O bandwidth of the underlying storage devices. One way to solve this problem is to use multiple parallel disk units. The main problem with this approach, however, is the lack of a computational model capable of utilizing the potential of any significant number of such devices. This paper presents a database model which is based on the principles of data-driven computation. According to this model, the database is represented as a network in which each node is conceptually an independent processing element, capable of communicating with other nodes by exchanging messages along the network arcs. To answer a query, one or more such messages, called tokens, are created and injected into the network. These then propagate asynchronously through the network in search of results satisfying the given query. To investigate the performance of the proposed system, we have implemented the model on a simulated computer architecture. The results of the simulation experiments indicate that the model is capable of exploiting the potential I/O bandwidth of a large number of disk units as well as the computational power of the associated processing elements.
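The token-propagation idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the AGM design: the `NETWORK` layout, the `run_query` helper, and the sample records are all hypothetical, and a single queue stands in for what the abstract describes as asynchronous, parallel message passing.

```python
from collections import deque

# Hypothetical toy network: node id -> (record, outgoing arcs).
# This layout is an assumption for illustration, not the AGM schema.
NETWORK = {
    "dept:cs": ({"kind": "dept", "name": "CS"}, ["emp:1", "emp:2"]),
    "emp:1": ({"kind": "emp", "name": "Ada", "salary": 120}, []),
    "emp:2": ({"kind": "emp", "name": "Bob", "salary": 80}, []),
}


def run_query(network, start_nodes, predicate):
    """Inject one token per start node and let tokens follow the arcs."""
    tokens = deque(start_nodes)  # a token is represented by the node it sits on
    visited = set()
    results = []
    while tokens:
        node = tokens.popleft()
        if node in visited:
            continue
        visited.add(node)
        record, arcs = network[node]
        if predicate(record):
            results.append(record["name"])
        # The node forwards the token along every outgoing arc.
        tokens.extend(arcs)
    return sorted(results)


high_paid = run_query(NETWORK, ["dept:cs"], lambda r: r.get("salary", 0) > 100)
```

In a real data-driven machine each node would process its incoming tokens independently; the queue here only serializes that behaviour so the sketch stays deterministic.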

eScholarship - University of California

Association Mining in Database Machine

Author: Jiao Jindou
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2011

Association rules are widely used in most data mining technologies. The Apriori algorithm is the fundamental association rule mining algorithm. The FP-growth tree algorithm improves performance by reducing the generation of frequent item sets. The Simplex algorithm is an advanced FP-growth algorithm that uses a bitmap structure together with the simplex concept from geometry. The bitmap implementation is designed specifically for storing data in database machines, so that association rule mining can be computed in parallel.
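To make the role of a bitmap structure concrete, here is a minimal sketch of bitmap-based support counting. It shows the generic vertical-bitmap technique, not the Simplex algorithm from the thesis; the transactions and the helper names `build_bitmaps`, `support`, and `frequent_pairs` are assumptions for illustration.

```python
from functools import reduce
from itertools import combinations

# Hypothetical transactions; bit i of an item's bitmap is set
# when transaction i contains that item.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def build_bitmaps(transactions):
    """Map each item to an integer bitmap over transaction indices."""
    bitmaps = {}
    for i, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(itemset, bitmaps):
    """Count transactions containing every item: AND the bitmaps, count set bits."""
    combined = reduce(lambda a, b: a & b, (bitmaps[item] for item in itemset))
    return bin(combined).count("1")


def frequent_pairs(bitmaps, min_support):
    """Return all item pairs whose support reaches min_support."""
    pairs = {}
    for pair in combinations(sorted(bitmaps), 2):
        count = support(pair, bitmaps)
        if count >= min_support:
            pairs[pair] = count
    return pairs


bitmaps = build_bitmaps(transactions)
pairs = frequent_pairs(bitmaps, min_support=2)
```

Because the support of an itemset reduces to bitwise AND plus a bit count, the work splits naturally across processing elements that each hold a slice of the bitmaps, which is one plausible reason a bitmap layout suits parallel hardware.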

SJSU ScholarWorks

Mining protein database using machine learning techniques

Author: Camargo Renata
Niranjan Mahesan
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/06/2008

With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help in achieving a reduction in the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionarily related and homologous. We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features. For the problem of separating remote homologous from analogous pairs, we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that the use of a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, detecting remote homologous pairs was a relatively harder problem. We find that the use of nonlinear classifiers achieves significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross validation and feature selection based studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.

Southampton (e-Prints Soton)

Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.

Author: Bouzid Yasmine Y
Burnett Dustin J
Chin Elizabeth L
Kan Annie
Lemay Danielle G
Simmons Gabriel
Tagkopoulos Ilias
Publication venue: eScholarship, University of California
Publication date: 01/12/2019

The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manual lookup of ASA24 foods into NDSR is time-consuming but currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up into NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common between ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients ("Nutrient-Only") or the nutrient and food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among machine learning models, the XGB-Regressor model performed best on held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
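The "Nutrient-Only" matching idea can be sketched as a nearest-neighbour lookup over shared nutrients. The abstract does not specify the actual matching algorithm, so this is an assumption: Euclidean distance is a stand-in, the `REFERENCE` entries and all their numbers are made up for illustration, and `estimate_lactose` is a hypothetical helper.

```python
import math

# Made-up reference foods standing in for a licensed nutrient database.
# All values are illustrative only, not real nutrient data.
REFERENCE = [
    {"name": "whole milk",
     "nutrients": {"protein": 3.2, "fat": 3.3, "carb": 4.8},
     "lactose": 4.8},
    {"name": "cheddar",
     "nutrients": {"protein": 25.0, "fat": 33.0, "carb": 1.3},
     "lactose": 0.1},
]


def distance(a, b):
    """Euclidean distance over the nutrients present in the query food."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def estimate_lactose(query_nutrients, reference):
    """Borrow the lactose value of the closest reference food."""
    best = min(reference, key=lambda food: distance(query_nutrients, food["nutrients"]))
    return best["name"], best["lactose"]


match = estimate_lactose({"protein": 3.4, "fat": 1.0, "carb": 5.0}, REFERENCE)
```

A "Nutrient + Text" variant would add a similarity term for the food descriptions to the distance; that is left out here because the abstract gives no details on how the two signals are combined.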

eScholarship - University of California

The Gremlin Graph Traversal Machine and Language

Author: Hartig O.
Hopcroft J.
Prud’hommeaux E.
Rodriguez M. A.
Shinavier J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/08/2015

Gremlin is a graph traversal machine and language designed, developed, and distributed by the Apache TinkerPop project. Gremlin, as a graph traversal machine, is composed of three interacting components: a graph $G$, a traversal $\Psi$, and a set of traversers $T$. The traversers move about the graph according to the instructions specified in the traversal, where the result of the computation is the ultimate locations of all halted traversers. A Gremlin machine can be executed over any supporting graph computing system such as an OLTP graph database and/or an OLAP graph processor. Gremlin, as a graph traversal language, is a functional language implemented in the user's native programming language and is used to define the $\Psi$ of a Gremlin machine. This article provides a mathematical description of Gremlin and details its automaton and functional properties. These properties enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, hybrid depth- and breadth-first evaluation, as well as the existence of a Universal Gremlin Machine and its respective entailments. Comment: To appear in the Proceedings of the 2015 ACM Database Programming Languages Conference.
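The three components named in this abstract (a graph, a traversal, and a set of traversers) can be mimicked with a toy machine. This sketch does not use the Apache TinkerPop API: the graph data, the `out` step factory, and the `execute` function are hypothetical, and `out` is only loosely modelled on the idea of following labelled outgoing edges.

```python
# Toy graph G: vertex -> list of (edge label, target vertex). Hypothetical data.
G = {
    "marko": [("knows", "josh"), ("knows", "vadas"), ("created", "lop")],
    "josh": [("created", "lop"), ("created", "ripple")],
    "vadas": [],
    "lop": [],
    "ripple": [],
}


def out(label):
    """Build a step that moves a traverser along outgoing edges with this label."""
    def step(graph, location):
        return [target for (edge_label, target) in graph[location] if edge_label == label]
    return step


def execute(graph, traversal, start):
    """Run every traverser through each step; return where the halted ones end up."""
    traversers = list(start)
    for step in traversal:
        # A traverser may split into several, or die if no edge matches.
        traversers = [nxt for loc in traversers for nxt in step(graph, loc)]
    return traversers


# The traversal (Psi in the abstract) is an ordered list of steps.
psi = [out("knows"), out("created")]
halted = execute(G, psi, ["marko"])
```

The result of the computation is the list of final traverser locations, which matches the abstract's description; a traverser with no matching edge simply drops out, as the one reaching `vadas` does here.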

arXiv.org e-Print Archive

Selective Laser Sintering Process Management Using a Relational Database

Author: Gibson Ian
Shi Dongping
Publication venue
Publication date: 01/01/1999

With more and more materials used in the Selective Laser Sintering (SLS) process, it is becoming necessary to use a database to manage the process efficiently. In this paper, a relational database for the SLS process is described. The database includes powdered material data, sintering parameters, machine characteristics, mechanical properties and surface quality of prototypes. Use of this database will make it easy to store and retrieve processing information and to make decisions for planning the SLS process. This paper will go on to describe how the database can be extended to include other RP technologies. Mechanical Engineering

Texas ScholarWorks

Learning-based Analysis on the Exploitability of Security Vulnerabilities

Author: Bliss Adam
Publication venue: ScholarWorks@UARK
Publication date: 01/12/2018

The purpose of this thesis is to develop a tool that uses machine learning techniques to predict whether a given vulnerability will be exploited. Such a tool could help organizations such as electric utilities to prioritize their security patching operations. Three different models, based on a deep neural network, a random forest, and a support vector machine respectively, are designed and implemented. Training data for these models is compiled from a variety of sources, including the National Vulnerability Database published by NIST and the Exploit Database published by Offensive Security. Extensive experiments are conducted, including testing the accuracy of each model, dynamically training the models on a rolling window of training data, and filtering the training data by various features. Of the chosen models, the deep neural network and the support vector machine show the highest accuracy (approximately 94% and 93%, respectively), and could be developed by future researchers into an effective tool for vulnerability analysis.

ScholarWorks@UARK

UARK (University of Arkansas)

Data Mining Using Relational Database Management Systems

Author: Beibei Zou
Bettina Kemme
Doina Precup
Glen Newton
Xuesong Ma
Publication venue
Publication date: 01/01/2006

Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves much higher scalability than Weka, while providing the same output and maintaining good computation time.
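The idea of hiding relational storage behind a main-memory interface can be sketched as follows. This is not WekaDB or Weka's Java API: the `DbBackedInstances` class, the `instances` table, its `x` and `label` columns, and the `mean_x` function are all assumptions, and SQLite stands in for the relational back tier.

```python
import sqlite3


class DbBackedInstances:
    """Sequence-like view over rows kept in SQLite instead of main memory."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # assumed to be a trusted identifier, not user input

    def __len__(self):
        query = f"SELECT COUNT(*) FROM {self.table}"
        return self.conn.execute(query).fetchone()[0]

    def __getitem__(self, index):
        query = f"SELECT x, label FROM {self.table} ORDER BY rowid LIMIT 1 OFFSET ?"
        row = self.conn.execute(query, (index,)).fetchone()
        if row is None:
            raise IndexError(index)
        return row


def mean_x(instances):
    """An 'algorithm' that only uses the sequence interface, unaware of the database."""
    return sum(inst[0] for inst in instances) / len(instances)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (x REAL, label TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?)",
                 [(1.0, "a"), (2.0, "b"), (3.0, "a")])
data = DbBackedInstances(conn, "instances")
average = mean_x(data)
```

The same `mean_x` works unchanged on a plain Python list of tuples, which is the sense in which such an extension can be transparent to the algorithm. Pushing the aggregate into SQL (for example with `AVG(x)`) would correspond to the abstract's point about transferring some tasks into the database.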

CogPrints Cognitive Sciences Eprint Archive