
    Data Mining Using Relational Database Management Systems

    Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves a much higher scalability than Weka, while providing the same output and maintaining good computation time.

    Bringing Back-in-Time Debugging Down to the Database

    With back-in-time debuggers, developers can explore what happened before observable failures by following infection chains back to their root causes. While there are several such debuggers for object-oriented programming languages, we do not know of any back-in-time capabilities at the database level. Thus, if failures are caused by SQL scripts or stored procedures, developers have difficulties in understanding their unexpected behavior. In this paper, we present an approach for bringing back-in-time debugging down to the SAP HANA in-memory database. Our TARDISP debugger allows developers to step queries backwards and inspect the database at previous and arbitrary points in time. With the help of a SQL extension, we can express queries covering a period of execution time within a debugging session and handle large amounts of data with low overhead on performance and memory. The entire approach has been evaluated within a development project at SAP and shows promising results with respect to the gathered developer feedback. Comment: 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering

    Online Journals: Utility of ToCs vs. Fulltext

    The Caltech Library System (CLS) has maintained an extensive list of online journal websites for several years. The online journal list has grown to over 3000 entries, representing a mixture of free and subscription-based fulltext journals, as well as websites featuring tables of contents and abstracts. During the winter of 1999/2000, the online journals list was converted to an online journals database. Additional user functionality was added, without loss of previous features. In a previous study, search engines were employed to map the adoption rates of online journals into the web pages of research groups and individuals on the Caltech campus. It was established that the vast majority of online journal use on campus was through the access avenues presented by the library: the online catalog and the online journals database. One of the new features introduced in the online journals database was an ability to limit displays to journals containing fulltext. Anecdotal evidence has been less than clear-cut with regard to the utility of non-fulltext resources. This study will allow for a thorough analysis of the question with hard data. It should be feasible to determine if there are discipline-based preferences or if personal preferences are the controlling factor. Analysis of the web server logs will also allow for a direct comparison of user preferences for searching and browsing. Again, we expect to be able to determine if there is a subject-specific bias or if behaviors are more individually idiosyncratic. Results of the study will inform the further development of the CLS online journal efforts: database development, online journal promotion, and new candidates for licensing. The technologies employed in this project are well documented and may be exploited by other libraries seeking to gather empirical data for collection decisions and web development efforts.

    Motivated proteins: a web application for studying small three-dimensional protein motifs

    BACKGROUND: Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are alphabeta-motifs, asx-motifs, asx-turns, beta-bulges, beta-bulge loops, beta-turns, nests, niches, Schellmann loops, ST-motifs, ST-staples and ST-turns. We have constructed a database of such motifs from a range of high-quality protein structures and built a web application as a visual interface to this. DESCRIPTION: The web application, Motivated Proteins, provides access to these 12 motifs (with 48 sub-categories) in a database of over 400 representative proteins. Queries can be made for specific categories or sub-categories of motif, motifs in the vicinity of ligands, motifs which include part of an enzyme active site, overlapping motifs, or motifs which include a particular amino acid sequence. Individual proteins can be specified, or, where appropriate, motifs for all proteins listed. The results of queries are presented in textual form as an (X)HTML table, and may be saved as parsable plain text or XML. Motifs can be viewed and manipulated either individually or in the context of the protein in the Jmol applet structural viewer. Cartoons of the motifs imposed on a linear representation of protein secondary structure are also provided. Summary information for the motifs is available, as are histograms of amino acid distribution, and graphs of dihedral angles at individual positions in the motifs. CONCLUSION: Motivated Proteins is a publicly and freely accessible web application that enables protein scientists to study small three-dimensional motifs without requiring knowledge of either Structured Query Language or the underlying database schema.

    Building an Archive with Saada

    Saada transforms a set of heterogeneous FITS files or VOTables of various categories (images, tables, spectra ...) into a database without writing code. Databases created with Saada come with a rich Web interface and an Application Programming Interface (API). They support the four most common VO services. Such databases can mix various categories of data in multiple collections. They allow direct access to the original data while providing a homogeneous view thanks to an internal data model compatible with the characterization axis defined by the VO. The data collections can be bound to each other with persistent links, making relevant browsing paths and allowing data-mining oriented queries. Comment: 18 pages, 5 figures, Special VO issue