24 research outputs found

    A Pattern-Based Approach to Structural Query in the WWW

    The World Wide Web (WWW), a hypertext-based, distributed collection of documents connected by hypertext links, contains millions of documents, thereby necessitating the ability to control navigation. This study presents a query language to restrict navigation on the WWW. Query-based access mechanisms represent a fundamental solution to the navigation problem in hypertext applications. Queries with structural search capabilities allow users to retrieve collections of information from a hypertext network based on a specification of their structures. In this paper, we propose a pattern-based structural query language for the WWW. This study also defines a set of patterns and uses pattern expressions to specify structural conditions in a query. The proposed pattern-based structural query language integrates pattern-based query formulation with the select-from-where construct of the Structured Query Language (SQL). In addition, schema mapping rules are proposed to map the nodes and links of a hypertext application into objects and associations of an object-oriented database. Objects and their associations in an object-oriented database are uniformly represented by association patterns. A pattern-based structural query in hypertext systems is then translated into A-algebra expressions for query processing.

    A Genetic Algorithm For Multiple Sequence Alignment

    Multiple sequence alignment is an important tool in molecular sequence analysis. This paper presents genetic algorithms for solving multiple sequence alignment problems. Several data sets are tested and the experimental results are compared with other methods. We find that our approach performs well on data sets with high similarity and long sequences. The software is available at http://rsdb.csie.ncu.edu.tw/tools/msa.htm

    The Repetitive Sequence Database and Mining Putative Regulatory Elements in Gene Promoter Regions

    At least 43% of the human genome is occupied by repetitive elements. Moreover, around 51% of the rice genome is occupied by repetitive elements. The analysis of repetitive elements reveals that repetitive elements in our genome may have been very important in evolutionary genomics. The first part of this study describes a database of repetitive elements, RSDB. The RSDB database contains repetitive elements, which are classified into the following categories: exact, tandem, and similar. Interfaces are provided to query and display the results and statistical data, such as the relationship between repetitive elements and genes, and cross-references of repetitive elements among different organisms. The second part of this study attempts to mine putative binding sites by examining how combinations of the known regulatory sites and overrepresented repetitive elements in RSDB are distributed in the promoter regions of groups of functionally related genes. The overrepresented repetitive elements appearing in the associations are possible transcription factor binding sites. Our proposed approach is applied to Saccharomyces cerevisiae and the promoter regions of yeast ORFs. The complete contents of RSDB and partial putative binding sites are available to the public at www.rsdb.csie.ncu.edu.tw. Readers may download partial query results.

    Applying Genetic Algorithms to Query Optimization in Document Retrieval

    Proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. Discusses Chinese text retrieval, term frequency rating formulas, vector space models, bigrams, the PAT-tree structure for information retrieval, query vectors, and relevance feedback. (Author/LRW)

    Predicting Regulatory Elements in Repetitive Sequences Using Transcription Factor Binding Sites

    Repeat sequences are the most abundant sequences in the extragenic regions of genomes. Biologists have already found a large number of regulatory elements in these regions. These elements may profoundly affect chromatin structure formation in the nucleus and also contain important clues for studies of genetic evolution and phylogenetics. This study attempts to mine rules on how combinations of individual binding sites are distributed in repeat sequences. The mined association rules would facilitate efforts to identify gene classes regulated by similar mechanisms and to predict regulatory elements accurately. Herein, the combinations of transcription factor binding sites in the repeat sequences are obtained, and data mining techniques are then applied to mine association rules from these combinations. In addition, the discovered associations are pruned to remove insignificant ones and obtain a final set of associations. Finally, the discovered association rules are used to partially classify the repeat sequences in our repeat database. Experiments are conducted on several genomes, including C. elegans, human chromosome 22, and yeast.

    Modularized Design for Wrappers/Monitors in Data Warehouse Systems

    To simplify the task of constructing wrappers/monitors for the information sources in data warehouse systems, we provide a modularized design method to re-use code. By substituting some parts of the wrapper modules, we can re-use a wrapper on a different information source. For each information source, we also develop a toolkit to generate a corresponding monitor. With this method, we can greatly reduce the effort needed to code the monitor component. We also develop a method to map an object-relational schema into a relational one. The mapping method helps us provide a uniform interface between the wrapper and an integrator. © 2000 Elsevier Science Inc. All rights reserved.

    A Mechanism for View Consistency in a Data Warehousing System

    A data warehouse is a repository of integrated information from distributed, autonomous, and heterogeneous information sources. Materialized views in the data warehouse must be maintained when the data in the information sources are updated. The conventionally used incremental view maintenance algorithm causes anomalies. In addition, previous attempts to solve anomalies focus on compensating for the changes to the view. In this work, a novel method is presented that uses the information already available at the data warehouse instead of performing the compensation. Rules are used to predict the information deemed necessary to maintain a materialized view. The information is stored as auxiliary views. The proposed method does not require any data transmission between the information sources and the warehouse when an update message from the information sources changes a materialized view. Moreover, the method proposed herein saves a significant amount of time when materialized views are incrementally maintained.

    MultiProtIdent: Identifying Proteins Using Database Search and Protein-Protein Interactions

    Protein identification is important in proteomics. Proteomic analyses based on mass spectra (MS) constitute innovative ways to identify the components of protein complexes. Instruments can obtain the mass spectrum to an accuracy of 0.01 Da or better, but identification errors are inevitable. This study presents a novel tool, MultiProtIdent, which can identify proteins using additional information about protein-protein interactions and protein functional associations. Both single and multiple Peptide Mass Fingerprints (PMFs) are input to MultiProtIdent, which matches the PMFs to a theoretical peptide mass database. The relationships or interactions among proteins are considered to reduce false positives in PMF matching. Experiments to identify protein complexes reveal that MultiProtIdent is highly promising. The website associated with this study is http://dbms104.csie.ncu.edu.tw/

    A Theoretical Aspect of a Stochastic Sketching for Global Optimization

    In this paper, we propose the Stochastic Sketching method for global optimization, based on the simulation of human behavior. Stochastic Sketching models the thought process and strategies of human beings and applies the artificial model to problems. We introduce and discuss in detail the concepts and components essential to Stochastic Sketching, including the sampling guide, zooming controller, sketching model, precision threshold, and satisfaction probability. The mathematical foundations of Stochastic Sketching are discussed and a preliminary theoretical base is presented.

    A study on using genetic niching for query optimisation in document retrieval

    This paper presents a new genetic approach for query optimisation in document retrieval. The main contribution of the paper is to show the effectiveness of the genetic niching technique in reaching multiple relevant regions of the document space. Moreover, suitable merging procedures are proposed in order to improve the retrieval evaluation. Experimental results obtained using a TREC sub-collection indicate that the proposed approach is promising for applications.