15 research outputs found
Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data
The probabilistic threshold query is one of the most common queries in
uncertain databases: a result satisfies the query only if its probability
meets the threshold requirement. In this paper, we investigate
probabilistic threshold keyword queries (PrTKQ) over XML data, which have not
been studied before. We first introduce the notion of quasi-SLCA and use it to
represent the results of a PrTKQ under possible world
semantics. We then design a probabilistic inverted (PI) index that can be used
to quickly return the qualified answers and filter out the unqualified ones
based on our proposed lower/upper bounds. After that, we propose two efficient
and comparable algorithms: the Baseline Algorithm and the PI index-based Algorithm. To
accelerate the algorithms, we also utilize probability density
functions. An empirical study using real and synthetic data sets has verified
the effectiveness and efficiency of our approaches.
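The bound-based filtering mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's PI index or its quasi-SLCA computation; it only shows the generic accept/reject/refine pattern that lower and upper probability bounds make possible. The candidate layout, the function name `threshold_filter`, and the toy probabilities are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate record: (node_id, lower_bound, upper_bound).
Candidate = Tuple[str, float, float]


def threshold_filter(
    candidates: Iterable[Candidate],
    threshold: float,
    exact_probability: Callable[[str], float],
) -> List[str]:
    """Return ids of candidates whose probability is at least `threshold`."""
    answers = []
    for node_id, lower, upper in candidates:
        if lower >= threshold:
            # The lower bound already meets the threshold: accept cheaply.
            answers.append(node_id)
        elif upper < threshold:
            # The upper bound falls short: the candidate can be discarded.
            continue
        elif exact_probability(node_id) >= threshold:
            # Bounds are inconclusive, so fall back to the exact computation.
            answers.append(node_id)
    return answers


# Toy data, invented for illustration only.
exact = {"a": 0.9, "b": 0.2, "c": 0.55, "d": 0.45}
cands = [("a", 0.8, 1.0), ("b", 0.1, 0.3), ("c", 0.4, 0.7), ("d", 0.3, 0.6)]
result = threshold_filter(cands, 0.5, exact.__getitem__)
print(result)
```

In this sketch only candidates "c" and "d" trigger the exact computation; the other two are settled by their bounds alone, which is the kind of saving tight bounds are meant to provide.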
Child Prime Label Approaches to Evaluate XML Structured Queries
The adoption of the eXtensible Markup Language (XML) as the standard format to store and exchange semi-structured data has been gaining momentum. The growing number of XML documents leads to the need for appropriate XML querying algorithms which are able to retrieve XML data efficiently. Due to the importance of twig pattern matching in XML retrieval systems, finding all matching occurrences of a tree pattern query in an XML document is often considered a specific task for XML databases as well as a core operation in XML query processing. This thesis presents the design and implementation of a new indexing technique, called the Child Prime Label (CPL), which exploits the property of prime numbers to identify Parent-Child (P-C) edges in twig pattern queries (TPQs) during query evaluation. The CPL approach can be incorporated efficiently within the existing labelling schemes. The major contributions of this thesis are a set of novel twig matching algorithms which apply the CPL approach and focus on reducing the overhead of storing useless elements and performing unnecessary computations during the output enumeration. The research presented here is the first to provide an efficient and general solution for TPQs containing ordering constraints and positional predicates specified by the XML query languages. To evaluate the CPL approaches, the holistic model was implemented as an experimental prototype in which the proposed approaches are compared against state-of-the-art holistic twig algorithms. Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of the CPL approaches over the previous indexing and querying methods. The experimental results demonstrate the validity and improvements of the new algorithms over other related methods on various common subclasses of TPQs. Moreover, the scalability tests reveal that the new algorithms are more suitable for processing large XML datasets.
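The abstract does not spell out how the CPL labels are built, so the following is only one plausible reading of "using prime numbers to identify Parent-Child edges", not the thesis's actual scheme. It assumes each distinct tag name gets its own prime and each element stores the product of the primes of its children's tags, so a P-C check becomes a divisibility test. All function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for small tag vocabularies)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def assign_tag_primes(root):
    """Give every distinct tag name its own prime (assumed scheme)."""
    gen = primes()
    tag_prime = {}
    for elem in root.iter():
        if elem.tag not in tag_prime:
            tag_prime[elem.tag] = next(gen)
    return tag_prime


def child_label(elem, tag_prime):
    """Product of the primes of the distinct tags among elem's children."""
    label = 1
    for tag in {child.tag for child in elem}:
        label *= tag_prime[tag]
    return label


def has_child_with_tag(label, tag, tag_prime):
    """P-C test on a precomputed label: divisibility by the tag's prime."""
    p = tag_prime.get(tag)
    return p is not None and label % p == 0


doc = ET.fromstring("<book><title/><author><name/></author><author/></book>")
tag_prime = assign_tag_primes(doc)
# Labels are computed once, then queried without rescanning children.
labels = {elem: child_label(elem, tag_prime) for elem in doc.iter()}

print(has_child_with_tag(labels[doc], "author", tag_prime))
print(has_child_with_tag(labels[doc], "name", tag_prime))
```

Because prime factorizations are unique, the label is divisible by a tag's prime exactly when some child carries that tag, which is why `name` (a grandchild of `book`) is not reported as a child.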
A Labeling DOM-Based Tree Walking Algorithm for Mapping XML Documents into Relational Databases
XML has emerged as the standard format for
representing and exchanging data on the World Wide Web. For
practical purposes, it is critical to have efficient
mechanisms to store and query XML data, as well as to exploit the
full power of this new technology. Several researchers have
proposed using relational databases to store and query XML data.
With an understanding of the limitations of current approaches, this
thesis proposes an algorithm for automatically mapping XML
documents to an RDBMS, with XML-API as a database utility. The
algorithm uses a best-fit auto-mapping technique and dynamic
shredding of a specified XML document type (data-centric,
document-centric, or mixed). The proposed
algorithm uses the DOM (Document Object Model) as a warehouse and a stack
as a data structure to map the XML document into a relational
database and to reconstruct the XML document from the relational
database. The experimental study shows that the algorithm maps the
document and reconstructs it well. Finally, compared
with other algorithms, the algorithm performs well in time and
efficiency; its stated complexity is O(11n+2).
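A stack-driven DOM walk that shreds a document into relational rows might look like the sketch below. It is not the thesis's algorithm: the single `node` table, its column layout, and the function name `shred` are assumptions chosen to keep the example small, and the best-fit auto-mapping step is not modelled.

```python
import sqlite3
from xml.dom import minidom


def shred(xml_text):
    """Walk the DOM with an explicit stack and emit one row per element.

    Row layout (hypothetical): (node_id, parent_id, tag, text).
    """
    dom = minidom.parseString(xml_text)
    rows = []
    next_id = 1
    stack = [(dom.documentElement, None)]  # (element, parent_id)
    while stack:
        elem, parent_id = stack.pop()
        node_id = next_id
        next_id += 1
        # Concatenate only the text nodes that are direct children.
        text = "".join(
            n.data for n in elem.childNodes if n.nodeType == n.TEXT_NODE
        ).strip()
        rows.append((node_id, parent_id, elem.tagName, text))
        # Push children in reverse so they are popped in document order.
        children = [n for n in elem.childNodes if n.nodeType == n.ELEMENT_NODE]
        for child in reversed(children):
            stack.append((child, node_id))
    return rows


rows = shred("<order><id>7</id><item><name>pen</name></item></order>")

# Load the rows into an in-memory relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "tag TEXT, text TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
for row in conn.execute("SELECT * FROM node ORDER BY id"):
    print(row)
```

Storing the parent id with every row is what would make reconstruction possible: the tree can be rebuilt by grouping rows on `parent_id` and emitting children in id order.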
Evolutionary genomics : statistical and computational methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.
La bioinformática al servicio de la genómica (Bioinformatics in the Service of Genomics)
This thesis addresses several areas in which bioinformatics techniques are applied to problems arising from the handling, analysis, storage, and querying of large volumes of genomic data. The main challenges this thesis has sought to address are the following:
- Processing the most basic information produced by high-throughput genotyping technologies, so that a set of basic parameters and statistics characterizing an experiment can be obtained quickly and easily, regardless of the technology chosen.
- Facilitating the publication and querying of low- and medium-scale genotyping results, for both SNPs and STRs, as well as their interaction with publicly accessible variability repositories.
- Studying the feasibility of locally managing an in-house repository of human variability based on the available resources, both external information and internal infrastructure.
- Transferring the knowledge gained, by contributing existing tools or ad hoc solutions to the problems that genetic research may pose in the field of computational biology.