
    Skyline matching: absolute localisation for planetary exploration rovers

    Skyline matching is a technique for absolute localisation in the context of autonomous long-range exploration. Absolute localisation becomes crucial for planetary exploration rovers, which must recalibrate their position during long traverses or estimate it with no a-priori information. In this project, a skyline matching algorithm is proposed, implemented and evaluated using real acquisitions and simulated data. The algorithm compares the skyline extracted from rover images with the skyline derived from orbital data. The results are promising, but intensive testing on more real data is needed to further characterize the algorithm.
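    As a hedged illustration of the skyline-comparison step described above (not the project's implementation), the sketch below scores candidate rover positions by comparing an observed skyline elevation profile against profiles predicted from orbital elevation data; the profile format, the candidate set, and all function names are assumptions made for illustration.

```python
# Illustrative sketch of skyline matching: score candidate rover positions by
# comparing an observed skyline elevation profile (one value per azimuth bin)
# against skylines predicted from orbital elevation data.
import numpy as np

def skyline_similarity(observed, predicted):
    """Lower is better: mean squared difference over all azimuth bins."""
    return float(np.mean((observed - predicted) ** 2))

def best_heading(observed, predicted):
    """Allow an unknown rover heading by testing circular shifts of the profile."""
    scores = [skyline_similarity(observed, np.roll(predicted, s))
              for s in range(len(observed))]
    return int(np.argmin(scores)), min(scores)

def localise(observed, candidates):
    """candidates: dict mapping (x, y) map position -> predicted skyline profile."""
    best = None
    for pos, predicted in candidates.items():
        shift, score = best_heading(observed, predicted)
        if best is None or score < best[2]:
            best = (pos, shift, score)
    return best  # (position, heading offset in azimuth bins, score)

# Toy usage with synthetic 360-bin elevation profiles (degrees above horizon).
rng = np.random.default_rng(0)
true_profile = np.abs(np.sin(np.linspace(0, 2 * np.pi, 360))) * 10
candidates = {(0, 0): true_profile, (1, 0): rng.uniform(0, 10, 360)}
observed = np.roll(true_profile, 42) + rng.normal(0, 0.2, 360)
print(localise(observed, candidates))
```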

    Towards an effective processing of XML keyword query

    Ph.D. (Doctor of Philosophy)

    Computational Framework for Data-Independent Acquisition Proteomics.

    Mass spectrometry (MS) is one of the main techniques for high-throughput discovery- and targeted-based proteomics experiments. The most popular method for MS data acquisition has been the data-dependent acquisition (DDA) strategy, which primarily selects high-abundance peptides for MS/MS sequencing. DDA incorporates stochastic data acquisition to avoid repetitive sequencing of the same peptide, resulting in relatively irreproducible results for low-abundance peptides between experiments. Data-independent acquisition (DIA), in which peptide fragment signals are systematically acquired, is emerging as a promising alternative that addresses DDA's stochasticity. DIA, however, produces more complex signals, posing computational challenges for complex-sample and high-throughput analysis. As a result, targeted extraction, which requires pre-existing spectral libraries, has been the most commonly used approach for automated DIA data analysis. However, building spectral libraries requires additional analysis time and sample material, which are major barriers for most research groups. In my dissertation, I develop a computational tool called DIA-Umpire, which includes computational and signal processing algorithms that enable untargeted DIA identification and quantification analysis without any prior spectral library. In the first study, a signal feature detection algorithm is developed to extract and assemble peptide precursor and fragment signals into pseudo MS/MS spectra, which can be analyzed by existing DDA untargeted analysis tools. This novel step enables direct and untargeted (spectral-library-free) DIA identification analysis, and we show the performance using complex samples including human cell lysate and glycoproteomics datasets. In the second study, a hybrid approach is developed to further improve DIA quantification sensitivity and reproducibility. The performance of the DIA-Umpire quantification approach is demonstrated using an affinity-purification mass spectrometry experiment for protein-protein interaction analysis. Lastly, in the third study, I improve the DIA-Umpire pipeline for data obtained from the Orbitrap family of mass spectrometers. Using public datasets, I show that the improved version of DIA-Umpire is capable of highly sensitive, untargeted analysis of DIA data generated by the Orbitrap family of mass spectrometers. The dissertation work addresses the barriers of DIA analysis and should facilitate the adoption of the DIA strategy for a broad range of discovery proteomics applications.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120699/1/tsouc_1.pd
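    To make the pseudo-MS/MS idea concrete, here is a minimal, hedged sketch (not the DIA-Umpire code) that pairs precursor features with fragment features whose chromatographic elution profiles co-elute and correlate; the feature dictionaries, retention-time tolerance, correlation threshold, and function names are illustrative assumptions standing in for the more elaborate feature grouping an actual pipeline would perform.

```python
# Sketch of building pseudo MS/MS spectra from DIA feature lists: a fragment
# feature is assigned to a precursor feature if their retention-time apexes are
# close and their intensity traces correlate, so downstream DDA-style search
# tools can consume the resulting spectra.
import numpy as np

def pearson(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def build_pseudo_spectra(precursors, fragments, rt_tol=0.2, min_corr=0.8):
    """precursors/fragments: lists of dicts with 'mz', 'rt', 'profile' (intensity trace).
    Returns one pseudo MS/MS spectrum (list of fragment m/z values) per precursor."""
    spectra = []
    for p in precursors:
        peaks = [f['mz'] for f in fragments
                 if abs(f['rt'] - p['rt']) <= rt_tol
                 and pearson(f['profile'], p['profile']) >= min_corr]
        spectra.append({'precursor_mz': p['mz'], 'fragment_mzs': sorted(peaks)})
    return spectra
```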

    Maximizing Insight from Modern Economic Analysis

    The last decade has seen a growing trend of economists exploring how to extract new economic insight from "big data" sources such as the Web. As economists move towards this model of analysis, their traditional workflow starts to become infeasible. The amount of noisy data from which to draw insights presents data management challenges for economists and limits their ability to discover meaningful information. This leads to economists needing to invest a great deal of energy in training to be data scientists (a catch-all role that has grown to describe the use of statistics, data mining, and data management in the big data age), leaving little time for applying their domain knowledge to the problem at hand. We envision an ideal workflow that generates accurate and reliable results, where results are produced in near-interactive time and systems handle the "heavy lifting" required for working with big data. This dissertation presents several systems and methodologies that bring economists closer to this ideal workflow, helping them address many of the challenges faced in transitioning to big data sources like the Web. To help users generate accurate and reliable results, we present approaches to identifying relevant predictors in nowcasting applications, as well as methods for identifying potentially invalid nowcasting models and their inputs. We show how a streamlined workflow, combined with pruning and shared computation, can handle the heavy lifting of big data analysis, allowing users to generate results in near-interactive time. We also present a novel user model and architecture for helping users avoid undesirable bias during data preparation: users interactively define constraints for transformation code and the data that the code produces, and an explain-and-repair system satisfies these constraints as best it can, also providing an explanation for any problems along the way. Together, these systems represent a unified effort to streamline the transition for economists to this new big data workflow.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144007/1/dol_1.pd
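    The predictor-selection step mentioned above can be illustrated with a hedged sketch (this is not the dissertation's system): rank candidate web-derived signals for a nowcasting target by their one-step-ahead fit over rolling windows, so that poorly fitting or drifting predictors can be pruned; the window length, scoring rule, and function names are assumptions.

```python
# Sketch of ranking candidate nowcasting predictors by rolling out-of-sample R^2.
import numpy as np

def rolling_r2(target, predictor, window=12):
    """One-step-ahead R^2 of a univariate linear nowcast over rolling windows."""
    target = np.asarray(target, float)
    predictor = np.asarray(predictor, float)
    errs, base = [], []
    for t in range(window, len(target)):
        x, y = predictor[t - window:t], target[t - window:t]
        slope, intercept = np.polyfit(x, y, 1)          # refit on each window
        errs.append((target[t] - (slope * predictor[t] + intercept)) ** 2)
        base.append((target[t] - y.mean()) ** 2)        # mean-only baseline
    return 1.0 - sum(errs) / sum(base)

def rank_predictors(target, candidates, window=12):
    """candidates: dict name -> series. Returns (name, score) pairs, best first."""
    scores = {name: rolling_r2(target, series, window)
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```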

    Why-Query Support in Graph Databases

    In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores and distributed databases. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing connections between them. This makes them suitable for keeping data without a rigid schema for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc. However, the schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them. The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-answers, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about the presence of unexpected ones. Considering the typical problems of receiving no results or too many results when querying graph databases, in this thesis we focus on the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using pattern matching queries, one of the general graph-query types, as an example. We present a comprehensive analysis of existing debugging tools in state-of-the-art research and identify their common properties. From them, we formulate the following features of why-queries discussed in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph-based and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one that considers only the query topology, through coarse-grained rewriting, up to fine-grained modification that allows fine changes of predicates and topology.
    To provide a comprehensive analysis of explanations, we propose to compare them on three levels: the syntactic description, the content, and the size of the result set. To deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process. With the techniques proposed in this thesis, we provide the fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model.
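    As an illustration of the subgraph-based explanation idea for why-empty queries (a minimal sketch, not the thesis implementation), the code below evaluates each edge constraint of a pattern on its own and reports constraints that match nothing, since any such constraint already explains an empty answer; the property-graph encoding, predicate form, and function names are assumptions.

```python
# Sketch of a why-empty explanation: find pattern constraints that are
# individually unsatisfiable on a small property graph.
def edge_matches(graph, label, pred):
    """graph: list of (src_props, edge_label, dst_props) triples."""
    return [(s, l, d) for (s, l, d) in graph
            if l == label and pred(s, d)]

def why_empty(graph, pattern):
    """pattern: list of (name, edge_label, predicate) constraints."""
    explanation = []
    for name, label, pred in pattern:
        if not edge_matches(graph, label, pred):
            explanation.append(name)   # this constraint alone matches nothing
    return explanation

# Toy property graph: persons connected by 'knows' edges.
g = [({'name': 'Ann', 'age': 34}, 'knows', {'name': 'Bob', 'age': 29}),
     ({'name': 'Bob', 'age': 29}, 'knows', {'name': 'Cid', 'age': 41})]
q = [('knows-older-80', 'knows', lambda s, d: d['age'] > 80),
     ('knows-any',      'knows', lambda s, d: True)]
print(why_empty(g, q))  # ['knows-older-80'] is the culprit subpattern
```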

    Multifaceted Geotagging for Streaming News

    News sources on the Web generate constant streams of information describing the events that shape our world. In particular, geography plays a key role in the news, and understanding the geographic information present in news allows for useful spatial browsing and retrieval. This process of understanding is called geotagging, and involves first finding in the document all textual references to geographic locations, known as toponyms, and second, assigning the correct lat/long values to each toponym; these steps are termed toponym recognition and toponym resolution, respectively. Both steps are difficult due to ambiguities in natural language: some toponyms share names with non-location entities, and a given toponym can have many location interpretations. Removing these ambiguities is crucial for successful geotagging. To this end, geotagging methods developed for streaming news are described. First, a spatio-textual search engine named STEWARD and an interactive map-based news browsing system named NewsStand are described; both feature geotaggers as central components and served as motivating systems and experimental testbeds for developing geotagging methods. Next, a geotagging methodology is presented that follows a multifaceted approach involving a variety of techniques. First, a multifaceted toponym recognition process is described that uses both rule-based and machine learning–based methods to ensure high toponym recall. Next, various forms of toponym resolution evidence are explored. One such type of evidence is lists of toponyms, termed comma groups, whose toponyms share a common thread in their geographic properties that enables correct resolution. In addition to explicit evidence, authors take advantage of the implicit geographic knowledge of their audiences. Understanding the local places known by an audience, termed its local lexicon, affords great performance gains when geotagging articles from local newspapers, which account for the vast majority of news on the Web. Finally, considering windows of text of varying size around each toponym, termed adaptive context, allows for a tradeoff between geotagging execution speed and toponym resolution accuracy. Extensive experimental evaluations of all the above methods, using existing corpora and two newly created large corpora of streaming news, show great performance gains over several competing prominent geotagging methods.
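    A hedged sketch of the adaptive-context idea (not the described system's code): each toponym is resolved to the candidate interpretation that lies, on average, closest to the candidate interpretations of other toponyms within a window around it, so a smaller window trades resolution evidence for speed; the data structures and function names are assumptions.

```python
# Sketch of window-based toponym resolution: prefer the interpretation of each
# toponym that is geographically closest to interpretations of nearby toponyms.
import math

def distance_km(a, b):
    """Great-circle distance between (lat, lon) pairs, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def resolve(toponyms, candidates, window=2):
    """toponyms: names in document order; candidates: name -> [(lat, lon), ...].
    Returns name -> chosen (lat, lon)."""
    resolved = {}
    for i, name in enumerate(toponyms):
        context = [t for j, t in enumerate(toponyms) if j != i and abs(j - i) <= window]
        def score(cand):
            dists = [min(distance_km(cand, c) for c in candidates[t])
                     for t in context if candidates.get(t)]
            return sum(dists) / len(dists) if dists else 0.0
        resolved[name] = min(candidates[name], key=score)
    return resolved

# Toy usage with two ambiguous toponyms.
cands = {'Paris':  [(48.86, 2.35), (33.66, -95.56)],    # France vs. Texas
         'London': [(51.51, -0.13), (42.98, -81.25)]}   # UK vs. Ontario
print(resolve(['Paris', 'London'], cands))
```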

    Internally Symmetrical Stwintrons and Related Canonical Introns in Hypoxylaceae Species

    Spliceosomal introns are pervasive in eukaryotes. Intron gains and losses have occurred throughout evolution, but the origin of new introns is unclear. Stwintrons are complex intervening sequences where one of the sequence elements (5′-donor, lariat branch point element or 3′-acceptor) necessary for excision of a U2 intron (external intron) is itself interrupted by a second (internal) U2 intron. In Hypoxylaceae, a family of endophytic fungi, we uncovered scores of donor-disrupted stwintrons with striking sequence similarity among themselves and also with canonical introns. Intron–exon structure comparisons suggest that these stwintrons have proliferated within diverging taxa but have also given rise to proliferating canonical introns in some genomes. The proliferated (stw)introns have integrated seamlessly at novel gene positions. The recently proliferated (stw)introns appear to originate from a conserved ancestral stwintron characterised by terminal inverted repeats (45–55 nucleotides), a highly symmetrical structure that may allow the formation of a double-stranded intron RNA molecule. No short tandem duplications flank the putatively inserted intervening sequences, which excludes a DNA transposition-based mechanism of proliferation. It is tempting to suggest that this highly symmetrical structure may have a role in intron proliferation by (an)other mechanism(s).
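    To illustrate the terminal-inverted-repeat property mentioned above (a minimal sketch under simplified assumptions, not the study's analysis pipeline), the code below checks whether the 5′ end of an intervening sequence is the approximate reverse complement of its 3′ end; the repeat length, mismatch tolerance, and example sequence are illustrative only.

```python
# Sketch: detect a terminal inverted repeat by comparing a sequence's 5' end
# with the reverse complement of its 3' end, allowing a few mismatches.
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def has_terminal_inverted_repeat(seq, length=45, max_mismatches=5):
    if len(seq) < 2 * length:
        return False
    five_prime = seq[:length]
    three_prime_rc = reverse_complement(seq[-length:])
    mismatches = sum(a != b for a, b in zip(five_prime, three_prime_rc))
    return mismatches <= max_mismatches

# Toy example: a sequence whose ends are exact reverse complements.
arm = 'GATTGGCCA' * 5                       # 45-nt illustrative repeat arm
core = 'GTAAGT' + 'ACGT' * 20 + 'TACTAAC' + 'ATGC' * 10 + 'TTTCAG'
seq = arm + core + reverse_complement(arm)
print(has_terminal_inverted_repeat(seq))    # True
```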