16 research outputs found

    String Searching with Ranking Constraints and Uncertainty

    Strings play an important role in many areas of computer science. Searching for a pattern in a string or string collection is one of the most classic problems. Different variations of this problem, such as document retrieval, ranked document retrieval, and dictionary matching, have been well studied. The enormous growth of the internet, large genomic projects, sensor networks, and digital libraries necessitates not just efficient algorithms and data structures for general string indexing, but also indexes for texts with fuzzy information and support for queries with different constraints. This dissertation addresses some of these problems and proposes indexing solutions. One such variation is the document retrieval query with included and excluded/forbidden patterns, where the objective is to retrieve all the relevant documents that contain the included patterns and do not contain the excluded patterns. We continue the previous work on this problem and propose a more efficient solution. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario in which the query consists of more than two patterns. The forbidden pattern problem suffers from the drawback that linear-space (in words) solutions are unlikely to achieve better than O(√(n/occ)) reporting time per document, where n is the total length of the documents and occ is the number of output documents. Continuing along this path, we introduce a new variation, namely document retrieval with forbidden extension query, where the forbidden pattern is an extension of the included pattern. We also address the more general top-k version of the problem, which retrieves the top k documents, where the ranking is based on the PageRank relevance metric. This problem is motivated by search applications. It also holds theoretical interest, as we show that the hardness of the forbidden pattern problem is alleviated in this problem. 
We achieve linear space and optimal query time for this variation. We also propose succinct indexes for both these problems. Position-restricted pattern matching considers the scenario where only part of the text is searched. We propose a succinct index for this problem with efficient query time. An important application of this problem stems from searching in genomic sequences, where only part of the gene sequence is searched for interesting patterns. The problem of computing discriminating (resp. generic) words is to report all minimal (resp. maximal) extensions of a query pattern that are contained in at most (resp. at least) a given number of documents. These problems are motivated by applications in computational biology, text mining, and automated text classification. We propose succinct indexes for these problems. Strings with uncertainty and fuzzy information play an important role in an increasing number of applications. We propose a general framework for indexing uncertain strings such that a deterministic query string can be searched efficiently. String matching becomes a probabilistic event when a string contains uncertainty, i.e., each position of the string can hold different possible characters, each with an associated probability of occurrence. Such uncertain strings are prevalent in various applications such as biological sequence data, event monitoring, and automatic ECG annotations. We consider two basic problems of string searching, namely substring searching and string listing. We formulate these well-known problems for the uncertain-string paradigm and propose exact and approximate solutions for them. We also discuss a constrained variation of orthogonal range searching. Given a set of points, the task of orthogonal range searching is to build a data structure such that all the points inside an orthogonal query region can be reported. 
We introduce a new variation, namely shared constraint range searching, which naturally arises in constrained pattern matching applications. Shared constraint range searching is a special four-sided range reporting query problem in which two of the constraints are shared, effectively reducing the number of independent constraints. For this problem, we propose a linear-space index that can match the best known bound for the three-dimensional dominance reporting problem. We extend our data structure to the external memory model.
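
To make the uncertain-string setting above concrete, here is a minimal brute-force sketch of threshold substring searching. It is not the index proposed in the dissertation. It assumes that positions are independent, so the probability of a match is the product of the per-position character probabilities, and that a match is reported when that probability reaches a threshold `tau`. The representation (one dictionary per position) and the names `match_probability`, `substring_search`, and `tau` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict per text position, mapping each
# possible character to its probability of occurrence at that position.
UncertainString = List[Dict[str, float]]


def match_probability(text: UncertainString, pattern: str, start: int) -> float:
    """Probability that `pattern` occurs at `start`, assuming independent positions."""
    prob = 1.0
    for offset, ch in enumerate(pattern):
        prob *= text[start + offset].get(ch, 0.0)
        if prob == 0.0:  # no point multiplying further
            break
    return prob


def substring_search(
    text: UncertainString, pattern: str, tau: float
) -> List[Tuple[int, float]]:
    """Brute-force scan: report every start position whose match probability is >= tau."""
    hits = []
    for start in range(len(text) - len(pattern) + 1):
        p = match_probability(text, pattern, start)
        if p >= tau:
            hits.append((start, p))
    return hits


if __name__ == "__main__":
    text = [{"a": 0.5, "b": 0.5}, {"a": 1.0}, {"a": 0.25, "c": 0.75}]
    print(substring_search(text, "aa", 0.3))
```

The scan takes time proportional to the text length times the pattern length for every query, which is the cost an index is meant to avoid.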

    Query Processing on Attributed Graphs

    An attributed graph is a powerful tool for modeling a variety of information networks. It not only represents relationships between objects easily, but also allows every vertex and edge to carry attributes. Hence, a lot of data, such as the web, sensor networks, biological networks, economic graphs, and social networks, are modeled as attributed graphs. Due to their popularity, attributed graphs have attracted the attention of researchers. For example, there are studies of attributed graph OLAP, query engines, clustering, summarization, constrained pattern matching queries, and graph visualization. However, to the best of our knowledge, the topological and attribute relationships between vertices in attributed graphs have not drawn much attention from researchers. Given the high expressive power and popularity of attributed graphs, in this thesis we define and study the processing of three new attributed graph queries, which help users understand the topological and attribute relationships between entities in attributed graphs. For example, a reachability query on a social network can tell whether two persons can be connected under certain attribute constraints; a reachability query on a biological network can tell whether a compound can be transformed into another compound under given chemical reaction conditions; a How-to-Reach query can tell why the answers to the above two reachability queries are negative; a visualizable path summary query can offer an overall picture of the topological and attribute relationships between any two vertices in attributed graphs. Beyond the query types proposed in this thesis, we believe that there are still plenty of meaningful attributed graph query types that have not been proposed and studied by the database and data mining community, since an attributed graph is a very rich source of information. 
Through this thesis, we hope to draw attention to attributed graph query processing so that more of the hidden information contained in attributed graphs can be queried and discovered.
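
As an illustration of the reachability query described above, the following is a minimal sketch using a plain breadth-first search. It is not the thesis's query processing method. It assumes the constraint is a predicate on vertex attributes that every vertex on the path, including both endpoints, must satisfy. The thesis may define constraints differently, for example on edge attributes. The names `reachable`, `adj`, `attrs`, and `allowed` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List

Attrs = Dict[str, object]


def reachable(
    adj: Dict[Hashable, List[Hashable]],
    attrs: Dict[Hashable, Attrs],
    source: Hashable,
    target: Hashable,
    allowed: Callable[[Attrs], bool],
) -> bool:
    """Return True if `target` can be reached from `source` through
    vertices whose attributes all satisfy the `allowed` predicate."""
    # Assumption: the endpoints must satisfy the constraint as well.
    if not allowed(attrs[source]) or not allowed(attrs[target]):
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj.get(v, []):
            # Only expand neighbours that satisfy the attribute constraint.
            if w not in seen and allowed(attrs[w]):
                seen.add(w)
                queue.append(w)
    return False


if __name__ == "__main__":
    adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    attrs = {
        "a": {"city": "X"},
        "b": {"city": "Y"},
        "c": {"city": "X"},
        "d": {"city": "X"},
    }
    print(reachable(adj, attrs, "a", "d", lambda x: x["city"] == "X"))
```

Each query here traverses the graph from scratch, so the cost grows with the number of vertices and edges.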

    Solving Gravity Anomaly Matching Problem Under Large Initial Errors in Gravity Aided Navigation by Using an Affine Transformation Based Artificial Bee Colony Algorithm

    The gravity aided inertial navigation system (GAINS), which uses the earth's gravitational anomaly field for navigation, holds strong potential as an underwater navigation system. The gravity matching algorithm is one of the key factors in GAINS. Existing matching algorithms cannot guarantee matching accuracy in gravity aided navigation when the initial errors are large. Evolutionary algorithms, most of which offer global optimization ability and fast convergence, can be used to solve the gravity matching problem under large initial errors. However, simply applying evolutionary algorithms to GAINS may lead to false matching. Therefore, in order to deal with the underwater gravity matching problem, it is necessary to improve the traditional evolutionary algorithms. In this paper, an affine transformation based artificial bee colony (ABC) algorithm, which can greatly improve the positioning precision under large initial errors, is developed. The proposed algorithm introduces an affine transformation into both the initialization process and the evolutionary process of the ABC algorithm. The single-point matching strategy is replaced by the strategy of matching a sequence of several consecutive position vectors. In addition, several constraints are introduced into the process of evolution by using the output characteristics of the inertial navigation system (INS). Simulations based on the actual gravity anomaly base map have been performed to validate the proposed algorithm.

    Static Computation and Reflection

    Thesis (PhD) - Indiana University, Computer Sciences, 2008. Most programming languages do not allow programs to inspect their static type information or perform computations on it. C++, however, lets programmers write template metaprograms, which enable programs to encode static information, perform compile-time computations, and make static decisions about run-time behavior. Many C++ libraries and applications use template metaprogramming to build specialized abstraction mechanisms, implement domain-specific safety checks, and improve run-time performance. Template metaprogramming is an emergent capability of the C++ type system, and the C++ language specification is informal and imprecise. As a result, template metaprogramming often involves heroic programming feats and often leads to code that is difficult to read and maintain. Furthermore, many template-based code generation and optimization techniques rely on particular compiler implementations, rather than language semantics, for performance gains. Motivated by the capabilities and techniques of C++ template metaprogramming, this thesis documents some common programming patterns, including static computation, type analysis, generative programming, and the encoding of domain-specific static checks. It also documents notable shortcomings of current practice, including limited support for reflection, semantic ambiguity, and other issues that arise from the pioneering nature of template metaprogramming. Finally, this thesis presents the design of a foundational programming language, motivated by the analysis of template metaprogramming, that allows programs to statically inspect type information, perform computations, and generate code. The language is specified as a core calculus and its capabilities are presented in an idealized setting.

    Proceedings of Sixth International Workshop on Unification

    Swiss National Science Foundation; Austrian Federal Ministry of Science and Research; Deutsche Forschungsgemeinschaft (SFB 314); Christ Church, Oxford; Oxford University Computing Laboratory

    LitCrit: exploring intentions as a basis for automated feedback on Related Work.

    Learning the skill of academic writing is critical for post-graduate (PG) students to be successful, yet many struggle to reach the required standard. Feedback can play a formative role in developing these skills, but many students do not find the kinds of feedback available to them sufficiently helpful. Because the Related Work section is known to be particularly difficult for PG students to master, it is the focus of this thesis. To date, models of academic writing have been built on observational studies of academic articles. In contrast, we carry out a user study to explore what content experts look for in Related Work and how this differs from PG students. We claim that by understanding what experts look for in Related Work and what aspects PG students struggle with, a useful author intention model can be developed to support writing feedback for Related Work sections. Our work demonstrates reliable annotation of the model intentions. Building on existing algorithms designed to identify rhetorical intentions in academic writing, we build a supervised machine learning classifier, showing how features focused on Related Work sections improve recognition of content aspects. Carrying out a study to rate the quality of Related Work, we demonstrate that the model is a good proxy for predicting quality, validating the choice of intentions in our model. In addition to recognising author intentions, we automate the generation of feedback based on observations of intentions that are present and missing, taking into account areas that PG students struggle to recognise. The thesis also contributes a new prototype writing analytics tool, called LitCrit, that supports visualising the intention narrative of Related Work and presents feedback. 
We claim this visualisation approach changes the PG student’s perception of Related Work, and demonstrate through a user study that it does draw attention to aspects previously missed, bringing PG student responses in line with experts. Finally, we explore the performance of our classifier, originally set within the Computational Linguistics discipline, when applied to Computer Graphics. This shows that, while performance may be lower, there is scope for improvement when care is taken to understand which features are discipline dependent. Also, while a discipline may have the same intentions present in a section, their structural presentation may differ, impacting feature choice.