
16 research outputs found

String Searching with Ranking Constraints and Uncertainty

Author: Biswas Sudip
Publication venue: LSU Digital Commons
Publication date: 01/01/2015
Field of study

Strings play an important role in many areas of computer science. Searching for a pattern in a string or string collection is one of the most classic problems. Different variations of this problem, such as document retrieval, ranked document retrieval, and dictionary matching, have been well studied. The enormous growth of the internet, large genomic projects, sensor networks, and digital libraries necessitates not just efficient algorithms and data structures for general string indexing, but also indexes for texts with fuzzy information and support for queries with different constraints. This dissertation addresses some of these problems and proposes indexing solutions. One such variation is the document retrieval query with included and excluded/forbidden patterns, where the objective is to retrieve all relevant documents that contain the included patterns and do not contain the excluded patterns. We continue the previous work done on this problem and propose a more efficient solution. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario where the query consists of more than two patterns. The forbidden pattern problem suffers from the drawback that linear-space (in words) solutions are unlikely to achieve better than O(sqrt(n/occ)) reporting time per document, where n is the total length of the documents and occ is the number of output documents. Continuing along this path, we introduce a new variation, namely document retrieval with a forbidden extension query, where the forbidden pattern is an extension of the included pattern. We also address the more general top-k version of the problem, which retrieves the top k documents, where the ranking is based on the PageRank relevance metric. This problem is motivated by search applications. It also holds theoretical interest, as we show that the hardness of the forbidden pattern problem is alleviated in this problem. We achieve linear space and optimal query time for this variation.
We also propose succinct indexes for both of these problems. Position-restricted pattern matching considers the scenario where only part of the text is searched. We propose a succinct index for this problem with efficient query time. An important application of this problem stems from searching in genomic sequences, where only part of the gene sequence is searched for interesting patterns. The problem of computing discriminating (resp. generic) words is to report all minimal (resp. maximal) extensions of a query pattern that are contained in at most (resp. at least) a given number of documents. These problems are motivated by applications in computational biology, text mining, and automated text classification. We propose succinct indexes for these problems. Strings with uncertainty and fuzzy information play an important role in an increasing number of applications. We propose a general framework for indexing uncertain strings such that a deterministic query string can be searched efficiently. String matching becomes a probabilistic event when a string contains uncertainty, i.e., each position of the string can have several probable characters, each with an associated probability of occurrence. Such uncertain strings are prevalent in various applications such as biological sequence data, event monitoring, and automatic ECG annotations. We consider two basic problems of string searching, namely substring searching and string listing. We formulate these well-known problems for the uncertain string paradigm and propose exact and approximate solutions for them. We also discuss a constrained variation of orthogonal range searching. Given a set of points, the task of orthogonal range searching is to build a data structure such that all the points inside an orthogonal query region can be reported. We introduce a new variation, namely shared constraint range searching, which naturally arises in constrained pattern matching applications.
Shared constraint range searching is a special four-sided range reporting query problem in which two of the constraints are shared, effectively reducing the number of independent constraints. For this problem, we propose a linear-space index that can match the best known bound for the three-dimensional dominance reporting problem. We extend our data structure to the external memory model.
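To make the uncertain-string setting in the abstract above concrete, here is a minimal Python sketch of substring searching by direct scanning. It is not the dissertation's index: the independence of positions, the reporting threshold `tau`, the function names, and the sample data are all assumptions made for illustration.

```python
from math import prod

def match_probability(text, pattern, start):
    """Probability that `pattern` occurs at `start`.

    Assumes character choices at different positions are independent,
    so the probability is the product of the per-position probabilities.
    """
    return prod(text[start + j].get(ch, 0.0) for j, ch in enumerate(pattern))

def search(text, pattern, tau):
    """Report every start position whose match probability is at least `tau`."""
    last = len(text) - len(pattern)
    return [i for i in range(last + 1)
            if match_probability(text, pattern, i) >= tau]

# Hypothetical uncertain string: one dict per position, character -> probability.
text = [
    {"A": 0.5, "C": 0.5},
    {"A": 1.0},
    {"A": 0.25, "T": 0.75},
    {"T": 1.0},
]

print(search(text, "AT", 0.5))   # positions where "AT" occurs with probability >= 0.5
print(search(text, "AT", 0.25))  # a lower threshold admits more positions
```

This scan takes time proportional to the text length times the pattern length for every query; an index of the kind the abstract describes would aim to avoid rescanning the text.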

Louisiana State University

Query Processing on Attributed Graphs

Author: Yung Ka Wai
Publication venue
Publication date: 31/01/2018
Field of study

An attributed graph is a powerful tool for modeling a variety of information networks. It not only represents relationships between objects easily, but also allows every vertex and edge to carry attributes. Hence, many kinds of data, such as the web, sensor networks, biological networks, economic graphs, and social networks, are modeled as attributed graphs. Owing to this popularity, attributed graphs have attracted the attention of researchers. For example, there are studies of attributed graph OLAP, query engines, clustering, summarization, constrained pattern matching queries, and graph visualization. However, to the best of our knowledge, the topological and attribute relationships between vertices in attributed graphs have not drawn much attention. Given the high expressive power and popularity of attributed graphs, in this thesis we define and study the processing of three new attributed graph queries, which help users understand the topological and attribute relationships between entities in attributed graphs. For example, a reachability query on a social network can tell whether two persons can be connected given certain attribute constraints; a reachability query on a biological network can tell whether a compound can be transformed into another compound under given chemical reaction conditions; a How-to-Reach query can tell why the answers to the above two reachability queries are negative; and a visualizable path summary query can offer an overall picture of the topological and attribute relationships between any two vertices in an attributed graph. Beyond the query types proposed in this thesis, we believe that there are still plenty of meaningful attributed graph query types that have not been proposed and studied by the database and data mining community, since an attributed graph is a very rich source of information.
Through this thesis, we hope to draw attention to attributed graph query processing so that more of the hidden information contained in attributed graphs can be queried and discovered.
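As an illustration of the reachability query described in the abstract above, the following Python sketch runs a breadth-first search that only follows edges whose attributes satisfy a constraint. It is a plain baseline under assumed conventions (directed edges, attributes stored as dictionaries, a hypothetical `relation` attribute), not the query processing method developed in the thesis.

```python
from collections import deque

def reachable(edges, source, target, edge_ok):
    """Return True if `target` can be reached from `source` using only
    edges whose attribute dict satisfies the predicate `edge_ok`."""
    adjacency = {}
    for u, v, attrs in edges:
        adjacency.setdefault(u, []).append((v, attrs))
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor, attrs in adjacency.get(node, []):
            if neighbor not in seen and edge_ok(attrs):
                seen.add(neighbor)
                queue.append(neighbor)
    return False

# Hypothetical social network: (from, to, edge attributes).
edges = [
    ("alice", "bob", {"relation": "friend"}),
    ("bob", "carol", {"relation": "colleague"}),
    ("alice", "dave", {"relation": "friend"}),
    ("dave", "carol", {"relation": "friend"}),
    ("carol", "erin", {"relation": "colleague"}),
]

def friends_only(attrs):
    return attrs["relation"] == "friend"

print(reachable(edges, "alice", "carol", friends_only))  # via alice -> dave -> carol
print(reachable(edges, "alice", "erin", friends_only))   # blocked by the colleague edge
```

A negative answer such as the second call is the case a How-to-Reach query is meant to explain; this sketch only reports that no qualifying path exists.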

D-Scholarship@Pitt


Chromatin accessibility plays a key role in selective targeting of Hox proteins.

Authors: Fischer Bettina; Porcelli Damiano; Russell Steven; White Robert
Publication venue: Genome Biol
Publication date: 19/11/2018
Field of study

BACKGROUND: Hox transcription factors specify segmental diversity along the anterior-posterior body axis in metazoans. While the different Hox family members show clear functional specificity in vivo, they all show similar binding specificity in vitro and a satisfactory understanding of in vivo Hox target selectivity is still lacking. RESULTS: Using transient transfection in Kc167 cells, we systematically analyze the binding of all eight Drosophila Hox proteins. We find that Hox proteins show considerable binding selectivity in vivo even in the absence of canonical Hox cofactors Extradenticle and Homothorax. Hox binding selectivity is strongly associated with chromatin accessibility, being highest in less accessible chromatin. Individual Hox proteins exhibit different propensities to bind less accessible chromatin, and high binding selectivity is associated with high-affinity binding regions, leading to a model where Hox proteins derive binding selectivity through affinity-based competition with nucleosomes. Extradenticle/Homothorax cofactors generally facilitate Hox binding, promoting binding to regions in less accessible chromatin but with little effect on the overall selectivity of Hox targeting. These cofactors collaborate with Hox proteins in opening chromatin, in contrast to the pioneer factor, Glial cells missing, which facilitates Hox binding by independently generating accessible chromatin regions. CONCLUSIONS: These studies indicate that chromatin accessibility plays a key role in Hox selectivity. We propose that relative chromatin accessibility provides a basis for subtle differences in binding specificity and affinity to generate significantly different sets of in vivo genomic targets for different Hox proteins. The work was supported by the Biotechnology and Biological Sciences Research Council (Grant BB/M007081/1

Apollo (Cambridge)

FigShare

Solving Gravity Anomaly Matching Problem Under Large Initial Errors in Gravity Aided Navigation by Using an Affine Transformation Based Artificial Bee Colony Algorithm

Authors: Haijun Shao; Lingjuan Miao; Tian Dai; Yongsheng Shi
Publication venue: Frontiers Media SA
Publication date: 01/05/2019
Field of study

Gravity aided inertial navigation system (GAINS), which uses the earth's gravitational anomaly field for navigation, holds strong potential as an underwater navigation system. The gravity matching algorithm is one of the key factors in GAINS. Existing matching algorithms cannot guarantee matching accuracy in gravity aided navigation when the initial errors are large. Evolutionary algorithms, most of which offer global optimization ability and fast convergence, can be used to solve the gravity matching problem under large initial errors. However, simply applying evolutionary algorithms to GAINS may lead to false matching. Therefore, to deal with the underwater gravity matching problem, it is necessary to improve the traditional evolutionary algorithms. In this paper, an affine transformation based artificial bee colony (ABC) algorithm, which can greatly improve the positioning precision under large initial errors, is developed. The proposed algorithm introduces affine transformation into both the initialization process and the evolutionary process of the ABC algorithm. The single-point matching strategy is replaced by a strategy of matching a sequence of several consecutive position vectors. In addition, several constraints are introduced into the evolution process by using the output characteristics of the inertial navigation system (INS). Simulations based on an actual gravity anomaly base map have been performed to validate the proposed algorithm.
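The sequence-matching idea in the abstract above can be sketched in Python as follows. This shows only how one candidate affine transformation of an INS track might be scored against a gravity map; it does not implement the ABC search itself, and the parameterization (scale, rotation, translation), the mean squared error cost, the function names, and the toy map are all assumptions rather than the paper's formulation.

```python
import math

def affine_transform(track, scale, angle, dx, dy):
    """Rotate and scale the track about its first point, then translate it."""
    x0, y0 = track[0]
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in track:
        rx, ry = x - x0, y - y0
        out.append((x0 + scale * (c * rx - s * ry) + dx,
                    y0 + scale * (s * rx + c * ry) + dy))
    return out

def sequence_cost(track, measured, gravity_map):
    """Mean squared difference between the measured anomalies and the
    map values read along the whole candidate track."""
    total = sum((gravity_map(x, y) - g) ** 2
                for (x, y), g in zip(track, measured))
    return total / len(measured)

def gravity_map(x, y):
    """Hypothetical stand-in for a gravity anomaly base map."""
    return 2 * x + 3 * y

ins_track = [(0, 0), (1, 0), (2, 0)]  # INS-indicated positions with a large offset
measured = [5, 7, 9]                  # anomalies measured along the true track

uncorrected = sequence_cost(ins_track, measured, gravity_map)
candidate = affine_transform(ins_track, 1.0, 0.0, 1.0, 1.0)
corrected = sequence_cost(candidate, measured, gravity_map)
print(uncorrected, corrected)
```

An optimizer such as ABC would search over the four transformation parameters for the candidate with the lowest cost, scoring the whole sequence of positions at once rather than a single point.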

Directory of Open Access Journals

Static Computation and Reflection

Author: Garcia Ronald
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/01/2008
Field of study

Thesis (PhD) - Indiana University, Computer Sciences, 2008. Most programming languages do not allow programs to inspect their static type information or perform computations on it. C++, however, lets programmers write template metaprograms, which enable programs to encode static information, perform compile-time computations, and make static decisions about run-time behavior. Many C++ libraries and applications use template metaprogramming to build specialized abstraction mechanisms, implement domain-specific safety checks, and improve run-time performance. Template metaprogramming is an emergent capability of the C++ type system, and the C++ language specification is informal and imprecise. As a result, template metaprogramming often involves heroic programming feats and often leads to code that is difficult to read and maintain. Furthermore, many template-based code generation and optimization techniques rely on particular compiler implementations, rather than language semantics, for performance gains. Motivated by the capabilities and techniques of C++ template metaprogramming, this thesis documents some common programming patterns, including static computation, type analysis, generative programming, and the encoding of domain-specific static checks. It also documents notable shortcomings of current practice, including limited support for reflection, semantic ambiguity, and other issues that arise from the pioneering nature of template metaprogramming. Finally, this thesis presents the design of a foundational programming language, motivated by the analysis of template metaprogramming, that allows programs to statically inspect type information, perform computations, and generate code. The language is specified as a core calculus and its capabilities are presented in an idealized setting.

IUScholarWorks (University of Indiana)


DOE EPSCoR Initiative in Structural and computational Biology/Bioinformatics

Author: Wallace Susan S.
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 21/02/2008
Field of study

The overall goal of the DOE EPSCoR Initiative in Structural and Computational Biology was to enhance the competitiveness of Vermont research in these scientific areas. To develop self-sustaining infrastructure, we increased the critical mass of faculty, developed shared resources that made junior researchers more competitive for federal research grants, implemented programs to train graduate and undergraduate students who participated in these research areas, and provided seed money for research projects. During the time period funded by this DOE initiative: (1) four new faculty were recruited to the University of Vermont using DOE resources, three in Computational Biology and one in Structural Biology; (2) technical support was provided for the Computational and Structural Biology facilities; (3) twenty-two graduate students were directly funded by fellowships; (4) fifteen undergraduate students were supported during the summer; and (5) twenty-eight pilot projects were supported. Taken together, these dollars resulted in a plethora of published papers, many in high-profile journals in the fields, and directly impacted competitive extramural funding based on structural or computational biology, resulting in 49 million dollars awarded in grants (Appendix I), a 600% return on investment by DOE, the State and University.

UNT Digital Library

Proceedings of Sixth International Workshop on Unification

Author: Snyder Wayne
Publication venue: Boston University Computer Science Department
Publication date: 01/01/1993
Field of study

Swiss National Science Foundation; Austrian Federal Ministry of Science and Research; Deutsche Forschungsgemeinschaft (SFB 314); Christ Church, Oxford; Oxford University Computing Laboratory

Boston University Institutional Repository (OpenBU)

LitCrit: exploring intentions as a basis for automated feedback on Related Work.

Author: Casey Arlene Jane
Publication venue: The University of Edinburgh
Publication date: 25/06/2020
Field of study

Learning the skill of academic writing is critical for post-graduate (PG) students to be successful, yet many struggle to master the required standard. Feedback can play a formative role in developing these skills, but many students do not find the kinds of feedback available to them sufficiently helpful. As the Related Work section is known to be particularly difficult for PG students to master, it is the focus of this thesis. To date, models of academic writing have been built on observational studies of academic articles. In contrast, we carry out a user study to explore what content experts look for in Related Work and how this differs from PG students. We claim that by understanding what experts look for in Related Work and what aspects PG students struggle with, a useful author intention model can be developed to support writing feedback for Related Work sections. Our work demonstrates reliable annotation of the model intentions. Building on existing algorithms designed to identify rhetorical intentions in academic writing, we build a supervised machine learning classifier, showing how features focused on Related Work sections improve recognition of content aspects. Carrying out a study to rate the quality of Related Work, we demonstrate that the model is a good proxy for predicting quality, validating the choice of intentions in our model. In addition to recognising author intentions, we automate the generation of feedback based on observations of intentions that are present and missing, taking into account areas that PG students struggle to recognise. The thesis also contributes a new prototype writing analytic tool, called LitCrit, that supports visualising the intention narrative of Related Work and presents feedback. We claim this visualisation approach changes the PG student’s perception of Related Work, and demonstrate through a user study that it does draw attention to aspects previously missed, bringing PG student responses in line with experts.
Finally, we explore how our classifier, originally developed within the Computational Linguistics discipline, performs when applied to Computer Graphics. This shows that, while performance may be lower, there is scope for improvement when care is taken to understand which features are discipline dependent. Also, while a discipline may have the same intentions present in a section, their structural presentation may differ, impacting feature choice.

Edinburgh Research Archive