
    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.
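    To make the notion of similarity searching concrete, such methods typically compare binary structural fingerprints with an association measure such as the Tanimoto coefficient. Below is a minimal illustrative sketch, not the Group's actual software; the fingerprints and molecule names are hypothetical placeholders.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprints,
    each represented as the set of 'on' bit positions."""
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)

# Hypothetical query and database fingerprints (sets of bit indices).
query = {1, 4, 7, 9, 15}
database = {
    "mol_A": {1, 4, 7, 9, 20},
    "mol_B": {2, 5, 7, 11, 15},
}

# Rank database molecules by decreasing similarity to the query.
ranked = sorted(database.items(),
                key=lambda kv: tanimoto(query, kv[1]),
                reverse=True)
for name, fp in ranked:
    print(f"{name}: {tanimoto(query, fp):.3f}")
```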

    Recent developments in chemoinformatics education

    Chemoinformatics techniques are increasingly being used to analyse the huge volumes of chemical and biological data resulting from combinatorial synthesis and high-throughput screening programmes. Scientists with both the chemical and the computing skills required to carry out such analyses are currently in very short supply, a shortage that has led to the establishment of MSc programmes for the training of chemoinformatics specialists.

    Derandomized Construction of Combinatorial Batch Codes

    Combinatorial Batch Codes (CBCs), a replication-based variant of the Batch Codes introduced by Ishai et al. in STOC 2004, abstract the following data distribution problem: $n$ data items are to be replicated among $m$ servers in such a way that any $k$ of the $n$ data items can be retrieved by reading at most one item from each server, with the total amount of storage over the $m$ servers restricted to $N$. Given parameters $m$, $c$, and $k$, where $c$ and $k$ are constants, one of the challenging problems is to construct $c$-uniform CBCs (CBCs where each data item is replicated among exactly $c$ servers) that maximize the value of $n$. In this work, we present an explicit construction of $c$-uniform CBCs with $\Omega(m^{c-1+\frac{1}{k}})$ data items. The construction has the property that the servers are almost regular, i.e., the number of data items stored in each server lies in the range $\left[\frac{nc}{m}-\sqrt{\frac{n}{2}\ln(4m)},\ \frac{nc}{m}+\sqrt{\frac{n}{2}\ln(4m)}\right]$. The construction is obtained through a sharper analysis and derandomization of the randomized construction presented by Ishai et al. The analysis reveals the almost-regularity of the servers, an aspect that has so far not been addressed in the literature. The derandomization leads to explicit constructions for a wide range of values of $c$ (for given $m$ and $k$) where no other explicit construction with similar parameters, i.e., with $n = \Omega(m^{c-1+\frac{1}{k}})$, is known. Finally, we discuss the possibility of a parallel derandomization of the construction.
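    As a rough illustration of the randomized placement that is derandomized here, each data item can be stored on a uniformly random $c$-subset of the $m$ servers; the almost-regularity claim then says every server load concentrates around the mean $nc/m$ within the stated deviation. A hedged Python sketch, with parameter values chosen arbitrarily for illustration:

```python
import math
import random

m, c, n = 50, 3, 2000  # servers, replication factor, items (arbitrary)

# Replicate each item on a uniformly random c-subset of the m servers.
load = [0] * m
for _ in range(n):
    for server in random.sample(range(m), c):
        load[server] += 1

# Hoeffding-style deviation bound matching the abstract's range:
# each load should lie in [nc/m - sqrt((n/2) ln 4m), nc/m + sqrt((n/2) ln 4m)].
mean = n * c / m
dev = math.sqrt((n / 2) * math.log(4 * m))
print(f"mean load {mean:.1f}, allowed range "
      f"[{mean - dev:.1f}, {mean + dev:.1f}]")
print("min/max observed load:", min(load), max(load))
print("all within range:", all(abs(x - mean) <= dev for x in load))
```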

    Multiround private information retrieval: Capacity and storage overhead

    Private information retrieval (PIR) is the problem of retrieving one message out of $K$ messages from $N$ non-communicating replicated databases, where each database stores all $K$ messages, in such a way that each database learns no information about which message is being retrieved. The capacity of PIR is the maximum number of bits of desired information per bit of downloaded information among all PIR schemes. The capacity has recently been characterized for PIR as well as several of its variants. In every case it is assumed that all the queries are generated by the user simultaneously. Here we consider multiround PIR, where the queries in each round are allowed to depend on the answers received in previous rounds. We show that the capacity of multiround PIR is the same as the capacity of single-round PIR. The result is generalized to also include $T$-privacy constraints. Combined with previous results, this shows that there is no capacity advantage from multiround over single-round schemes, from non-linear over linear schemes, or from $\epsilon$-error over zero-error schemes. However, we show through an example that there is an advantage in terms of storage overhead. We provide an example of a multiround, non-linear, $\epsilon$-error PIR scheme that requires a strictly smaller storage overhead than the best possible with single-round, linear, zero-error PIR schemes.
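    For intuition about what a single-round, linear, zero-error PIR scheme looks like (this is the classic two-server XOR construction, not the multiround scheme of the paper), the user sends a uniformly random subset of indices to one server and the same subset with the desired index toggled to the other; XORing the two answers recovers the desired bit, while each query in isolation reveals nothing. A minimal Python sketch:

```python
import random
from functools import reduce

def answer(db: list, query: set) -> int:
    """Each server XORs together the one-bit messages named by the query."""
    return reduce(lambda acc, i: acc ^ db[i], query, 0)

db = [1, 0, 1, 1, 0, 1, 0, 0]  # K = 8 one-bit messages, held by both servers
want = 5                        # index the user wants to retrieve privately

# Query 1 is a uniformly random subset; query 2 toggles membership of `want`.
q1 = {i for i in range(len(db)) if random.random() < 0.5}
q2 = q1 ^ {want}  # symmetric difference

# Each query alone is a uniform random subset, so neither server learns `want`.
bit = answer(db, q1) ^ answer(db, q2)
assert bit == db[want]
print("retrieved bit:", bit)
```

    The download here is two bits per retrieved bit, which is far from the capacity results discussed above; the sketch is only meant to show the privacy mechanism.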

    Quality of Service for Information Access

    Information is available in many forms from different sources and in distributed locations; access to it is supported by networks of varying performance; and the cost of accessing and transporting it varies with both the source and the transport route. Users gather information to perform many different tasks, and they differ in their preferences, in the background knowledge they need to interpret the information, and in their motivation for accessing it. This position paper outlines some of these variations in information provision and access, explores their impact on the user’s task performance, and considers the possibilities they open up for adapting the user interface for the presentation of information.

    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of the sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article gives a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, their interrelationships, the trade-offs they impose, and also their practical limitations are explained and compared.
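    To make the notion of a full-text index concrete, one of the simplest such structures is the suffix array: all suffix start positions of the text are sorted lexicographically, and pattern lookup becomes binary search. The naive Python sketch below is purely illustrative; genome-scale tools use linear-time construction and compressed variants such as the FM-index, as surveyed in the article.

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(text: str) -> list:
    """Naive construction: sort all suffix start positions lexicographically.
    Production tools use linear-time algorithms and compressed indexes."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text: str, sa: list, pattern: str) -> list:
    """Binary search for the contiguous block of suffixes starting with
    `pattern`; prefixes of lexicographically sorted suffixes stay sorted."""
    prefixes = [text[i:i + len(pattern)] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])

genome = "ACGTACGTGACG"   # toy sequence, not real data
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))  # -> [0, 4, 9]
```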

    Fourteenth Biennial Status Report: March 2017 - February 2019


    Some Applications of Coding Theory in Computational Complexity

    Error-correcting codes and related combinatorial constructs play an important role in several recent (and old) results in computational complexity theory. In this paper we survey results on locally-testable and locally-decodable error-correcting codes, and their applications to complexity theory and to cryptography. Locally decodable codes are error-correcting codes with sub-linear time error-correcting algorithms. They are related to private information retrieval (a type of cryptographic protocol), and they are used in average-case complexity and to construct "hard-core predicates" for one-way permutations. Locally testable codes are error-correcting codes with sub-linear time error-detection algorithms, and they are the combinatorial core of probabilistically checkable proofs.
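    The canonical example of a locally decodable code is the Hadamard code: the codeword lists the inner product of the message with every binary vector, and any single message bit can be recovered from just two random queries, succeeding with high probability as long as only a small constant fraction of the codeword is corrupted. An illustrative Python sketch of the standard 2-query local decoder:

```python
import random

def hadamard_encode(msg: list) -> list:
    """Codeword entry at position a is <msg, a> mod 2, for every a in {0,1}^k."""
    k = len(msg)
    return [sum(msg[j] * ((a >> j) & 1) for j in range(k)) % 2
            for a in range(2 ** k)]

def locally_decode(word: list, i: int, k: int) -> int:
    """Recover message bit i with two queries: word[a] XOR word[a ^ e_i]
    equals msg[i] by linearity, whenever both queried positions are intact."""
    a = random.randrange(2 ** k)
    return word[a] ^ word[a ^ (1 << i)]

msg = [1, 0, 1, 1]
word = hadamard_encode(msg)

# Corrupt one of the 16 positions; each 2-query decode still succeeds
# with probability at least 1 - 2/16.
word[3] ^= 1
for i in range(len(msg)):
    print(i, locally_decode(word, i, len(msg)))
```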