12,454 research outputs found
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.
Recent developments in chemoinformatics education
Chemoinformatics techniques are increasingly being used to analyse the huge volumes of chemical and biological data resulting from combinatorial synthesis and high-throughput screening programmes. Scientists with both the chemical and the computing skills required to carry out such analyses are currently in very short supply, which has resulted in the establishment of MSc programmes for the training of chemoinformatics specialists.
Derandomized Construction of Combinatorial Batch Codes
Combinatorial Batch Codes (CBCs), a replication-based variant of Batch Codes
introduced by Ishai et al. in STOC 2004, abstract the following data
distribution problem: data items are to be replicated among servers in
such a way that any of the data items can be retrieved by reading at
most one item from each server with the total amount of storage over
servers restricted to . Given parameters and , where and
are constants, one of the challenging problems is to construct -uniform CBCs
(CBCs where each data item is replicated among exactly servers) which
maximizes the value of . In this work, we present explicit construction of
-uniform CBCs with data items. The
construction has the property that the servers are almost regular, i.e., number
of data items stored in each server is in the range . The
construction is obtained through better analysis and derandomization of the
randomized construction presented by Ishai et al. Analysis reveals almost
regularity of the servers, an aspect that so far has not been addressed in the
literature. The derandomization leads to explicit construction for a wide range
of values of (for given and ) where no other explicit construction
with similar parameters, i.e., with , is
known. Finally, we discuss the possibility of parallel derandomization of the
construction.
Multiround private information retrieval: Capacity and storage overhead
Private information retrieval (PIR) is the problem of retrieving one message out of messages from non-communicating replicated databases, where each database stores all messages, in such a way that each database learns no information about which message is being retrieved. The capacity of PIR is the maximum number of bits of desired information per bit of downloaded information among all PIR schemes. The capacity has recently been characterized for PIR as well as several of its variants. In every case it is assumed that all the queries are generated by the user simultaneously. Here we consider multiround PIR, where the queries in each round are allowed to depend on the answers received in previous rounds. We show that the capacity of multiround PIR is the same as the capacity of single-round PIR. The result is generalized to also include -privacy constraints. Combined with previous results, this shows that there is no capacity advantage from multiround over single-round schemes, non-linear over linear schemes or from -error over zero-error schemes. However, we show through an example that there is an advantage in terms of storage overhead. We provide an example of a multiround, non-linear, -error PIR scheme that requires a strictly smaller storage overhead than the best possible with single-round, linear, zero-error PIR schemes
Quality of Service for Information Access
Information is available in many forms from different sources, in distributed locations; access to information is supported by networks of varying performance; the cost of accessing and transporting the information varies for both the source and the transport route. Users, who vary in their preferences, in the background knowledge required to interpret the information, and in their motivation for accessing it, gather information to perform many different tasks. This position paper outlines some of these variations in information provision and access, and explores the impact these variations have on the user's task performance, and the possibilities they make available to adapt the user interface for the presentation of information.
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article presents a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
Some Applications of Coding Theory in Computational Complexity
Error-correcting codes and related combinatorial constructs play an important
role in several recent (and old) results in computational complexity theory. In
this paper we survey results on locally-testable and locally-decodable
error-correcting codes, and their applications to complexity theory and to
cryptography.
Locally decodable codes are error-correcting codes with sub-linear time
error-correcting algorithms. They are related to private information retrieval
(a type of cryptographic protocol), and they are used in average-case
complexity and to construct "hard-core predicates" for one-way permutations.
Locally testable codes are error-correcting codes with sub-linear time
error-detection algorithms, and they are the combinatorial core of
probabilistically checkable proofs.
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well-known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.