Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings. They also indicate that the CAST and Yin–Chen methods, recently proposed for the clustering of gene expression patterns, may prove effective for the clustering of 2D chemical structures.
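As a rough illustration of what a fingerprint-based similarity measure looks like, the sketch below computes the Tanimoto coefficient between two fingerprints. The abstract does not name the measure used, so Tanimoto is an assumption here (it is a common choice for binary fingerprints), and the function name and toy fingerprints are hypothetical.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprints,
    each given as the set of positions of its on-bits."""
    if not fp_a and not fp_b:
        # Convention assumed for this sketch: two empty fingerprints match fully.
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical toy fingerprints (on-bit positions), not real molecules.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9}
print(tanimoto(fp1, fp2))  # 3 shared bits out of 5 distinct bits -> 0.6
```

A clustering method would then group structures whose pairwise similarity exceeds some threshold; that threshold is one example of the adjustable parameters whose stability the paper examines.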
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, covering the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.
Constraining the Number of Positive Responses in Adaptive, Non-Adaptive, and Two-Stage Group Testing
Group testing is a well known search problem that consists in detecting the
defective members of a set of objects O by performing tests on properly chosen
subsets (pools) of the given set O. In classical group testing the goal is to
find all defectives by using as few tests as possible. We consider a variant of
classical group testing in which one is concerned not only with minimizing the
total number of tests but aims also at reducing the number of tests involving
defective elements. The rationale behind this search model is that in many
practical applications the devices used for the tests are subject to
deterioration due to exposure to or interaction with the defective elements. In
this paper we consider adaptive, non-adaptive and two-stage group testing. For
all three considered scenarios, we derive upper and lower bounds on the number
of "yes" responses that must be admitted by any strategy performing at most a
certain number t of tests. In particular, for the adaptive case we provide an
algorithm that uses a number of "yes" responses that exceeds the given lower
bound by a small constant. Interestingly, this bound can be asymptotically
attained also by our two-stage algorithm, which is a phenomenon analogous to
the one occurring in classical group testing. For the non-adaptive scenario we
give almost matching upper and lower bounds on the number of "yes" responses.
In particular, we give two constructions both achieving the same asymptotic
bound. An interesting feature of one of these constructions is that it is an
explicit construction. The bounds for the non-adaptive and the two-stage cases
follow from the bounds on the optimal sizes of new variants of d-cover free
families and (p,d)-cover free families introduced in this paper, which we
believe may be of interest also in other contexts.
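To make the two quantities in this abstract concrete (the total number of tests and the number of "yes" responses), here is a minimal sketch of an adaptive strategy that counts both. It uses plain recursive halving, which is a classical textbook strategy and not the algorithm proposed in the paper; the function name and the simulated set of defectives are assumptions for illustration.

```python
def binary_splitting(items, defectives):
    """Locate all defectives by recursive halving of positive pools.

    Returns (found, tests, yes): the defectives identified, the number of
    tests performed, and how many of those tests answered "yes".
    """
    defectives = set(defectives)
    tests = 0
    yes = 0
    found = set()

    def run_test(pool):
        # Simulated test: positive iff the pool contains a defective.
        nonlocal tests, yes
        tests += 1
        positive = any(x in defectives for x in pool)
        if positive:
            yes += 1
        return positive

    stack = [list(items)]
    while stack:
        pool = stack.pop()
        if not pool or not run_test(pool):
            continue  # empty or negative pool: nothing to find here
        if len(pool) == 1:
            found.add(pool[0])  # a positive singleton is a defective
        else:
            mid = len(pool) // 2
            stack.append(pool[:mid])
            stack.append(pool[mid:])
    return found, tests, yes


found, tests, yes = binary_splitting(range(8), {3})
print(found, tests, yes)
```

Every positive pool on the path down to a defective counts as a "yes" response, which is why halving tends to expose the testing device to defectives often. The variant studied in the paper aims to limit that count.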
Learning Immune-Defectives Graph through Group Tests
This paper deals with an abstraction of a unified problem of drug discovery
and pathogen identification. Pathogen identification involves identification of
disease-causing biomolecules. Drug discovery involves finding chemical
compounds, called lead compounds, that bind to pathogenic proteins and
eventually inhibit the function of the protein. In this paper, the lead
compounds are abstracted as inhibitors, pathogenic proteins as defectives, and
the mixture of "ineffective" chemical compounds and non-pathogenic proteins as
normal items. A defective could be immune to the presence of an inhibitor in a
test. So, a test containing a defective is positive iff it does not contain its
"associated" inhibitor. The goal of this paper is to identify the defectives,
inhibitors, and their "associations" with high probability, or in other words,
learn the Immune Defectives Graph (IDG) efficiently through group tests. We
propose a probabilistic non-adaptive pooling design, a probabilistic two-stage
adaptive pooling design and decoding algorithms for learning the IDG. For the
two-stage adaptive-pooling design, we show that the sample complexity of the
number of tests required to guarantee recovery of the inhibitors, defectives,
and their associations with high probability, i.e., the upper bound, exceeds
the proposed lower bound by a logarithmic multiplicative factor in the number
of items. For the non-adaptive pooling design too, we show that the upper bound
exceeds the proposed lower bound by at most a logarithmic multiplicative factor
in the number of items.
Comment: Double column, 17 pages. Updated with tighter lower bounds and other minor edits.
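The test model described in this abstract can be sketched as a small outcome function. This is an illustrative reading, not the paper's pooling design or decoding algorithm: it assumes each defective maps to a set of associated inhibitors and that a pool is positive when at least one defective in it has none of its inhibitors present. The names `pool_outcome` and `associations` are hypothetical.

```python
def pool_outcome(pool, associations):
    """Return True if the pooled test is positive.

    `associations` maps each defective to the set of inhibitors associated
    with it (assumed structure). A defective makes the test positive only
    if none of its associated inhibitors is in the same pool.
    """
    pool = set(pool)
    return any(
        defective in pool and not (inhibitors & pool)
        for defective, inhibitors in associations.items()
    )


# Hypothetical items: d* are defectives, i* inhibitors, n* normal items.
associations = {"d1": {"i1"}, "d2": {"i2"}}
print(pool_outcome({"d1", "n1"}, associations))  # defective alone: positive
print(pool_outcome({"d1", "i1"}, associations))  # its inhibitor present: negative
print(pool_outcome({"d1", "i2"}, associations))  # unrelated inhibitor: positive
```

Learning the Immune Defectives Graph then amounts to recovering `associations` from the outcomes of many such pooled tests.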
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.