Search CORE

93,077 research outputs found

A Method for Similarity-Based Grouping of Biological Data

Author: A. Doms
J.M. Berg
K.G. Herbert
P.W. Lord
R. Shamir
S.F. Altschul
T. Gabaldon
The Gene Ontology Consortium
V.I. Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

Author: Arturo Medrano-Soto
J. Andres Christen
Julio Collado-vides

Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry belonging to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry belonging to the groups. The number of groups in the analysis is controlled dynamically by labeling groups as 'alive' or 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
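The central quantity in this abstract, the posterior probability that an entry belongs to each group, can be sketched for a mixture whose components pair a Gaussian (for a continuous variable) with a categorical distribution (for a discrete one). This is a minimal illustration under stated assumptions, not BClass itself, which is written in Lisp-Stat and samples the parameters with Metropolis-Hastings and Gibbs steps: here the parameters are fixed, the two variables are assumed independent within a group, and the names `membership_probabilities`, `mu`, `sigma`, and `probs` are hypothetical.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def membership_probabilities(entry, weights, components):
    """Posterior probability that `entry` belongs to each mixture component.

    entry: (continuous_value, category_label), a hypothetical layout
    weights: mixing proportions, one per component, summing to 1
    components: dicts with hypothetical keys 'mu' and 'sigma' (Gaussian part)
        and 'probs' (category -> probability, categorical part)
    Assumes the two variables are independent within a component.
    """
    x_cont, x_cat = entry
    log_scores = []
    for w, comp in zip(weights, components):
        score = (math.log(w)
                 + log_normal_pdf(x_cont, comp["mu"], comp["sigma"])
                 + math.log(comp["probs"][x_cat]))
        log_scores.append(score)
    # Normalise with the log-sum-exp trick so distant groups do not underflow.
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Two hypothetical groups with made-up parameters.
components = [
    {"mu": 0.0, "sigma": 1.0, "probs": {"a": 0.8, "b": 0.2}},
    {"mu": 5.0, "sigma": 1.0, "probs": {"a": 0.2, "b": 0.8}},
]
print(membership_probabilities((0.1, "a"), [0.5, 0.5], components))
```

The returned list is the kind of homogeneous representation the abstract describes: whatever mix of variable types an entry has, it becomes a vector of group probabilities. In a Gibbs sampler, a vector like this is what each entry's group label would be drawn from at every iteration.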

Research Papers in Economics

Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

Author: Blankley C.J.
Raymond J.W.
Willett P.
Publication venue: 'Elsevier BV'
Publication date: 01/03/2003

This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods that employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; they also suggest that the CAST and Yin–Chen methods, recently proposed for the clustering of gene expression patterns, may prove effective for the clustering of 2D chemical structures.
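The abstract does not say how the degree of cluster overlap was measured. As one assumed option, the Rand index scores two clusterings of the same items by the fraction of item pairs on which they agree (both methods put the pair in one cluster, or both keep it apart). The function name `rand_index` is hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a[i] and labels_b[i] are the cluster labels two methods assign
    to item i; the label values themselves need not correspond.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same items")
    agree = pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        agree += together_a == together_b  # True counts as 1
        pairs += 1
    return agree / pairs if pairs else 1.0

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0 (same partition, relabelled)
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because only co-membership is compared, the arbitrary numbering of clusters does not matter, which is what a cross-method overlap comparison needs.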

White Rose Research Online

Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets

Author: Breitling Rainer
Daly Ronan
Rogers Simon
Wandy Joe
Publication venue: 'Oxford University Press (OUP)'
Publication date: 02/02/2015

Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that co-elute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates information on related peaks (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result. Results: We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools. Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.
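The two-stage pipeline described above (a pairwise similarity score that includes related-peak information, then an approximate weighted matching) can be sketched as follows. Every specific here is an assumption for illustration, not the published method or its Python application: peaks are dicts with hypothetical `mz`, `rt`, and `group` keys, the tolerances and the mixing weight `alpha` are arbitrary, and the matching step uses the greedy heuristic (accept the highest-scoring compatible pair first), a standard 1/2-approximation for maximum-weight matching.

```python
import math

def base_similarity(p, q, mz_tol=0.01, rt_tol=30.0):
    """Similarity of two peaks from m/z and retention time; 0 outside tolerance."""
    d_mz = abs(p["mz"] - q["mz"])
    d_rt = abs(p["rt"] - q["rt"])
    if d_mz > mz_tol or d_rt > rt_tol:
        return 0.0
    return math.exp(-(d_mz / mz_tol) ** 2 - (d_rt / rt_tol) ** 2)

def group_similarity(p, q, run_a, run_b):
    """Fraction of p's related peaks that have a compatible peak in q's group."""
    group_a = [x for x in run_a if x["group"] == p["group"]]
    group_b = [y for y in run_b if y["group"] == q["group"]]
    hits = sum(1 for x in group_a if any(base_similarity(x, y) > 0 for y in group_b))
    return hits / len(group_a)

def greedy_alignment(run_a, run_b, alpha=0.3):
    """Return (i, j) index pairs matching peaks of run_a to peaks of run_b."""
    candidates = []
    for i, p in enumerate(run_a):
        for j, q in enumerate(run_b):
            b = base_similarity(p, q)
            if b > 0:
                # Blend the direct score with the related-peak score.
                score = (1 - alpha) * b + alpha * group_similarity(p, q, run_a, run_b)
                candidates.append((score, i, j))
    candidates.sort(reverse=True)
    used_a, used_b, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(matches)

run_a = [{"mz": 100.0, "rt": 60.0, "group": 0}, {"mz": 101.0, "rt": 61.0, "group": 0}]
run_b = [{"mz": 100.001, "rt": 62.0, "group": 5}, {"mz": 101.002, "rt": 63.0, "group": 5}]
print(greedy_alignment(run_a, run_b))
```

The group term rewards a candidate pair when the peaks that co-elute with it in one run also find partners among the peaks grouped with it in the other run, which is the structural dependency the abstract says existing tools ignore.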

PubMed Central

Enlighten

The University of Manchester - Institutional Repository

Evaluation of Minnesota Geographic Classifications Based on Caddisfly (Trichoptera) Data

Author: Houghton David C
Publication venue: ValpoScholar
Publication date: 24/01/2018

The ability to partition the variation of faunal assemblages into homogeneous units valuable for biomonitoring is referred to as classification strength (CS). In this study, the CSs of three types of geographic classification (watershed basin, ecological region, and caddisfly region) were compared based on 248 light trap samples of adult caddisflies collected in Minnesota during 1999–2001. The effect on CS of three different levels of taxonomic resolution (family, genus, and species) was also assessed. Primary (broadest possible) a priori classification by watershed basin and ecological region had a lower CS than did secondary classification by these regions. Caddisfly region, an a posteriori classification based directly on caddisfly distribution data, had nearly twice the CS of any a priori classification. CS decreased approximately 20% with a decrease in taxonomic resolution from species to genus, and from genus to family. These results suggest that geographic classification, spatial scale, and taxonomic resolution are all important factors to consider when sampling aquatic insects, and that widely used a priori geographic classifications are not the ideal units for sampling the aquatic biota.
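The abstract does not give a formula for classification strength. One widely used formulation, assumed here, takes CS as the mean similarity between samples in the same class minus the mean similarity between samples in different classes. The sketch below is a simplified version that averages over all sample pairs rather than weighting per-class means, and it uses Jaccard similarity on sets of taxa; the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of taxa (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def classification_strength(samples, classes):
    """Mean within-class minus mean between-class pairwise similarity.

    samples: list of sets of taxa observed in each sample
    classes: class label (e.g. region) assigned to each sample
    """
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        s = jaccard(samples[i], samples[j])
        (within if classes[i] == classes[j] else between).append(s)
    if not within or not between:
        raise ValueError("need at least one within-class and one between-class pair")
    return sum(within) / len(within) - sum(between) / len(between)

samples = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}]
print(classification_strength(samples, ["x", "x", "y", "y"]))  # prints 1.0
print(classification_strength(samples, ["x", "y", "x", "y"]))
```

A resolution comparison like the one in the study could be mimicked by mapping each species name to its genus or family before building the sets and recomputing CS.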

Valparaiso University

Recommended from our members

Sun exposure drives Antarctic cryptoendolithic community structure and composition

Author: Coleine Claudia
Onofri Silvano
Selbmann Laura
Stajich Jason E
Zucconi Laura
Publication venue: eScholarship, University of California
Publication date: 20/06/2019

The harsh environmental conditions of the ice-free regions of Continental Antarctica are considered one of the closest Martian analogues on Earth. There, rocks play a pivotal role as substratum for life and endolithism represents a primary habitat for microorganisms when external environmental conditions become incompatible with active life on rock surfaces. Due to the thermal inertia of rock, the internal airspace of lithic substratum is where microbiota find a protected and buffered microenvironment, allowing life to spread throughout these regions with extreme temperatures and low water availability. The high degree of adaptation and specialization of the endolithic communities makes them highly resistant but scarcely resilient to any external perturbation and thus, any shifts in microbial community composition may serve as early-alarm systems of environmental perturbation, including climate change.

Previous research concluded that altitude and distance from sea do not act as driving factors in shaping microbial abundance and diversity, while sun exposure was hypothesized as a significant parameter influencing endolithic settlement and development. This study aims to explore our hypothesis that changes in sun exposure translate to shifts in community composition and abundances of main biological compartments (fungi, algae and bacteria) in the Antarctic cryptoendolithic communities. We performed a preliminary molecular survey, based on DGGE and qPCR techniques, of 48 rocks with varying sun exposure, collected in Victoria Land along an altitudinal transect from 834 to 3100 m a.s.l.

Our findings demonstrate that differences in sun radiation between north and south exposure influence rock surface temperature, availability of water and metabolic activity and also have significant impact on community composition and microbial abundance.

eScholarship - University of California

Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings

Author: Holliday J.D.
Hu C.-Y.
Willett P.
Publication venue: Bentham Science Publishers
Publication date: 01/03/2002

This paper compares 22 different similarity coefficients when they are used for searching databases of 2D fragment bit-strings. Experiments with the National Cancer Institute's AIDS and IDAlert databases show that the coefficients fall into several well-marked clusters, in which the members of a cluster will produce comparable rankings of a set of molecules. These clusters provide a basis for selecting combinations of coefficients for use in data fusion experiments. The results of these experiments provide a simple way of increasing the effectiveness of fragment-based similarity searching systems.
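To make "comparing coefficients and fusing their rankings" concrete, the sketch below computes three standard bit-string coefficients (Tanimoto, Dice, and cosine) on fingerprints stored as Python integers and combines the rankings they produce by summing ranks. The choice of coefficients, the rank-sum fusion rule, and the convention of scoring empty fingerprints as 0.0 are assumptions for illustration; the paper compares 22 coefficients and the abstract does not state its fusion rule.

```python
import math

def bit_counts(x, y):
    """Return (bits set in x, bits set in y, bits set in both)."""
    return bin(x).count("1"), bin(y).count("1"), bin(x & y).count("1")

def tanimoto(x, y):
    a, b, c = bit_counts(x, y)
    return c / (a + b - c) if a + b - c else 0.0  # empty vs empty scored 0.0 (assumed)

def dice(x, y):
    a, b, c = bit_counts(x, y)
    return 2 * c / (a + b) if a + b else 0.0

def cosine(x, y):
    a, b, c = bit_counts(x, y)
    return c / math.sqrt(a * b) if a and b else 0.0

def fused_ranking(query, database, coefficients):
    """Database indices ordered by the sum of their ranks under each coefficient."""
    total_rank = [0] * len(database)
    for coef in coefficients:
        # Rank 0 is the most similar molecule under this coefficient.
        order = sorted(range(len(database)),
                       key=lambda i: coef(query, database[i]), reverse=True)
        for rank, i in enumerate(order):
            total_rank[i] += rank
    return sorted(range(len(database)), key=lambda i: total_rank[i])

database = [0b0001, 0b1111, 0b0111]
print(fused_ranking(0b1111, database, [tanimoto, dice, cosine]))  # prints [1, 2, 0]
```

Coefficients that fall into the same cluster in the paper's sense would produce nearly identical `order` lists here, so fusing them adds little; the benefit of fusion comes from combining coefficients whose rankings differ.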

MIT Libraries Dome

White Rose Research Online
