BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

Author: Arturo Medrano-Soto
J. Andres Christen
Julio Collado-Vides

Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability that each entry belongs to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities that each entry belongs to each of the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' or 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked plots) and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.
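
The central step in the abstract above, turning heterogeneous variables into per-group membership probabilities, can be illustrated with a minimal Python sketch. This is not the BClass implementation (which the abstract says is written in Lisp-Stat and uses a Metropolis-Hastings/Gibbs sampler). It assumes fixed, hand-picked group parameters, one Gaussian variable and one categorical variable per entry, and independence between the two within a group. All names (`membership_probabilities`, `groups`, the parameter values) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(entry, groups):
    """Return P(group | entry) for every group using Bayes' rule.

    entry  : (continuous_value, category_label)
    groups : list of dicts holding assumed, fixed parameters; a real
             sampler would draw these from their posterior instead.
    """
    x, category = entry
    # Unnormalized posterior: mixture weight times the likelihood of each
    # variable, assuming the variables are independent within a group.
    weighted = [
        g["weight"] * gaussian_pdf(x, g["mean"], g["sd"]) * g["cat_probs"][category]
        for g in groups
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two hypothetical groups with made-up parameters.
groups = [
    {"weight": 0.5, "mean": 0.0, "sd": 1.0, "cat_probs": {"A": 0.8, "B": 0.2}},
    {"weight": 0.5, "mean": 4.0, "sd": 1.0, "cat_probs": {"A": 0.3, "B": 0.7}},
]

probs = membership_probabilities((0.5, "A"), groups)
print([round(p, 3) for p in probs])
```

The resulting list has one probability per group and sums to 1, so a continuous value and a category label are mapped onto the same homogeneous scale, which is the transformation the abstract describes.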

Research Papers in Economics

Evolutionary, structural and functional relationships revealed by comparative analysis of syntenic genes in Rhizobiales

Author: Aguilar Alejandro
Díaz Rafael
Guerrero Gabriela
Medrano-Soto Arturo
Mora Jaime
Peralta Humberto
Villalobos Miguel Angel
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Comparative genomics has provided valuable insights into the nature of gene sequence variation and chromosomal organization of closely related bacterial species. However, questions about the biological significance of gene order conservation, or synteny, remain open. Moreover, few comprehensive studies have been reported for rhizobial genomes. RESULTS: We analyzed the genomic sequences of four fast growing Rhizobiales (Sinorhizobium meliloti, Agrobacterium tumefaciens, Mesorhizobium loti and Brucella melitensis). We made a comprehensive gene classification to define chromosomal orthologs, genes with homologs in other replicons such as plasmids, and those which were species-specific. About two thousand genes were predicted to be orthologs in each chromosome and about 80% of these were syntenic. A striking gene colinearity was found in pairs of organisms and a large fraction of the microsyntenic regions and operons were similar. Syntenic products showed higher identity levels than non-syntenic ones, suggesting a resistance to sequence variation due to functional constraints; also, an unusually high fraction of syntenic products contained membranal segments. Syntenic genes encode a high proportion of essential cell functions, presented a high level of functional relationships and a very low horizontal gene transfer rate. The sequence variability of the proteins can be considered the species signature in response to specific niche adaptation. Comparatively, an analysis with genomes of Enterobacteriales showed a different gene organization but gave similar results in the synteny conservation, essential role of syntenic genes and higher functional linkage among the genes of the microsyntenic regions. CONCLUSION: Syntenic bacterial genes represent a commonly evolved group. They not only reveal the core chromosomal segments present in the last common ancestor and determine the metabolic characteristics shared by these microorganisms, but also show resistance to sequence variation and rearrangement, possibly due to their essential character. In Rhizobiales and Enterobacteriales, syntenic genes encode a high proportion of essential cell functions and presented a high level of functional relationships.

Springer - Publisher Connector

PubMed Central

Characterization of the Tetraspan Junctional Complex (4JC) superfamily

Author: Amy Chou
Andre Lee
Arturo Medrano-Soto
Harikrishnan Kuppusamykrishnan
Kevin J. Hendargo
Maksim A. Shlykov
Milton H. Saier
Vamsee S. Reddy
Publication venue: Elsevier BV
Publication date: 01/03/2017

Connexins or innexins form gap junctions, while claudins and occludins form tight junctions. In this study, statistical data, derived using novel software, indicate that these four junctional protein families and eleven other families of channel and channel auxiliary proteins are related by common descent and comprise the Tetraspan (4 TMS) Junctional Complex (4JC) Superfamily. These proteins all share similar 4 transmembrane α-helical (TMS) topologies. Evidence is presented that they arose via an intragenic duplication event, whereby a 2 TMS-encoding genetic element duplicated tandemly to give 4 TMS proteins. In cases where high resolution structural data were available, the conclusion of homology was supported by conducting structural comparisons. Phylogenetic trees reveal the probable relationships of these 15 families to each other. Long homologues containing fusions to other recognizable domains as well as internally duplicated or fused domains are reported. Large “fusion” proteins containing 4JC domains proved to fall predominantly into family-specific patterns as follows: (1) the 4JC domain was N-terminal; (2) the 4JC domain was C-terminal; (3) the 4JC domain was duplicated or occasionally triplicated; and (4) mixed fusion types were present. Our observations provide insight into the evolutionary origins and subfunctions of these proteins as well as guides concerning their structural and functional relationships.

Crossref

PubMed Central

eScholarship - University of California

Bioinformatic characterization of the Anoctamin Superfamily of Ca2+-activated ion channels and lipid scramblases

Author: Medrano-Soto Arturo
Publication date: 11/09/2018

Ezid

Inferring molecular function: contributions from functional linkages

Author: Eisenberg David
Medrano-Soto Arturo
Pal Debnath
Publication venue: Elsevier Science

In the current era of high-throughput sequencing and structure determination, functional annotation has become a bottleneck in biomedical science. Here, we show that automated inference of molecular function using functional linkages among genes increases the accuracy of functional assignments by ≥8% and enriches functional descriptions in ≥34% of top assignments. Furthermore, biochemical literature supports >80% of automated inferences for previously unannotated proteins. These results emphasize the benefit of incorporating functional linkages in protein annotation.

Open Access Repository of IISc Research Publications

BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

Author: Arturo Medrano-Soto
J. Andres Christen
Julio Collado-Vides
Publication venue: Foundation for Open Access Statistics
Publication date: 01/01/2004

Directory of Open Access Journals

Journal of Statistical Software

The Membrane Attack Complex/Perforin Superfamily

Author: Medrano-Soto Arturo
Moreno-Hagelsieb Gabriel
Saier Milton H
Vitug Bennett
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

The membrane attack complex/perforin (MACPF) superfamily consists of a diverse group of proteins involved in bacterial pathogenesis and sporulation as well as eukaryotic immunity, embryonic development, neural migration and fruiting body formation. The present work shows that the evolutionary relationships between the members of the superfamily, previously suggested by comparison of their tertiary structures, can also be supported by analyses of their primary structures. The superfamily includes the MACPF family (TC 1.C.39), the cholesterol-dependent cytolysin (CDC) family (TC 1.C.12.1 and 1.C.12.2) and the pleurotolysin pore-forming (pleurotolysin B) family (TC 1.C.97.1), as revealed by expansion of each family by comparison against a large protein database, and by the comparisons of their hidden Markov models. Clustering analyses demonstrated grouping of the CDC homologues separately from the 12 MACPF subfamilies, which also grouped separately from the pleurotolysin B family. Members of the MACPF superfamily revealed a remarkably diverse range of proteins spanning eukaryotic, bacterial, and archaeal taxonomic domains, with notable variations in protein domain architectures. Our strategy should also be helpful in putting together other highly divergent protein families.

eScholarship - University of California