Discriminative and informative features for biomolecular text mining with ensemble feature selection
Motivation: In the field of biomolecular text mining, the black-box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) can identify the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black-box behavior and the end user who has to interpret the results.
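The abstract does not describe how the ensemble is built. As a generic illustration only, one common form of ensemble feature selection resamples the training data, ranks the features on each bootstrap sample, and aggregates the rankings. The sketch below assumes binary 0/1 class labels, uses the absolute difference of class means as a stand-in relevance score, and aggregates by mean rank. The function names and toy data are hypothetical; this is not the authors' method.

```python
import random
from statistics import mean

def feature_scores(X, y):
    """Score each feature by the absolute difference of its class means (labels 0/1)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # A bootstrap sample may miss a class entirely; then no feature gets credit.
        scores.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    return scores

def rank_features(scores):
    """Rank per feature, 0 = highest score; ties keep the original feature order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for r, j in enumerate(order):
        ranks[j] = r
    return ranks

def ensemble_feature_ranking(X, y, n_bootstraps=50, seed=0):
    """Aggregate feature ranks over bootstrap resamples by their mean rank.

    Returns feature indices ordered from most to least relevant.
    """
    rng = random.Random(seed)
    n = len(X)
    all_ranks = []
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        all_ranks.append(rank_features(feature_scores(Xb, yb)))
    mean_ranks = [mean(r[j] for r in all_ranks) for j in range(len(X[0]))]
    return sorted(range(len(mean_ranks)), key=lambda j: mean_ranks[j])

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(ensemble_feature_ranking(X, y))
```

Aggregating ranks rather than raw scores keeps a single lucky bootstrap sample from dominating; other aggregation rules (for example, counting how often a feature lands in the top k) are equally plausible choices.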
Conservation of Nonsense-Mediated mRNA Decay Complex Components Throughout Eukaryotic Evolution
Nonsense-mediated mRNA decay (NMD) is an essential eukaryotic process regulating transcript quality and abundance, and is involved in diverse processes including brain development and plant defenses. Although some of the NMD machinery is conserved between kingdoms, little is known about its evolution. Phosphorylation of the core NMD component UPF1 is critical for NMD and is regulated in mammals by the SURF complex (UPF1, SMG1 kinase, SMG8, SMG9 and eukaryotic release factors). However, since SMG1 is reportedly missing from the genomes of fungi and the plant Arabidopsis thaliana, it remains unclear how UPF1 is activated outside the metazoa. We used comparative genomics to determine the conservation of the NMD pathway across eukaryotic evolution. We show that SURF components are present in all major eukaryotic lineages, including fungi, suggesting that in addition to UPF1 and SMG1, SMG8 and SMG9 also existed in the last eukaryotic common ancestor, 1.8 billion years ago. However, despite the ancient origins of the SURF complex, we also found that SURF factors have been independently lost across the Eukarya, pointing to genetic buffering within the essential NMD pathway. We infer an ancient role for SURF in regulating UPF1, and the intriguing possibility of undiscovered NMD regulatory pathways.
The 18S ribosomal RNA sequence of the sea anemone Anemonia sulcata and its evolutionary position among other eukaryotes
Evolutionary trees based on partial small ribosomal subunit RNA (srRNA) sequences of 22 metazoan species have been published [(1988) Science 239, 748-753]. In these trees, cnidarians (Radiata) seemed to have evolved independently from the Bilateria, which contradicts the general evolutionary view. To investigate this problem further, the complete srRNA sequence of the sea anemone Anemonia sulcata was determined and evolutionary trees were constructed using a matrix optimization method. In the resulting tree, the sea anemone and the Bilateria together form a monophyletic cluster, with the sea anemone forming the first line of descent of the metazoan group.
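Distance-matrix tree methods start from pairwise distances between aligned sequences. The paper's matrix optimization method itself is not reproduced here; as a sketch under stated assumptions, the following computes uncorrected p-distances (skipping gap columns) and applies the Jukes-Cantor (JC69) correction to build the kind of symmetric matrix such methods take as input. Sequence names and fragments are hypothetical.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return diffs / compared

def jukes_cantor(p):
    """JC69 correction for multiple substitutions: d = -3/4 * ln(1 - 4p/3)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated: the distance is undefined under JC69
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs):
    """Symmetric pairwise JC69 distances for a dict of name -> aligned sequence."""
    d = {(name, name): 0.0 for name in seqs}
    for m, n in combinations(seqs, 2):
        d[(m, n)] = d[(n, m)] = jukes_cantor(p_distance(seqs[m], seqs[n]))
    return d

# Hypothetical aligned fragments, for illustration only.
seqs = {"taxon_A": "ACGTACGT", "taxon_B": "ACGTACGA", "taxon_C": "ACG-ACTA"}
for pair, dist in sorted(distance_matrix(seqs).items()):
    print(pair, round(dist, 4))
```

JC69 assumes equal base frequencies and equal substitution rates; real rRNA analyses often use richer models, so treat this purely as an illustration of the input to a distance-based tree method.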
Module networks revisited: computational assessment and prioritization of model predictions
The solution of high-dimensional inference and prediction problems in computational biology is almost always a compromise between mathematical theory and practical constraints such as limited computational resources. As time progresses, computational power increases, but well-established inference methods often remain locked in their initial suboptimal solution. We revisit the approach of Segal et al. (2003) to infer regulatory modules and their condition-specific regulators from gene expression data. In contrast to their direct optimization-based solution, we use a more representative centroid-like solution extracted from an ensemble of possible statistical models to explain the data. The ensemble method automatically selects a subset of the most informative genes and builds a quantitatively better model for them. Genes that cluster together in the majority of models produce functionally more coherent modules. Regulators that are consistently assigned to a module are more often supported by the literature, but a single model always contains many regulator assignments not supported by the ensemble. Reliably detecting condition-specific or combinatorial regulation is particularly hard in a single optimum but can be achieved using ensemble averaging.
Comment: 8 pages REVTeX, 6 figures
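The majority-vote idea (genes that cluster together in most models form more coherent modules) can be sketched as a co-clustering count over an ensemble of module assignments. This is a simplified consensus step under stated assumptions, not Segal et al.'s algorithm or the authors' centroid procedure: each model is a hypothetical dict from gene to module label, gene pairs co-clustered in more than half of the models are linked, and linked genes are merged with union-find.

```python
from collections import defaultdict
from itertools import combinations

def co_clustering_frequency(ensemble):
    """Fraction of models in which each gene pair is assigned to the same module.

    ensemble: list of dicts mapping gene -> module label, one dict per sampled model
    (every model is assumed to assign the same set of genes).
    """
    counts = defaultdict(int)
    for model in ensemble:
        for g, h in combinations(sorted(model), 2):
            if model[g] == model[h]:
                counts[(g, h)] += 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}

def consensus_modules(ensemble, threshold=0.5, min_size=2):
    """Merge genes that co-cluster in more than `threshold` of the models (union-find)."""
    parent = {g: g for model in ensemble for g in model}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for (g, h), freq in co_clustering_frequency(ensemble).items():
        if freq > threshold:
            parent[find(g)] = find(h)

    modules = defaultdict(set)
    for g in parent:
        modules[find(g)].add(g)
    # Genes without a consistent partner are dropped: a crude stand-in for gene selection.
    return sorted(sorted(m) for m in modules.values() if len(m) >= min_size)

# Hypothetical ensemble of three sampled models over four genes.
ensemble = [
    {"a": 1, "b": 1, "c": 2, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 2},
    {"a": 3, "b": 3, "c": 4, "d": 4},
]
print(consensus_modules(ensemble))  # [['a', 'b'], ['c', 'd']]
```

Here gene c joins a and b in only one of three models, so that assignment is not kept, while c and d co-cluster in two of three models and form a consensus module.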
Oidium neolycopersici: Intra-specific variability inferred from AFLP analysis and relationship with closely related powdery mildew fungi infecting various plant species
Previous studies indicated considerable variation in the pathogenicity, virulence, and host range of Oidium neolycopersici isolates causing tomato powdery mildew epidemics in many parts of the world. In this study, rDNA internal transcribed spacer (ITS) sequences and amplified fragment length polymorphism (AFLP) patterns were analyzed in 17 O. neolycopersici samples collected in Europe, North America, and Japan, including samples that overcame some of the major tomato resistance genes. The ITS sequences were identical in all 10 samples tested and were also identical to the ITS sequences of eight previously studied O. neolycopersici specimens. The AFLP analysis revealed high genetic diversity in O. neolycopersici and indicated that all 17 samples represented different genotypes. This might suggest either as-yet-undetected sexual reproduction or other genetic mechanisms that maintain high genetic variability in O. neolycopersici. No clear correlation was found between the virulence and the AFLP patterns of the O. neolycopersici isolates studied. The relationship between O. neolycopersici and powdery mildew anamorphs infecting Aquilegia vulgaris, Chelidonium majus, Passiflora caerulea, and Sedum alboroseum was also investigated. These anamorphs are morphologically indistinguishable from, and phylogenetically closely related to, O. neolycopersici. The cross-inoculation tests and the analyses of ITS sequences and AFLP patterns jointly indicated that the powdery mildew anamorphs collected from the above-mentioned plant species all represent distinct but closely related species under phylogenetic species recognition. Each of these species was pathogenic only to its original host plant species, except O. neolycopersici, which in cross-inoculation tests infected S. alboroseum, tobacco, petunia, and Arabidopsis thaliana in addition to tomato.
This is the first genome-wide study to investigate the relationships among powdery mildews that are closely related on the basis of ITS sequences and morphology. The results indicate that morphologically indistinguishable powdery mildews differing at only one to five single-nucleotide positions in their ITS region should be considered different taxa with distinct host ranges.
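Two of the comparisons in this abstract are easy to make concrete: counting single-nucleotide differences between aligned ITS sequences, and checking whether binary AFLP band profiles are all distinct. The sketch assumes pre-aligned sequences and 0/1 band scores. Jaccard distance is one measure often used for dominant markers such as AFLP, not necessarily the one used in the study, and all data shown are hypothetical.

```python
def snp_differences(a, b):
    """Count aligned columns where both sequences have a base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != "-" and y != "-" and x != y)

def jaccard_distance(p, q):
    """1 - Jaccard similarity of two binary AFLP band profiles (1 = band present).

    Shared band absences are ignored by the Jaccard coefficient.
    """
    shared = sum(1 for x, y in zip(p, q) if x and y)
    either = sum(1 for x, y in zip(p, q) if x or y)
    return 0.0 if either == 0 else 1 - shared / either

def all_distinct_genotypes(profiles):
    """True if no two samples share an identical band profile."""
    return len({tuple(p) for p in profiles.values()}) == len(profiles)

# Hypothetical data, for illustration only.
its_a = "ACGTTGCA"
its_b = "ACGATGCA"
aflp = {"s1": [1, 0, 1, 1], "s2": [1, 1, 0, 1], "s3": [0, 1, 1, 1]}
print(snp_differences(its_a, its_b))             # 1
print(jaccard_distance(aflp["s1"], aflp["s2"]))  # 0.5
print(all_distinct_genotypes(aflp))              # True
```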
Functional significance may underlie the taxonomic utility of single amino acid substitutions in conserved proteins
We hypothesized that some amino acid substitutions in conserved proteins that are strongly fixed by critical functional roles would show lineage-specific distributions. As an example of an archetypal conserved eukaryotic protein, we considered the active site of β-tubulin. Our analysis identified one amino acid substitution, β-tubulin F224, which was highly lineage specific. Investigation of β-tubulin for other phylogenetically restricted amino acids identified several with apparent specificity for well-defined phylogenetic groups. Intriguingly, none showed specificity for “supergroups” other than the unikonts. To understand why, we analysed the β-tubulin Neighbor-Net and demonstrated a fundamental division between core β-tubulins (plant-like) and divergent β-tubulins (animal and fungal). F224 was almost completely restricted to the core β-tubulins, while divergent β-tubulins possessed Y224. Thus, our specific example offers insight into the restrictions associated with the co-evolution of β-tubulin during the radiation of eukaryotes, underlining a fundamental dichotomy between F-type, core β-tubulins and Y-type, divergent β-tubulins. More broadly, our study provides proof of principle for the taxonomic utility of critical amino acids in the active sites of conserved proteins.
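The lineage-specificity check described in this abstract amounts to tabulating which residue each sequence carries at one alignment column, grouped by lineage. A minimal sketch, assuming an existing multiple alignment, a precomputed mapping from reference numbering (such as position 224) to an alignment column, and hypothetical group labels; the toy sequences are not real β-tubulin data.

```python
from collections import Counter, defaultdict

def residues_by_group(alignment, groups, column):
    """Tabulate the residue found at one alignment column within each group.

    alignment: sequence name -> aligned protein sequence
    groups:    sequence name -> group label (e.g. "core" or "divergent")
    column:    0-based alignment column (mapping a reference position such as
               224 to a column is assumed to have been done beforehand)
    """
    table = defaultdict(Counter)
    for name, seq in alignment.items():
        table[groups[name]][seq[column]] += 1
    return table

def groups_with_residue(table, residue):
    """Groups in which the residue occurs at least once."""
    return sorted(g for g, counts in table.items() if counts[residue] > 0)

# Hypothetical toy alignment; column 2 stands in for the position of interest.
alignment = {"plant": "GAFL", "alga": "GAFL", "animal": "GAYL", "fungus": "GAYL"}
groups = {"plant": "core", "alga": "core", "animal": "divergent", "fungus": "divergent"}
table = residues_by_group(alignment, groups, column=2)
print(groups_with_residue(table, "F"))  # ['core']
print(groups_with_residue(table, "Y"))  # ['divergent']
```

A residue returned for exactly one group is a candidate lineage-specific marker; in practice "almost completely restricted", as the abstract puts it, would call for a frequency threshold rather than strict presence or absence.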
Prediction of a gene regulatory network linked to prostate cancer from gene expression, microRNA and clinical data
Motivation: Cancer is a complex disease, triggered by mutations in multiple genes and pathways. There is growing interest in applying systems biology approaches to analyze various types of cancer-related data in order to understand the overwhelming complexity of changes induced by the disease.
- …