1,087 research outputs found

    Discriminative and informative features for biomolecular text mining with ensemble feature selection

    Motivation: In the field of biomolecular text mining, the black-box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black-box behavior and the end-user who has to interpret the results.

    The 18S ribosomal RNA sequence of the sea anemone Anemonia sulcata and its evolutionary position among other eukaryotes

    Evolutionary trees based on partial small ribosomal subunit RNA (srRNA) sequences of 22 metazoan species have been published [(1988) Science 239, 748-753]. In these trees, cnidarians (Radiata) seemed to have evolved independently from the Bilateria, which contradicts the general evolutionary view. To investigate this problem further, the complete srRNA sequence of the sea anemone Anemonia sulcata was determined and evolutionary trees were constructed using a matrix optimization method. In the tree thus obtained, the sea anemone and Bilateria together form a monophyletic cluster, with the sea anemone forming the first line of descent of the metazoan group.

    Conservation of Nonsense-Mediated mRNA Decay Complex Components Throughout Eukaryotic Evolution

    Nonsense-mediated mRNA decay (NMD) is an essential eukaryotic process regulating transcript quality and abundance, and is involved in diverse processes including brain development and plant defenses. Although some of the NMD machinery is conserved between kingdoms, little is known about its evolution. Phosphorylation of the core NMD component UPF1 is critical for NMD and is regulated in mammals by the SURF complex (UPF1, SMG1 kinase, SMG8, SMG9 and eukaryotic release factors). However, since SMG1 is reportedly missing from the genomes of fungi and the plant Arabidopsis thaliana, it remains unclear how UPF1 is activated outside the metazoa. We used comparative genomics to determine the conservation of the NMD pathway across eukaryotic evolution. We show that SURF components are present in all major eukaryotic lineages, including fungi, suggesting that in addition to UPF1 and SMG1, SMG8 and SMG9 also existed in the last eukaryotic common ancestor, 1.8 billion years ago. However, despite the ancient origins of the SURF complex, we also found that SURF factors have been independently lost across the Eukarya, pointing to genetic buffering within the essential NMD pathway. We infer an ancient role for SURF in regulating UPF1, and the intriguing possibility of undiscovered NMD regulatory pathways.

    Module networks revisited: computational assessment and prioritization of model predictions

    The solution of high-dimensional inference and prediction problems in computational biology is almost always a compromise between mathematical theory and practical constraints such as limited computational resources. As time progresses, computational power increases but well-established inference methods often remain locked in their initial suboptimal solution. We revisit the approach of Segal et al. (2003) to infer regulatory modules and their condition-specific regulators from gene expression data. In contrast to their direct optimization-based solution, we use a more representative centroid-like solution extracted from an ensemble of possible statistical models to explain the data. The ensemble method automatically selects a subset of most informative genes and builds a quantitatively better model for them. Genes which cluster together in the majority of models produce functionally more coherent modules. Regulators which are consistently assigned to a module are more often supported by literature, but a single model always contains many regulator assignments not supported by the ensemble. Reliably detecting condition-specific or combinatorial regulation is particularly hard in a single optimum but can be achieved using ensemble averaging. Comment: 8 pages REVTeX, 6 figures.
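The idea of keeping genes that cluster together in the majority of models can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary representation of a model, and the 0.5 threshold are all assumptions made for illustration.

```python
from itertools import combinations


def co_clustering_frequency(ensemble):
    """Fraction of models in which each pair of genes shares a module.

    `ensemble` is a list of hypothetical models, each a dict that maps a
    gene name to a module label. Pairs that never co-cluster are omitted.
    """
    counts = {}
    for assignment in ensemble:
        for a, b in combinations(sorted(assignment), 2):
            if assignment[a] == assignment[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}


def majority_pairs(ensemble, threshold=0.5):
    """Gene pairs that share a module in more than `threshold` of the models."""
    freq = co_clustering_frequency(ensemble)
    return sorted(pair for pair, f in freq.items() if f > threshold)


# Toy ensemble of three models over three genes (made-up data).
ensemble = [
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 0, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 0},
]
consensus = majority_pairs(ensemble)
```

Module labels only need to be consistent within a model, not across models, because the sketch compares pairs of genes rather than label values.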

    Oidium neolycopersici: Intra-specific variability inferred from AFLP analysis and relationship with closely related powdery mildew fungi infecting various plant species

    Previous work indicated considerable variation in the pathogenicity, virulence, and host range of Oidium neolycopersici isolates causing tomato powdery mildew epidemics in many parts of the world. In this study, rDNA internal transcribed spacer (ITS) sequences and amplified fragment length polymorphism (AFLP) patterns were analyzed in 17 O. neolycopersici samples collected in Europe, North America, and Japan, including those which overcame some of the tomato major resistance genes. The ITS sequences were identical in all 10 samples tested and were also identical to ITS sequences of eight previously studied O. neolycopersici specimens. The AFLP analysis revealed a high genetic diversity in O. neolycopersici and indicated that all 17 samples represented different genotypes. This might suggest the existence of either as yet unrevealed sexual reproduction or other genetic mechanisms that maintain a high genetic variability in O. neolycopersici. No clear correlation was found between the virulence and the AFLP patterns of the O. neolycopersici isolates studied. The relationship between O. neolycopersici and powdery mildew anamorphs infecting Aquilegia vulgaris, Chelidonium majus, Passiflora caerulea, and Sedum alboroseum was also investigated. These anamorphs are morphologically indistinguishable from and phylogenetically closely related to O. neolycopersici. The cross-inoculation tests and the analyses of ITS sequences and AFLP patterns jointly indicated that the powdery mildew anamorphs collected from the above-mentioned plant species all represent distinct, but closely related species according to the phylogenetic species recognition. All these species were pathogenic only to their original host plant species, except O. neolycopersici, which infected S. alboroseum, tobacco, petunia, and Arabidopsis thaliana, in addition to tomato, in cross-inoculation tests. This is the first genome-wide study that investigates the relationships among powdery mildews that are closely related based on ITS sequences and morphology. The results indicate that morphologically indistinguishable powdery mildews that differed in only one to five single nucleotide positions in their ITS region are to be considered as different taxa with distinct host ranges.

    Plasmodium knowlesi malaria in Vietnam: some clarifications

    A recently published comment on a report of Plasmodium knowlesi infections in Vietnam states that this may not accurately represent the situation in the study area because the PCR primers used may cross-hybridize with Plasmodium vivax. Nevertheless, P. knowlesi infections have been confirmed by sequencing. In addition, a neighbour-joining tree based on the 18S S-Type SSUrRNA gene shows that the Vietnamese samples clearly cluster with the P. knowlesi isolates identified in Malaysia and are distinct from the corresponding P. vivax sequences. All samples came from asymptomatic individuals who did not seek care for fever during the months preceding or following the survey, indicating that asymptomatic P. knowlesi infections occur in this population, although this does not exclude the occurrence of symptomatic cases. Large-scale studies are still needed to determine the extent and the epidemiology of P. knowlesi malaria in Vietnam.

    Validating module network learning algorithms using simulated data

    In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information.
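The abstract does not spell out how the conditional entropy measure is computed, so the following is only a sketch of the textbook quantity H(Y | X), not LeMoNe's actual scoring function. It assumes a tree node splits conditions into two groups (`labels`) and that each candidate regulator has been discretized into a binary on/off state (`split`); the regulator names and data are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_entropy(labels, split):
    """H(labels | split) in bits for two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(split, labels))   # counts of (split value, label)
    marginal = Counter(split)             # counts of each split value
    return -sum((c / n) * log2(c / marginal[s]) for (s, _), c in joint.items())


# Hypothetical node: four conditions partitioned into a left and a right child.
labels = ["L", "L", "R", "R"]

# Hypothetical regulators, each discretized to on (1) / off (0) per condition.
candidates = {
    "regA": [0, 0, 1, 1],   # perfectly explains the partition
    "regB": [0, 1, 0, 1],   # carries no information about it
}

# Lower conditional entropy means the regulator explains the split better.
best = min(candidates, key=lambda r: conditional_entropy(labels, candidates[r]))
```

Ranking candidates by ascending conditional entropy gives one simple way to prioritize regulators, in the spirit of the prioritization the abstract mentions.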