Discriminative and informative features for biomolecular text mining with ensemble feature selection
Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results.
The 18S ribosomal RNA sequence of the sea anemone Anemonia sulcata and its evolutionary position among other eukaryotes
Evolutionary trees based on partial small ribosomal subunit RNA sequences of 22 metazoan species have been published [(1988) Science 239, 748-753]. In these trees, cnidarians (Radiata) seemed to have evolved independently from the Bilateria, which is in contradiction with the general evolutionary view. In order to further investigate this problem, the complete srRNA sequence of the sea anemone Anemonia sulcata was determined and evolutionary trees were constructed using a matrix optimization method. In the tree thus obtained, the sea anemone and Bilateria together form a monophyletic cluster, with the sea anemone forming the first line of descent of the metazoan group.
Conservation of Nonsense-Mediated mRNA Decay Complex Components Throughout Eukaryotic Evolution
Nonsense-mediated mRNA decay (NMD) is an essential eukaryotic process regulating transcript quality and abundance, and is involved in diverse processes including brain development and plant defenses. Although some of the NMD machinery is conserved between kingdoms, little is known about its evolution. Phosphorylation of the core NMD component UPF1 is critical for NMD and is regulated in mammals by the SURF complex (UPF1, SMG1 kinase, SMG8, SMG9 and eukaryotic release factors). However, since SMG1 is reportedly missing from the genomes of fungi and the plant Arabidopsis thaliana, it remains unclear how UPF1 is activated outside the metazoa. We used comparative genomics to determine the conservation of the NMD pathway across eukaryotic evolution. We show that SURF components are present in all major eukaryotic lineages, including fungi, suggesting that in addition to UPF1 and SMG1, SMG8 and SMG9 also existed in the last eukaryotic common ancestor, 1.8 billion years ago. However, despite the ancient origins of the SURF complex, we also found that SURF factors have been independently lost across the Eukarya, pointing to genetic buffering within the essential NMD pathway. We infer an ancient role for SURF in regulating UPF1, and the intriguing possibility of undiscovered NMD regulatory pathways.
Module networks revisited: computational assessment and prioritization of model predictions
The solution of high-dimensional inference and prediction problems in
computational biology is almost always a compromise between mathematical theory
and practical constraints such as limited computational resources. As time
progresses, computational power increases but well-established inference
methods often remain locked in their initial suboptimal solution. We revisit
the approach of Segal et al. (2003) to infer regulatory modules and their
condition-specific regulators from gene expression data. In contrast to their
direct optimization-based solution we use a more representative centroid-like
solution extracted from an ensemble of possible statistical models to explain
the data. The ensemble method automatically selects a subset of most
informative genes and builds a quantitatively better model for them. Genes
which cluster together in the majority of models produce functionally more
coherent modules. Regulators which are consistently assigned to a module are
more often supported by literature, but a single model always contains many
regulator assignments not supported by the ensemble. Reliably detecting
condition-specific or combinatorial regulation is particularly hard in a single
optimum but can be achieved using ensemble averaging.
Comment: 8 pages REVTeX, 6 figures
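The idea of keeping genes that cluster together in the majority of models can be illustrated with a small sketch. This is not the authors' implementation: the function names `coclustering_frequency` and `majority_pairs`, the 0.5 threshold, and the toy `ensemble` data are all assumptions made for illustration. It only counts how often each gene pair shares a cluster across an ensemble of clusterings.

```python
from itertools import combinations

def coclustering_frequency(clusterings):
    """Fraction of ensemble members in which each gene pair shares a cluster.

    `clusterings` is a list of dicts mapping gene name -> cluster label
    (a hypothetical representation; all dicts must cover the same genes).
    """
    genes = sorted(clusterings[0])
    counts = {pair: 0 for pair in combinations(genes, 2)}
    for assignment in clusterings:
        for a, b in counts:
            if assignment[a] == assignment[b]:
                counts[(a, b)] += 1
    n = len(clusterings)
    return {pair: c / n for pair, c in counts.items()}

def majority_pairs(clusterings, threshold=0.5):
    """Gene pairs that co-cluster in more than `threshold` of the models."""
    freq = coclustering_frequency(clusterings)
    return sorted(pair for pair, f in freq.items() if f > threshold)

# Toy ensemble of three clusterings (made-up data, illustration only).
ensemble = [
    {"g1": 0, "g2": 0, "g3": 1, "g4": 1},
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
    {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
]

print(majority_pairs(ensemble))
```

Cluster labels are compared only within a single model, so relabelled but equivalent clusterings (such as the third toy model) contribute the same co-clustering counts.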
Oidium neolycopersici: Intra-specific variability inferred from AFLP analysis and relationship with closely related powdery mildew fungi infecting various plant species
Previous work indicated a considerable variation in the pathogenicity, virulence, and host range of Oidium neolycopersici isolates causing tomato powdery mildew epidemics in many parts of the world. In this study, rDNA internal transcribed spacer (ITS) sequences and amplified fragment length polymorphism (AFLP) patterns were analyzed in 17 O. neolycopersici samples collected in Europe, North America, and Japan, including those which overcame some of the tomato major resistance genes. The ITS sequences were identical in all 10 samples tested and were also identical to ITS sequences of eight previously studied O. neolycopersici specimens. The AFLP analysis revealed a high genetic diversity in O. neolycopersici and indicated that all 17 samples represented different genotypes. This might suggest the existence of either a yet unrevealed sexual reproduction or other genetic mechanisms that maintain a high genetic variability in O. neolycopersici. No clear correlation was found between the virulence and the AFLP patterns of the O. neolycopersici isolates studied. The relationship between O. neolycopersici and powdery mildew anamorphs infecting Aquilegia vulgaris, Chelidonium majus, Passiflora caerulea, and Sedum alboroseum was also investigated. These anamorphs are morphologically indistinguishable from and phylogenetically closely related to O. neolycopersici. The cross-inoculation tests and the analyses of ITS sequences and AFLP patterns jointly indicated that the powdery mildew anamorphs collected from the above-mentioned plant species all represent distinct, but closely related species according to the phylogenetic species recognition. All these species were pathogenic only to their original host plant species, except O. neolycopersici which infected S. alboroseum, tobacco, petunia, and Arabidopsis thaliana, in addition to tomato, in cross-inoculation tests.
This is the first genome-wide study that investigates the relationships among powdery mildews that are closely related based on ITS sequences and morphology. The results indicate that morphologically indistinguishable powdery mildews that differed in only one to five single nucleotide positions in their ITS region are to be considered as different taxa with distinct host ranges.
Plasmodium knowlesi malaria in Vietnam: some clarifications
A recently published comment on a report of Plasmodium knowlesi infections in Vietnam states that this may not accurately represent the situation in the study area because the PCR primers used may cross-hybridize with Plasmodium vivax. Nevertheless, P. knowlesi infections have been confirmed by sequencing. In addition, a neighbour-joining tree based on the 18S S-Type SSUrRNA gene shows that the Vietnamese samples clearly cluster with the P. knowlesi isolates identified in Malaysia and are distinct from the corresponding P. vivax sequences. All samples came from asymptomatic individuals who did not consult for fever during the months preceding or following the survey, indicating that asymptomatic P. knowlesi infections occur in this population, although this does not exclude the occurrence of symptomatic cases. Further large-scale studies are needed to determine the extent and the epidemiology of P. knowlesi malaria in Vietnam.
Validating module network learning algorithms using simulated data
In recent years, several authors have used probabilistic graphical models to
learn expression modules and their regulatory programs from gene expression
data. Here, we demonstrate the use of the synthetic data generator SynTReN for
the purpose of testing and comparing module network learning algorithms. We
introduce a software package for learning module networks, called LeMoNe, which
incorporates a novel strategy for learning regulatory programs. Novelties
include the use of a bottom-up Bayesian hierarchical clustering to construct
the regulatory programs, and the use of a conditional entropy measure to assign
regulators to the regulation program nodes. Using SynTReN data, we test the
performance of LeMoNe in a completely controlled situation and assess the
effect of the methodological changes we made with respect to an existing
software package, namely Genomica. Additionally, we assess the effect of
various parameters, such as the size of the data set and the amount of noise,
on the inference performance. Overall, application of Genomica and LeMoNe to
simulated data sets gave comparable results. However, LeMoNe offers some
advantages, one of them being that the learning process is considerably faster
for larger data sets. Additionally, we show that the location of the regulators
in the LeMoNe regulation programs and their conditional entropy may be used to
prioritize regulators for functional validation, and that the combination of
the bottom-up clustering strategy with the conditional entropy-based assignment
of regulators improves the handling of missing or hidden regulators.
Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information
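As a rough illustration of how a conditional entropy measure can rank candidate regulators, the sketch below computes H(Y | X) for discrete labels and orders candidates by how well their state predicts a split of the conditions. It is not LeMoNe's scoring procedure, whose details are not given here: the function `conditional_entropy`, the discretized "low"/"high" states, and the toy data are assumptions.

```python
import math
from collections import Counter

def conditional_entropy(xs, ys):
    """H(Y | X) in bits for two equal-length sequences of discrete labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal = Counter(xs)         # counts of x alone
    h = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2 p(y | x), accumulated with a minus sign
        h -= (c / n) * math.log2(c / marginal[x])
    return h

# Toy data (made up): a split of four conditions into a left and right branch,
# and two hypothetical regulators with discretized expression states.
split = ["L", "L", "R", "R"]
candidates = {
    "reg_a": ["low", "low", "high", "high"],   # state determines the split
    "reg_b": ["low", "high", "low", "high"],   # state carries no information
}

# Lower H(split | regulator) means the regulator explains the split better.
ranking = sorted(candidates, key=lambda r: conditional_entropy(candidates[r], split))
print(ranking)
```

In this toy setting a regulator whose state fully determines the split scores 0 bits, while an uninformative one scores the full 1 bit of the split's own entropy.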