
18 research outputs found

Predicting rice phenotypes with meta and multi-target learning

Author: Alexandrov Nickolai N.
King Ross
Orhobor Oghenejokpeme I.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not influence a response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for the generation and combination of meta-features when feature groupings are present. Furthermore, we make comparisons to multi-target learning, given that one is typically interested in predicting multiple phenotypes. We evaluated the frameworks and multi-target learning approaches on a genomic rice dataset where the regression task is to predict plant phenotype. Our results demonstrate that there are use cases for both the meta and multi-target approaches, given that overall, they significantly outperform the base case.

Chalmers Research

Predicting rice phenotypes with meta and multi-target learning

Author: Alexandrov Nickolai N.
King Ross D.
Orhobor Oghenejokpeme I.
Publication venue: Machine Learning
Publication date: 01/01/2020

Abstract: The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not influence a response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for the generation and combination of meta-features when feature groupings are present. Furthermore, we make comparisons to multi-target learning, given that one is typically interested in predicting multiple phenotypes. We evaluated the frameworks and multi-target learning approaches on a genomic rice dataset where the regression task is to predict plant phenotype. Our results demonstrate that there are use cases for both the meta and multi-target approaches, given that overall, they significantly outperform the base case.

Chalmers Research

Apollo (Cambridge)

GC3 biology in corn, rice, sorghum and other grasses

Author: Alexandrov Nickolai N
Bouck John B
Feldmann Kenneth A
Tatarinova Tatiana V
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract: Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

University of South Wales Research Explorer

PubMed Central


Characterization of the Leaf Microbiome from Whole-Genome Sequencing Data of the 3000 Rice Genomes Project

Author: Alexandrov Nickolai
Borja Frances N
Groen Simon C
Mauleon Ramil
Oliva Ricardo
Pinili Dale
Quibod Ian L
Roman-Reyna Veronica
Publication venue: eScholarship, University of California
Publication date: 01/12/2020

Background: The crop microbial communities are shaped by interactions between the host, microbes and the environment; however, their relative contribution is beginning to be understood. Here, we explore these interactions in the leaf bacterial community across 3024 rice accessions. Findings: By using unmapped DNA sequencing reads as microbial reads, we characterized the structure of the rice bacterial microbiome. We identified central bacteria taxa that emerge as microbial "hubs" and may have an influence on the network of host-microbe interactions. We found regions in the rice genome that might control the assembly of these microbial hubs. To our knowledge this is one of the first studies that uses raw data from plant genome sequencing projects to characterize the leaf bacterial communities. Conclusion: We showed that the structure of the rice leaf microbiome is modulated by multiple interactions among host, microbes, and environment. Our data provide insight into the factors influencing microbial assemblage in the rice leaf and also open the door for future initiatives to modulate rice consortia for crop improvement efforts.

eScholarship - University of California

Insights into corn genes derived from large-scale cDNA sequencing

Author: Hongyu Zhang
John Bouck
Kenneth A. Feldmann
Maxim E. Troukhan
Nickolai N. Alexandrov
Richard B. Flavell
Stanislav Freidin
Tatiana V. Tatarinova
Timothy J. Swaller
Vyacheslav V. Brover
Yu-Ping Lu
Publication venue: Springer Netherlands
Publication date: 01/01/2008

We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701–EU977132 (FLI cDNA) and FK944382–FL482108 (EST).

Crossref

Springer - Publisher Connector

PubMed Central

Application of a new method of pattern recognition in DNA sequence analysis: a study of E.coli

Author: Andrei A. Mironov
Nickolai N. Alexandrov
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/1990

Crossref

Fast Protein Fold Recognition via Sequence to Structure Alignment and Contact Capacity Potentials

Author: Nickolai N. Alexandrov
Ralf M. Zimmer
Ruth Nussinov

We propose new empirical scoring potentials and associated alignment procedures for optimally aligning protein sequences to protein structures. The method has two main applications: first, the recognition of a plausible fold for a protein sequence of unknown structure out of a database of representative protein structures and, second, the improvement of sequence alignments by using structural information in order to find a better starting point for homology-based modelling. The empirical scoring function is derived from an analysis of a non-redundant database of known structures by converting relative frequencies into pseudoenergies using a normalization according to the inverse Boltzmann law. These so-called contact capacity potentials turn out to be discriminative enough to detect structural folds in the absence of significant sequence similarity and at the same time simple enough to allow for a very fast optimization in an alignment procedure.

CiteSeerX

Genome-Wide Discovery of cis

Author: Alexandrov
Initiative
Janaki C.
John Bouck
Maxim Troukhan
Nickolai N. Alexandrov
Richard B. Flavell
Tatiana Tatarinova
Publication venue: Mary Ann Liebert Inc

Crossref