A Chromosome-Scale Assembly of the Garden Orach (Atriplex hortensis L.) Genome Using Oxford Nanopore Sequencing
Atriplex hortensis (2n = 2x = 18, 1C genome size 1.1 gigabases), also known
as garden orach and mountain-spinach, is a highly nutritious, broadleaf annual of
the Amaranthaceae-Chenopodiaceae alliance (Chenopodiaceae sensu stricto, subfam.
Chenopodioideae) that has spread in cultivation from its native primary domestication
area in Eurasia to other temperate and subtropical regions worldwide. Atriplex L. is a
highly complex but, as understood now, a monophyletic group of mainly halophytic
and/or xerophytic plants, of which A. hortensis has been a vegetable of minor
importance in some areas of Eurasia (from Central Asia to the Mediterranean) at least
since antiquity. Nonetheless, it is a crop with tremendous nutritional potential due
primarily to its exceptional leaf and seed protein quantities (approaching 30%) and
quality (high levels of lysine). Although there is some literature describing the taxonomy
and production of A. hortensis, there is a general lack of genetic and genomic data
that would otherwise help elucidate the genetic variation, phylogenetic positioning, and
future potential of the species. Here, we report the assembly of the first high-quality,
chromosome-scale reference genome for A. hortensis cv. “Golden.” Long-read data
from Oxford Nanopore’s MinION DNA sequencer was assembled with the program
Canu and polished with Illumina short reads. Contigs were scaffolded to chromosome
scale using chromatin-proximity maps (Hi-C) yielding a final assembly containing 1,325
scaffolds with an N50 of 98.9 Mb, with 94.7% of the assembly represented in the nine
largest, chromosome-scale scaffolds. Sixty-six percent of the genome was classified
as highly repetitive DNA, with the most common repetitive elements being Gypsy-
(32%) and Copia-like (11%) long-terminal repeats. The annotation was completed using
MAKER, which identified 37,083 gene models and 2,555 tRNA genes. Genome completeness
was assessed using the Benchmarking Universal Single Copy Orthologs (BUSCO) metric,
which identified 97.5% of the conserved orthologs as complete, with only
2.2% being duplicated, reflecting the diploid nature of A. hortensis. A resequencing
panel of 21 wild, unimproved and cultivated A. hortensis accessions revealed three
distinct populations with little variation within subpopulations. These resources provide
vital information to better understand A. hortensis and facilitate future study.
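The scaffold statistics quoted in this abstract (N50 and the share of the assembly held in the largest scaffolds) can be computed from a list of scaffold lengths. The following is a minimal sketch, assuming the lengths are already available as integers; the function names and the toy lengths are hypothetical and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_in_largest(lengths, k):
    """Return the fraction of total length contained in the k largest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Toy scaffold lengths (hypothetical values, not the A. hortensis assembly).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
toy_fraction = fraction_in_largest(toy_lengths, 2)
```

With real data, the lengths would come from the assembly FASTA index, and `k` would be set to the expected chromosome number (nine for this species).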
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Predicting the Important Enzymes in Human Breast Milk Digestion
Human milk is known to contain several proteases, but little is known about whether these enzymes are active, which proteins they cleave, and their relative contribution to milk protein digestion in vivo. This study analyzed the mass spectrometry-identified protein fragments found in pooled human milk by comparing their cleavage sites with the enzyme specificity patterns of an array of enzymes. The results indicate that several enzymes are actively taking part in the digestion of human milk proteins within the mammary gland, including plasmin and/or trypsin, elastase, cathepsin D, pepsin, chymotrypsin, a glutamyl endopeptidase-like enzyme, and proline endopeptidase. Two proteins were most affected by enzyme hydrolysis: β-casein and polymeric immunoglobulin receptor. In contrast, other highly abundant milk proteins such as α-lactalbumin and lactoferrin appear to have undergone no proteolytic cleavage. A peptide sequence containing a known antimicrobial peptide is released in breast milk by elastase and cathepsin D.
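The comparison of peptide cleavage sites with enzyme specificity patterns can be illustrated roughly as follows. This is a sketch under strong assumptions: the rule table uses simplified, textbook-style P1 preferences (the residue immediately N-terminal to the cut), the sequences are made up, and the function name is hypothetical. It is not the study's actual method or its specificity patterns.

```python
# Simplified P1 preferences (assumption: illustrative only, not the study's patterns).
P1_RULES = {
    "plasmin/trypsin": set("KR"),
    "chymotrypsin": set("FYW"),
    "elastase": set("AV"),
    "glutamyl endopeptidase": set("E"),
    "proline endopeptidase": set("P"),
}


def candidate_enzymes(protein, peptide):
    """Map each terminus of a peptide to enzymes whose P1 rule matches the cut."""
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)

    p1_by_side = {}
    # N-terminal cut: P1 is the protein residue just before the peptide.
    if start > 0:
        p1_by_side["N-term"] = protein[start - 1]
    # C-terminal cut: P1 is the last residue of the peptide itself.
    if end < len(protein):
        p1_by_side["C-term"] = protein[end - 1]

    return {
        side: sorted(name for name, residues in P1_RULES.items() if p1 in residues)
        for side, p1 in p1_by_side.items()
    }


# Hypothetical sequences for illustration only.
matches = candidate_enzymes("MKAVLEPFGR", "AVLEP")
```

A terminus that coincides with the start or end of the parent protein is skipped, since no proteolytic cut is needed to explain it.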
Peptidomic Profile of Milk of Holstein Cows at Peak Lactation
Bovine milk is known to contain naturally occurring peptides, but relatively few of their sequences have been determined. Human milk contains hundreds of endogenous peptides and the ensemble has been documented for antimicrobial actions. Naturally occurring peptides from bovine milk were sequenced and compared with human milk peptides. Bovine milk samples from six cows in second stage peak lactation at 78–121 days postpartum revealed 159 peptides. Most peptides (73%) were found in all six cows sampled, demonstrating the similarity of the intra-mammary peptide degradation across these cows. One peptide sequence, ALPIIQKLEPQIA from bovine perilipin 2, was identical to another found in human milk. Most peptides derived from β-casein, α(s1)-casein, and α(s2)-casein. No peptides derived from abundant bovine milk proteins like lactoferrin, β-lactoglobulin, and secretory immunoglobulin A. The enzymatic cleavage analysis revealed that milk proteins were degraded by plasmin, cathepsins B and D, and elastase in all samples.