A bioinformatics and genotyping approach exploring personalised nutrition.
Personalised nutrition is in its early stages but shows potential for improving
the health of the general population, at a time when diabetes and obesity are
becoming worldwide epidemics. However, it will need to be based on rigorous
scientific research and accompanied by public policies and ethical
considerations.
Research is making great progress towards understanding the impact of
genetics on complex diseases, which involve hundreds or thousands of variants,
each with a varying effect on the disease. Personalised medicine aims to
harness this genetic information to tailor prevention and treatment to
each individual.
Unfortunately, the links between genotype and phenotype are not yet fully
understood. While the content of publicly available genetic databases is
growing exponentially, these databases often use different formats and means of
access, making it difficult to obtain complete information. Moreover, evaluating the
genetic predisposition of an individual to a disease is not straightforward, and
while Polygenic Risk Score models can help in this regard, they are often
based only on common variants, which might lead to misevaluation of the risk for
rare-variant carriers.
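Under the standard additive model, a polygenic risk score is the sum, over the variants in the model, of each variant's per-allele effect size multiplied by the number of effect alleles an individual carries. The minimal sketch below uses made-up variant IDs and effect sizes (all hypothetical) to show the limitation described above: a variant absent from a common-variant model contributes nothing, so a carrier of a high-impact rare variant can receive the same score as a non-carrier. It is an illustration of the general idea only, not the method developed in the thesis.

```python
from typing import Dict


def polygenic_risk_score(weights: Dict[str, float], dosages: Dict[str, int]) -> float:
    """Additive PRS: sum of effect size x effect-allele dosage.

    weights: variant ID -> per-allele effect size (e.g. log odds ratio).
    dosages: variant ID -> number of effect alleles carried (0, 1 or 2).
    Variants carried by the individual but missing from `weights` add nothing.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


# Hypothetical model built from common variants only.
common_model = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}
# Hypothetical rare variant with a large effect, e.g. taken from a curated database.
rare_variants = {"rsRare": 1.50}

# Hypothetical individual who carries one copy of the rare variant.
person = {"rsA": 1, "rsB": 2, "rsC": 0, "rsRare": 1}

common_only = polygenic_risk_score(common_model, person)
augmented = polygenic_risk_score({**common_model, **rare_variants}, person)
print(f"common-only PRS: {common_only:.2f}, augmented PRS: {augmented:.2f}")
```

Merging the two weight dictionaries is the simplest way to "add variants on top" of a common-variant model; how such rare-variant weights should be estimated is a separate question the sketch does not address.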
This thesis presents (i) VarGen, an R package that merges information
from different genetic databases and has the potential to infer new
variant-disease relationships; (ii) a new method to improve Polygenic Risk Score models
by including variants obtained from VarGen on top of the common variants
from standard polygenic analyses; and (iii) the results of a microRNA differential
expression analysis aimed at identifying the impact of microRNAs on the
development of severe Hypoxic-Ischemic Encephalopathy in newborns.
PhD in Environment and Agrifoo
MapOptics: A light-weight, cross-platform visualisation tool for optical mapping alignment
Bionano optical mapping is a technology that can assist in the final stages of genome assembly by lengthening and ordering the scaffolds of a draft assembly through alignment to a genomic map. However, current visualisation tools are either limited to the Windows operating system or were originally developed for visualising large-scale structural variation. MapOptics is a lightweight, cross-platform tool that enables the user to visualise and interact with alignments of Bionano optical mapping data and can be used for in-depth exploration of hybrid scaffolding alignments. It provides a fast, simple alternative to the large optical mapping analysis programs currently available in this area of research.
Availability and implementation:
MapOptics is implemented in Java 1.8 and released under an MIT licence. MapOptics can be downloaded from https://github.com/FadyMohareb/mapoptics and run on any standard desktop computer equipped with a Java Virtual Machine (JVM).
Supplementary data are available at Bioinformatics online.
VarGen: An R package for disease-associated variant discovery and annotation
Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources, and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here we present “VarGen”, an easy-to-use, customisable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. OMIM, FANTOM5, GTEx and the GWAS catalog). The package can also annotate these variants to identify the most impactful ones. We expect that this tool will benefit research into variant-disease relationships.
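The core idea of gathering variant-disease evidence from several independent sources can be sketched as a merge keyed on variant and gene, followed by a ranking on how many sources support each pair. The sketch below is a hypothetical illustration in Python: the source labels, variant IDs, gene names and the "number of supporting sources" ranking rule are all assumptions, and none of it reflects VarGen's actual R functions or scoring.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical per-source results: source label -> list of (variant ID, gene) pairs.
source_hits: Dict[str, List[Tuple[str, str]]] = {
    "gwas_catalog": [("rs1", "GENE1"), ("rs2", "GENE2")],
    "gtex": [("rs1", "GENE1"), ("rs3", "GENE3")],
    "fantom5": [("rs1", "GENE1"), ("rs2", "GENE2")],
}


def merge_sources(hits: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, List[str]]]:
    """Merge (variant, gene) pairs across sources, ranked by number of supporting sources."""
    merged: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source, pairs in hits.items():
        for variant, gene in pairs:
            merged[(variant, gene)].add(source)
    # Most-supported pairs first; ties broken by variant ID for a stable order.
    ranked = sorted(merged.items(), key=lambda item: (-len(item[1]), item[0][0]))
    return [(variant, gene, sorted(sources)) for (variant, gene), sources in ranked]


for variant, gene, sources in merge_sources(source_hits):
    print(variant, gene, len(sources), ", ".join(sources))
```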
De novo genome assembly of Solanum sitiens reveals structural variation associated with drought and salinity tolerance
Motivation: Solanum sitiens is a self-incompatible wild relative of tomato, characterised by salt and drought resistance traits, with the potential to contribute, through breeding programmes, to the improvement of cultivated tomato. This species has a distinct morphology, classification and ecotype compared to other stress-resistant wild tomato relatives such as S. pennellii and S. chilense. Therefore, the availability of a reference genome for S. sitiens will facilitate the genetic and molecular understanding of salt and drought resistance.
Results: A high-quality de novo genome and transcriptome assembly for S. sitiens (Accession LA1974) has been developed. A hybrid assembly strategy was followed using Illumina short reads (~159X coverage) and PacBio long reads (~44X coverage), generating a total of ~262 Gbp of DNA sequence. A reference genome of 1,245 Mbp, arranged in 1,483 scaffolds with an N50 of 1.826 Mbp, was generated. Genome completeness was estimated at 95% using Benchmarking Universal Single-Copy Orthologs (BUSCO) and the K-mer Analysis Toolkit (KAT). In addition, ~63 Gbp of RNA-Seq data were generated to support the prediction of 31,164 genes from the assembly and to perform a de novo transcriptome assembly. Lastly, we identified three large inversions relative to S. lycopersicum that contain several drought-resistance-related genes, such as beta-amylase 1 and YUCCA7.
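The N50 quoted above is a standard contiguity statistic: sort the scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch of that definition follows; the scaffold lengths used are hypothetical and not taken from the S. sitiens assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("empty assembly")
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Hypothetical scaffold lengths in bp: total 300, so the threshold is 150.
print(n50([100, 80, 60, 40, 20]))
```

Here 100 alone falls short of 150, while 100 + 80 = 180 reaches it, so the N50 is 80.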
Availability: S. sitiens (LA1974) raw sequencing data, transcriptome and genome assembly have been deposited in NCBI’s Sequence Read Archive under the BioProject number “PRJNA633104”.
De novo genome assembly and functional annotation for Fusarium langsethiae
Background
Fusarium langsethiae is a T-2 and HT-2 mycotoxin-producing species first characterised in 2004. It is commonly isolated from oats in Northern Europe. T-2 and HT-2 mycotoxins have immunological and haematological effects on animal health, mainly through inhibition of protein, RNA and DNA synthesis. A high-quality, comprehensively annotated assembly for this species is therefore essential for understanding the molecular mechanism of T-2 and HT-2 biosynthesis in F. langsethiae and for developing effective control strategies.
Results
The F. langsethiae assembly was produced from PacBio long reads, which were assembled independently using Canu, SMARTdenovo and Flye. A total of 19,336 coding genes were identified using RNA-Seq-informed ab initio gene prediction. Finally, predicted genes were annotated using the Basic Local Alignment Search Tool (BLAST) against the NCBI non-redundant (NR) genome database, and protein hits were annotated using InterProScan. Genes with BLAST hits were functionally annotated with Gene Ontology (GO) terms.
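A common step between running BLAST and transferring functional annotation is to keep a single best hit per predicted gene. The sketch below assumes BLAST's default 12-column tabular output (`-outfmt 6`: query, subject, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps the highest-bitscore subject per query. The e-value cut-off of 1e-5 and the example rows are hypothetical choices, not parameters reported for this assembly.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, str]:
    """Return query -> subject with the highest bitscore from BLAST -outfmt 6 text.

    Assumes the default 12 tab-separated columns, where column 11 (index 10)
    is the e-value and column 12 (index 11) is the bitscore.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hypothetical significance cut-off
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical tabular output for two predicted genes.
example = io.StringIO(
    "gene1\tXP_0001\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "gene1\tXP_0002\t80.0\t290\t58\t2\t5\t295\t3\t290\t1e-90\t320\n"
    "gene2\tXP_0003\t45.0\t120\t66\t3\t10\t130\t8\t125\t0.5\t40\n"
)
print(best_hits(example))
```

In this example `gene2` has no hit passing the cut-off, so it is left out; such genes would simply receive no BLAST-derived annotation.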
Conclusions
We developed a high-quality genome assembly with a total length of 59 Mb and an N50 of 3.51 Mb. The raw sequence reads and assembled genome are publicly available and can be downloaded from GenBank under the accession JAFFKB000000000.
CRAMER: A lightweight, highly customisable web-based genome browser supporting multiple visualisation instances
In recent years, the ability to generate genomic data has increased dramatically, along with the demand for easily personalised and customisable genome browsers for effective visualisation of diverse types of data. Despite the large number of web-based genome browsers available today, none of the existing tools provides a means of creating multiple visualisation instances without manual set-up on the deployment server side. The Cranfield Genome Browser (CRAMER) is an open-source, lightweight and highly customisable web application for interactive visualisation of genomic data. Once deployed, CRAMER supports seamless creation of multiple visualisation instances in parallel while allowing users to control and customise multiple tracks. The application is deployed on a Node.js server and is supported by a MongoDB database, which stores all customisations made by users, allowing quick navigation between instances. Currently, the browser supports visualising a large number of file formats for genome annotation, variant calling, read coverage and gene expression. Additionally, the browser supports direct JavaScript coding for personalised tracks, providing a new level of customisation, both functionally and visually. Tracks can be added via direct file upload or processed in real time via links to files stored remotely on an FTP repository. Furthermore, users can add tracks to an existing visualisation instance by simple drag and drop.
A chromosome-level genome assembly of Solanum chilense, a tomato wild relative associated with resistance to salinity and drought
Introduction:
Solanum chilense is a wild relative of tomato reported to exhibit resistance to biotic and abiotic stresses. There is potential to improve tomato cultivars via breeding with wild relatives, a process greatly accelerated by suitable genomic and genetic resources.
Methods:
In this study, we generated a high-quality, chromosome-level de novo assembly for the S. chilense accession LA1972 using a hybrid assembly strategy with ~180 Gbp of Illumina short reads and ~50 Gbp of PacBio long reads. Further scaffolding was performed using Bionano optical maps and 10x Chromium reads.
Results:
The resulting sequences were arranged into 12 pseudomolecules using Hi-C sequencing. This resulted in a 901 Mbp assembly with a completeness of 95%, as determined by Benchmarking Universal Single-Copy Orthologs (BUSCO). RNA sequencing of multiple tissues, yielding ~219 Gbp of reads, was used to annotate the genome assembly with RNA-Seq-guided gene prediction and to produce a de novo transcriptome assembly. This chromosome-level, high-quality reference genome for S. chilense accession LA1972 will support future breeding efforts for more sustainable tomato production.
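BUSCO reports completeness as the percentage of searched single-copy orthologs that are found complete, counting both single-copy (S) and duplicated (D) complete matches: C = (S + D) / n × 100. The sketch below computes that figure; the counts are hypothetical values chosen to land near the 95% reported above and are not the actual BUSCO results for LA1972.

```python
def busco_completeness(single: int, duplicated: int, total: int) -> float:
    """Percentage of complete BUSCOs: C = (S + D) / n * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if single < 0 or duplicated < 0 or single + duplicated > total:
        raise ValueError("counts must be non-negative and not exceed total")
    return 100.0 * (single + duplicated) / total


# Hypothetical counts: 1,480 single-copy + 53 duplicated out of 1,614 searched BUSCOs.
print(f"C = {busco_completeness(1480, 53, 1614):.1f}%")
```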
Discussion:
Gene sequences related to drought and salt resistance were compared between S. chilense and S. lycopersicum to identify amino acid variations with high potential for functional impact. These variants were subsequently analysed in 84 resequenced tomato lines across 12 different related species to explore the variant distributions. We identified a set of seven putatively impactful amino acid variants, some of which may also affect fruit development, for example in the ethylene-responsive transcription factor WIN1 and ethylene-insensitive protein 2. These variants could be tested for their ability to confer functional phenotypes on cultivars that have lost them.
This work was jointly supported by the UK’s Biotechnology and Biological Sciences Research Council and the Indian Department of Biotechnology (BB/L011611/1)
Fact-based nutrition for infants and lactating mothers – The NUTRISHIELD study
Background: Human milk (HM) is the ideal source of nutrients for infants. Its composition varies considerably according to the infant’s needs. When not enough of the mother’s own milk (OMM) is available, the administration of pasteurized donor human milk (DHM) is considered a suitable alternative for preterm infants. This study protocol describes the NUTRISHIELD clinical study, which aims to evaluate the influence of diet, lifestyle habits, psychological stress, and pasteurization on milk composition, and how this composition modulates the infant’s growth, health, and development.
Methods and design: NUTRISHIELD is a prospective mother-infant birth cohort in the Spanish-Mediterranean area comprising three groups of infants, together with their mothers: (i) preterm infants (<32 weeks of gestation) exclusively receiving OMM, (ii) preterm infants (<32 weeks of gestation) exclusively receiving DHM, and (iii) term infants exclusively receiving OMM. Biological samples and nutritional, clinical, and anthropometric data are collected at six time points from birth until six months of the infant’s age. The genotype, metabolome, and microbiota, as well as the HM composition (i.e., macronutrients, fatty acids, vitamins, human milk oligosaccharides, and steroids), are characterized. Portable sensor prototypes for the analysis of HM and urine are benchmarked. Additionally, maternal psychosocial status, including social support, family functioning, perceived stress, anxiety and depression symptoms, and traumatic life events, is measured at the beginning of the study and at month six. Mother-infant postpartum bonding and parental stress are also examined. At six months, infant neurodevelopment scales are applied. Mothers’ concerns about and attitudes towards breastfeeding are also recorded through a specific questionnaire.
Discussion: NUTRISHIELD provides an in-depth longitudinal study of the mother-infant-microbiota triad, combining multiple biological matrices, newly developed analytical methods, and purpose-designed sensor prototypes with a wide range of clinical outcome measures. Data obtained from this study will be used to train a machine-learning algorithm that provides dietary advice to lactating mothers; the algorithm will be implemented in a user-friendly platform based on a combination of user-provided information and biomarker analysis. A better understanding of the factors affecting milk composition, together with their health implications for infants, plays an important role in developing improved strategies of nutraceutical management in infant care.