116 research outputs found

    Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes

    Background: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters. Results: Unlike two single fragment reads, in paired-end sequence reads, such as BAC-end sequences, the two sequences in the pair have a known positional relationship in the original genome. This provides an additional level of confidence, over match scores and e-values, in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of different search parameters, with a particular focus on contiguous and discontiguous seeds, was used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine the appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC-end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed the best in optimising the sensitivity and specificity with which sheep BAC-end reads map to the equine and bovine genomes. Conclusions: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard to evaluate performance.
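The pair-based check described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the `summarise_pairs` function and the `max_span` threshold are hypothetical names, and the tail-to-tail configuration is simplified here to "same chromosome, opposite strands, within `max_span` bases".

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Hit(NamedTuple):
    """Best alignment of one end read (hypothetical record layout)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


Pair = Tuple[Optional[Hit], Optional[Hit]]


def summarise_pairs(pairs: Sequence[Pair], max_span: int) -> Tuple[int, int]:
    """Return (reads with a hit, pairs in the expected configuration).

    Assumption: a pair counts as correctly placed when both ends hit the
    same chromosome on opposite strands no more than max_span bases apart.
    """
    reads_with_hit = 0
    concordant_pairs = 0
    for first, second in pairs:
        # Count each end read that produced any hit at all.
        reads_with_hit += (first is not None) + (second is not None)
        if first is None or second is None:
            continue
        same_chrom = first.chrom == second.chrom
        opposite_strands = first.strand != second.strand
        close_enough = abs(first.pos - second.pos) <= max_span
        if same_chrom and opposite_strands and close_enough:
            concordant_pairs += 1
    return reads_with_hit, concordant_pairs
```

Running this once per program and parameter set gives the two quantities the abstract plots against each other. Pairs whose two ends both hit but fail the configuration check are candidates for false positives, which is what makes the paired ends usable as an objective standard.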

    Using Regulatory and Epistatic Networks to Extend the Findings of a Genome Scan: Identifying the Gene Drivers of Pigmentation in Merino Sheep

    Extending genome-wide association analysis by the inclusion of gene expression data may assist in the dissection of complex traits. We examined piebald, a pigmentation phenotype in both human and Merino sheep, by analysing multiple data types using a systems approach. First, a case-control analysis of 49,034 ovine SNP was performed, which confirmed a multigenic basis for the condition. We combined these results with gene expression data from five tissue types analysed with a skin-specific microarray. Promoter sequence analysis of differentially expressed genes allowed us to reverse-engineer a regulatory network. Likewise, by testing two-locus models derived from all pair-wise comparisons across piebald-associated SNP, we generated an epistatic network. At the intersection of both networks, we identified thirteen genes, with insulin-like growth factor binding protein 7 (IGFBP7), platelet-derived growth factor receptor alpha (PDGFRA) and the tetraspanin platelet activator CD9 at the kernel of the intersection. Further, we report a number of differentially expressed genes in regions containing highly associated SNP, including ATRN, DOCK7, FGFR1OP, GLI3, SILV and TBX15. The application of network theory facilitated co-analysis of genetic variation with gene expression, recapitulated aspects of the known molecular biology of skin pigmentation and provided insights into the transcription regulation and epistatic interactions involved in piebald Merino sheep.

    Characterisation and application of a bovine U6 promoter for expression of short hairpin RNAs

    Background: The use of small interfering RNA (siRNA) molecules in animals to achieve double-stranded RNA-mediated interference (RNAi) has recently emerged as a powerful method of sequence-specific gene knockdown. As DNA-based expression of short hairpin RNA (shRNA) for RNAi may offer some advantages over chemical and in vitro synthesised siRNA, a number of vectors for expression of shRNA have been developed. These often feature polymerase III (pol. III) promoters of either mouse or human origin. Results: To develop a shRNA expression vector specifically for bovine RNAi applications, we identified and characterised a novel bovine U6 small nuclear RNA (snRNA) promoter from bovine sequence data. This promoter is the putative bovine homologue of the human U6-8 snRNA promoter, and features a number of functional sequence elements that are characteristic of these types of pol. III promoters. A PCR-based cloning strategy was used to incorporate this promoter sequence into plasmid vectors along with shRNA sequences for RNAi. The promoter was then used to express shRNAs, which resulted in the efficient knockdown of an exogenous reporter gene and an endogenous bovine gene. Conclusion: We have mined data from the bovine genome sequencing project to identify a functional bovine U6 promoter and used the promoter sequence to construct a shRNA expression vector. The use of this native bovine promoter in shRNA expression is an important component of our future development of RNAi therapeutic and transgenic applications in bovine species.

    Analysis of the complement and molecular evolution of tRNA genes in cow

    Background: Detailed information regarding the number and organization of transfer RNA (tRNA) genes at the genome level is becoming readily available with the increase of DNA sequencing of whole genomes. However, the identification of functional tRNA genes is challenging for species that have large numbers of repetitive elements containing tRNA-derived sequences, such as Bos taurus. Reliable identification and annotation of entire sets of tRNA genes allows the evolution of tRNA genes to be understood on a genomic scale. Results: In this study, we explored the B. taurus genome using bioinformatics and comparative genomics approaches to catalogue and analyze cow tRNA genes. The initial analysis of the cow genome using tRNAscan-SE identified 31,868 putative tRNA genes and 189,183 pseudogenes, where 28,830 of the 31,868 predicted tRNA genes were classified as repetitive elements by the RepeatMasker program. We then used comparative genomics to further discriminate between functional tRNA genes and tRNA-derived sequences for the remaining set of 3,038 putative tRNA genes. For our analysis, we used the human, chimpanzee, mouse, rat, horse, dog, chicken and fugu genomes to predict that the number of active tRNA genes in cow lies in the vicinity of 439. Of this set, 150 tRNA genes were 100% identical in their sequences across all nine vertebrate genomes studied. Using clustering analyses, we identified a new tRNA-Gly(CCC) subfamily present in all analyzed mammalian genomes. We suggest that this subfamily originated from an ancestral tRNA-Gly(GCC) gene via a point mutation prior to the radiation of the mammalian lineages. Lastly, in a separate analysis we created phylogenetic profiles for each putative cow tRNA gene using a representative set of genomes to gain an overview of common evolutionary histories of tRNA genes. Conclusion: The use of a combination of bioinformatics and comparative genomics approaches has allowed the confident identification of a set of cow tRNA genes that will facilitate further studies in understanding the molecular evolution of cow tRNA genes.

    Changed Patterns of Genomic Variation Following Recent Domestication: Selection Sweeps in Farmed Atlantic Salmon

    The introduction of wild Atlantic salmon into captivity, and their subsequent artificial selection for production traits, has caused phenotypic differences between domesticated fish and their wild counterparts. Identification of regions of the genome underlying these changes offers the promise of characterizing the early biological consequences of domestication. In the current study, we sequenced a population of farmed European Atlantic salmon and compared the observed patterns of SNP variation to those found in conspecific wild populations. This identified 139 genomic regions that contained significantly elevated SNP homozygosity in farmed fish when compared to their wild counterparts. The most extreme was adjacent to versican, a gene involved in control of neural crest cell migration. To control for false positive signals, a second and independent dataset of farmed and wild European Atlantic salmon was assessed using the same methodology. A total of 81 outlier regions detected in the first dataset showed significantly reduced homozygosity within the second one, strongly suggesting the genomic regions identified are enriched for true selection sweeps. Examination of the associated genes identified a number previously characterized as targets of selection in other domestic species and that have roles in development, behavior and the olfactory system. These include arcvf, sema6, errb4, id2-like, and 6n1-like genes. Finally, we searched for evidence of parallel sweeps using a farmed population of North American origin. This failed to detect a convincing overlap with the putative sweeps present in European populations, suggesting the factors that drive patterns of variation under domestication and early artificial selection were largely independent. This is the first analysis of domestication of an aquaculture species exploiting whole-genome sequence data, and it resulted in the identification of sweeps common to multiple independent populations of farmed European Atlantic salmon.

    Net effects of life-history traits explain persistent differences in abundance among similar species

    JSM and MM were supported by the National Science Foundation (NSF) 1948946. MD is supported by the Warman Foundation, the Leverhulme Centre for Anthropocene Biodiversity (RC-2018-021) and NSF-NERC grant NE/V009338/1. MM is supported by a Leverhulme Trust Early Career Fellowship (ECF-2021-512). Life-history traits are promising tools to predict species commonness and rarity because they influence a population's fitness in a given environment. Yet, species with similar traits can have vastly different abundances, challenging the prospect of robust trait-based predictions. Using long-term demographic monitoring, we show that coral populations with similar morphological and life-history traits show persistent (decade-long) differences in abundance. Morphological groups predicted species positions along two well-known life-history axes (the fast-slow continuum and size-specific fecundity). However, integral projection models revealed that density-independent population growth (λ) was more variable within morphological groups, and was consistently higher in dominant species relative to rare species. Within-group λ differences projected large abundance differences among similar species in short timeframes, and were generated by small but compounding variation in growth, survival, and reproduction. Our study shows that easily measured morphological traits predict demographic strategies, yet small life-history differences can accumulate into large differences in λ and abundance among similar species. Quantifying the net effects of multiple traits on population dynamics is therefore essential to anticipate species commonness and rarity. Publisher PDF. Peer reviewed.

    Construction and validation of a Bovine Innate Immune Microarray

    BACKGROUND: Microarray transcript profiling has the potential to illuminate the molecular processes that are involved in the responses of cattle to disease challenges. This knowledge may allow the development of strategies that exploit these genes to enhance resistance to disease in an individual or animal population. RESULTS: The Bovine Innate Immune Microarray developed in this study consists of 1480 characterised genes identified by literature searches, 31 positive and negative control elements and 5376 cDNAs derived from subtracted and normalised libraries. The cDNA libraries were produced from 'challenged' bovine epithelial and leukocyte cells. The microarray was found to have a limit of detection of 1 pg/μg of total RNA and a mean slide-to-slide correlation coefficient of 0.88. The profiles of differentially expressed genes from Concanavalin A (ConA) stimulated bovine peripheral blood lymphocytes were determined. Three distinct profiles were highlighted: 19 genes that were rapidly up-regulated within 30 minutes and returned to basal levels by 24 h; 76 genes that were up-regulated between 2 and 8 hours and sustained high levels of expression until 24 h; and 10 genes that were down-regulated. Quantitative real-time RT-PCR on selected genes was used to confirm the results from the microarray analysis. The results indicate that there is a dynamic process involving gene activation and regulatory mechanisms re-establishing homeostasis in the ConA-activated lymphocytes. The Bovine Innate Immune Microarray was also used to determine the cross-species hybridisation capabilities of an ovine PBL sample. CONCLUSION: The Bovine Innate Immune Microarray that has been developed contains a set of well-characterised genes and anonymous cDNAs from a number of different bovine cell types. The microarray can be used to determine the gene expression profiles underlying innate immune responses in cattle and sheep.

    Optimising use of 4D-CT phase information for radiomics analysis in lung cancer patients treated with stereotactic body radiotherapy

    From IOP Publishing via Jisc Publications Router. History: received 2021-03-17, oa-requested 2021-04-07, accepted 2021-04-21, epub 2021-05-24, open-access 2021-05-24, ppub 2021-06-07. Publication status: Published. Funder: Cancer Research UK; doi: https://doi.org/10.13039/501100000289; Grant(s): C147/A25254. Abstract: Purpose. 4D-CT is routine imaging for lung cancer patients treated with stereotactic body radiotherapy. No studies have investigated optimal 4D phase selection for radiomics. We aim to determine how phase data should be used to identify prognostic biomarkers for distant failure, and test whether stability assessment is required. A phase selection approach will be developed to aid studies with different 4D protocols and account for patient differences. Methods. 186 features were extracted from the tumour and peritumour on all phases for 258 patients. Feature values were selected from phase features using four methods: (A) mean across phases, (B) median across phases, (C) 50% phase, and (D) the most stable phase (closest in value to two neighbours), coined personalised selection. Four levels of stability assessment were also analysed, with inclusion of: (1) all features, (2) stable features across all phases, (3) stable features across phase and neighbour phases, and (4) features averaged over neighbour phases. Clinical-radiomics models were built for twelve combinations of feature type and assessment method. Model performance was assessed by concordance index (c-index) and fraction of new information from radiomic features. Results. The most stable phase spanned the whole range but was most often near exhale. All radiomic signatures provided new information for distant failure prediction. The personalised model had the highest c-index (0.77), and 58% of new information was provided by radiomic features when no stability assessment was performed. Conclusion. The most stable phase varies per patient, and selecting this phase improves model performance compared to standard methods. We advise that the single most stable phase should be determined by minimising feature differences to neighbour phases. Stability assessment over all phases decreases performance by excessively removing features. Instead, averaging of neighbour phases should be used when stability is of concern. The models suggest that higher peritumoural intensity predicts distant failure.
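The personalised selection rule in the abstract above (choose the phase whose features are closest in value to its two neighbours) can be illustrated with a short sketch. It is an assumption-laden illustration rather than the authors' implementation: the `most_stable_phase` name is hypothetical, features are assumed to be pre-scaled so that absolute differences are comparable, and the breathing cycle is assumed to wrap around so that the first and last phases are neighbours.

```python
from typing import Sequence


def most_stable_phase(phase_features: Sequence[Sequence[float]]) -> int:
    """Return the index of the phase closest to its two neighbours.

    phase_features[p][f] is the (assumed pre-scaled) value of feature f
    on breathing phase p. Neighbours are taken cyclically, which is an
    assumption of this sketch.
    """
    n_phases = len(phase_features)
    if n_phases < 3:
        raise ValueError("need at least three phases to have two neighbours")

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        # Sum of absolute feature differences between two phases.
        return sum(abs(x - y) for x, y in zip(a, b))

    best_phase = 0
    best_score = float("inf")
    for p in range(n_phases):
        current = phase_features[p]
        previous = phase_features[(p - 1) % n_phases]
        following = phase_features[(p + 1) % n_phases]
        score = distance(current, previous) + distance(current, following)
        if score < best_score:
            best_phase, best_score = p, score
    return best_phase
```

Feature values would then be read from the returned phase for each patient. Methods (A) to (C) in the abstract correspond instead to taking the mean across phases, the median across phases, or a fixed 50% phase.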