272 research outputs found

    Variant calling: Considerations, practices, and developments

    The success of many clinical, association, or population genetics studies critically relies on a properly performed variant calling step. The variety of modern genomics protocols, techniques, and platforms makes the choice of methods and algorithms difficult, and there is no "one size fits all" solution for study design and data analysis. In this review, we discuss considerations that need to be taken into account when designing the study and preparing for the experiments. We outline the variety of variant types that can be detected using sequencing approaches and highlight some specific requirements and basic principles of their detection. Finally, we cover interesting developments that enable variant calling for a broad range of applications in the genomics field. We conclude by discussing technological and algorithmic advances that have the potential to change how DNA variants are called in the near future.

    CONREAL web server: identification and visualization of conserved transcription factor binding sites

    The use of orthologous sequences and phylogenetic footprinting approaches has become popular for the recognition of conserved and potentially functional sequences. Several algorithms have been developed for the identification of conserved transcription factor binding sites (TFBSs), which are characterized by their relatively short and degenerate recognition sequences. The CONREAL (conserved regulatory elements anchored alignment) web server provides a versatile interface to CONREAL-, LAGAN-, BLASTZ- and AVID-based predictions of conserved TFBSs in orthologous promoters. Comparative analysis using different algorithms can be started by keyword without any prior sequence retrieval. The interface is available at

    CASCAD: a database of annotated candidate single nucleotide polymorphisms associated with expressed sequences

    BACKGROUND: With the recent progress made in large-scale genome sequencing projects, a vast amount of novel data is becoming available. A comparative sequence analysis, exploiting sequence information from various resources, can be used to uncover hidden information, such as genetic variation. Although an enormous number of SNPs for a wide variety of organisms have been submitted to NCBI dbSNP and annotated in most genome assembly viewers, such as Ensembl and the UCSC Genome Browser, these platforms do not easily allow for extensive annotation and incorporation of experimental data supporting the polymorphism. However, such information is very important for selecting the most promising and useful candidate polymorphisms for use in experimental setups. DESCRIPTION: The CASCAD database is designed for presentation and query of candidate SNPs that are retrieved by in silico mining of high-throughput sequencing data. Currently, the database provides collections of laboratory rat (Rattus norvegicus) and zebrafish (Danio rerio) candidate SNPs. The database stores detailed information about raw data supporting the candidate, extensive annotation and links to external databases (e.g. GenBank, Ensembl, UniGene, and LocusLink), verification information, and predictions of a potential effect for non-synonymous polymorphisms in coding regions. The CASCAD website allows searches based on an arbitrary combination of 27 different parameters related to characteristics such as candidate SNP quality, genomic localization, and sequence data source or strain. In addition, the database can be queried with any custom nucleotide sequence of interest. The interface is cross-linked to other public databases and tightly coupled with primer design and local genome assembly interfaces in order to facilitate experimental verification of candidates. CONCLUSIONS: The CASCAD database discloses detailed information on rat and zebrafish candidate SNPs, including the raw data underlying their discovery. An advanced web-based search interface allows universal access to the database content and supports a variety of queries for many types of research utilizing single nucleotide polymorphisms.

    Multi-platform discovery of haplotype-resolved structural variation in human genomes

    The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared to previous reports, including from the 1000 Genomes Project. We also discover 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now be used as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.

    The 'un-shrunk' partial correlation in Gaussian graphical models

    BACKGROUND: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) interconnected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (the 'high-dimensional problem'). Shrinkage methods address this issue by learning a regularized GGM. However, how the shrinkage affects the final result and its interpretation remains an open question. RESULTS: We show that the shrinkage biases the partial correlation in a non-linear way. This bias not only changes the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as 'un-shrinking' the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus. CONCLUSIONS: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the 'high-dimensional problem'. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring this type of effect caused by the shrinkage can obscure the interpretation of the network and impede the validation of earlier reported results.
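The non-linear bias described in this abstract can be illustrated with a small numerical sketch. It is not the authors' 'un-shrinking' method. It only assumes the common form of shrinkage toward the identity matrix, (1 - lam) * R + lam * I, and the standard relation between partial correlations and the precision matrix. The correlation values, the shrinkage intensities, and all function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def shrink(corr, lam):
    """Shrink a correlation matrix toward the identity (assumed shrinkage target)."""
    n = len(corr)
    return [[1.0 if i == j else (1.0 - lam) * corr[i][j] for j in range(n)]
            for i in range(n)]


def partial_correlations(corr):
    """Partial correlations from the precision matrix W: -w_ij / sqrt(w_ii * w_jj)."""
    w = invert(corr)
    n = len(w)
    return [[1.0 if i == j else -w[i][j] / (w[i][i] * w[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical correlation matrix for three variables (illustrative numbers only).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.7],
     [0.5, 0.7, 1.0]]

if __name__ == "__main__":
    for lam in (0.0, 0.25, 0.5):
        pc = partial_correlations(shrink(R, lam))
        # Columns: shrinkage intensity, then partial correlations for edges 1-2, 1-3, 2-3.
        print(lam, round(pc[0][1], 4), round(pc[0][2], 4), round(pc[1][2], 4))
```

If shrinkage acted as a plain rescaling, every edge would change by the same factor as lam grows. In this toy matrix the edges change by different factors, which is the kind of effect the abstract attributes to shrinkage.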

    Cytochrome P450scc spin state transitions in the thin solid films

    Langmuir-Blodgett films of cytochrome P450scc were prepared on solid supports and their spectral properties were investigated. When immobilized, the hemoprotein changes its spin state from initially high spin to low spin. This transition is reversible: after solubilization of the hemoprotein, the spin state equilibrium shifts back towards the high-spin state. Anaerobic reduction of film-incorporated cytochrome P450scc by the electron transfer chain (NADPH → adrenodoxin reductase → adrenodoxin) revealed a low reaction rate, which agrees well with the content of the low-spin form of the hemoprotein. We suggest that the particularly regular orientation of solid cytochrome P450scc is of crucial importance for this phenomenon.