
    Quantitative evaluation of chromosomal rearrangements in gene-edited human stem cells by CAST-Seq

    Genome editing has shown great promise for clinical translation but has also revealed the risk of genotoxicity caused by off-target effects of programmable nucleases. Here we describe chromosomal aberrations analysis by single targeted linker-mediated PCR sequencing (CAST-Seq), a preclinical assay to identify and quantify chromosomal aberrations derived from on-target and off-target activities of CRISPR-Cas nucleases or transcription activator-like effector nucleases (TALENs) in human hematopoietic stem cells (HSCs). Depending on the designer nuclease employed, CAST-Seq detected translocations in 0%–0.5% of gene-edited human CD34+ HSCs, and up to 20% of on-target loci harbored gross rearrangements. Moreover, CAST-Seq detected distinct types of chromosomal aberrations, such as homology-mediated translocations, which arise from homologous recombination rather than off-target activity. CAST-Seq is a sensitive assay able to identify and quantify unintended chromosomal rearrangements in addition to the more typical mutations at off-target sites. CAST-Seq analyses may be particularly relevant for therapeutic genome editing, enabling thorough risk assessment before clinical application of gene-edited products.

    Multispecies Genomic Sex Identification Using DDX3 Gene Polymorphisms

    PCR sex determination assays must be reliable and cost-effective because they are used frequently and integrally in biological research and the animal production industry. The design of a primer pair with a built-in control is therefore warranted, not only to avoid the extra cost of a multiplex reaction but also to prevent the anomalous results that have been documented with other primer pairs. The objective of this study was to design primer pairs with a built-in PCR amplification control to identify sex in Equus caballus (domestic horse), Homo sapiens (human), Macaca mulatta (rhesus macaque), and Sus scrofa (domestic pig) DNA samples. The DDX3X gene was first aligned with its Y chromosome homolog, and primer pairs were then created to flank Y chromosome specific indels within each species. PCR followed by gel electrophoresis confirmed the hypothesis that a single species-specific primer pair can accurately identify separate X and Y chromosome specific sequences in Equus caballus, Homo sapiens, Macaca mulatta, and Sus scrofa genomic samples. Additionally, in silico and qualitative gel analyses were completed to assess the efficacy of the Equus caballus specific primers in alternative species, which yielded no results. In summary, the results show that there are suitable indel regions for PCR amplification in Equus caballus, Homo sapiens, Macaca mulatta, and Sus scrofa. These indel regions can be amplified with the designed primers and used in an efficient, quick, and cost-effective way to identify sex without the need for a multiplex reaction or the risk of false identification.
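The band-reading logic implied by the abstract above can be sketched as follows. This is a minimal illustration, not the study's protocol: it assumes that a single primer pair amplifies both the X and the Y copy, that the Y-specific indel makes the two amplicons differ in length, and that the X amplicon doubles as the built-in amplification control. The function name `call_sex`, the amplicon sizes, and the tolerance are all hypothetical.

```python
def call_sex(band_sizes: list[int], x_size: int, y_size: int, tol: int = 10) -> str:
    """Interpret gel band sizes (in bp) from a single primer pair.

    Assumptions (hypothetical, not taken from the study):
    - the X amplicon is expected in every sample and acts as the control;
    - the Y amplicon differs in length because of a Y-specific indel;
    - a band matches an expected size if it lies within `tol` bp of it.
    """
    has_x = any(abs(band - x_size) <= tol for band in band_sizes)
    has_y = any(abs(band - y_size) <= tol for band in band_sizes)

    if has_x and has_y:
        return "male"          # both X and Y amplicons present
    if has_x:
        return "female"        # X amplicon only
    return "inconclusive"      # control band missing: treat the PCR as failed


# Example with made-up amplicon sizes (X = 300 bp, Y = 250 bp).
print(call_sex([302, 249], x_size=300, y_size=250))
print(call_sex([298], x_size=300, y_size=250))
print(call_sex([], x_size=300, y_size=250))
```

Treating a missing X band as "inconclusive" rather than guessing is what makes the control built in: a failed reaction is never mistaken for a female sample.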

    Computational analysis of human genomic variants and lncRNAs from sequence data

    High-throughput sequencing technologies have been developed and applied to human genome studies for nearly 20 years. These technologies have enabled numerous research applications and have significantly expanded our knowledge of the human genome. In this thesis, computational methods that use sequence data to study human genomic variants and transcripts were evaluated and developed. Indels, short for insertions and deletions, are two common types of genomic variants that are widespread in the human genome. Detecting indels in human genomes is a crucial step in diagnosing indel-related genomic disorders and may identify novel indel markers for studying certain diseases. Compared with previous techniques, high-throughput sequencing technologies, especially next-generation sequencing (NGS), make it possible to detect indels accurately and efficiently across wide ranges of the genome. In the first part of the thesis, tools with indel calling abilities are evaluated with an assortment of indels and different NGS settings. The results show that the choice of tools and NGS settings significantly affects indel detection, which provides suggestions for tool selection and future development. In bioinformatics analysis, an indel's position can be marked inconsistently on the reference genome, so that one indel may have different but equivalent representations, which causes trouble for downstream analysis. This problem is related to the complex sequence context of indels, for example short tandem repeats (STRs), where the same short stretch of nucleotides is amplified. In the second part of the thesis, a novel computational tool, VarSCAT, is described, which has various functions for annotating the sequence context of variants, including ambiguous positions, STRs, and other sequence context features. 
Analysis of several high-confidence human variant sets with VarSCAT reveals that a large number of genomic variants, especially indels, have sequence features associated with STRs. In the human genome, not all genes and their transcripts are translated into proteins. Long non-coding ribonucleic acid (lncRNA) is a typical example. Sequence recognition built on machine learning models has improved significantly in recent years. In the last part of the thesis, several machine learning-based lncRNA prediction tools were evaluated on their predictions of the coding potential of transcripts. The results suggest that tools based on deep learning identify lncRNAs best.
Finnish abstract (Ihmisen genomivarianttien ja lncRNA:iden laskennallinen analyysi sekvenssiaineistosta): High-throughput sequencing technologies have been developed and applied to human genome research for nearly 20 years. These technologies have enabled wide-ranging study of the human genome and significantly increased our knowledge of it. In this thesis, computational methods that use sequence data to study human genomic variants and transcripts were evaluated and developed. "Indel" is a collective term for insertion and deletion variants, which occur throughout the genome. Identifying indels is crucial for diagnosing genetic abnormalities and for finding new indel markers associated with various diseases. Compared with earlier technologies, high-throughput sequencing technologies, especially next-generation sequencing (NGS), make it possible to detect indels more accurately and efficiently across wider genomic regions. In the first part of the thesis, computational tools for indel calling were evaluated using a wide assortment of indels and different NGS settings. The results showed that the choice of tools and the NGS settings significantly affected indel detection, and they can therefore guide tool selection and development. In bioinformatics analysis, the position of the same indel can be marked at different places on the reference genome, which can cause problems for downstream analysis, such as the evaluation of indel calls. This problem is related to sequence context, because variants can lie in short tandem repeats (STRs), where the same short stretch of nucleotides is amplified. In the second part of the thesis, the computational tool VarSCAT was developed, which has various functions, including the examination of ambiguous positions, flanking regions, and STR regions. Analysis of high-confidence human variant sets with VarSCAT revealed that the features of many genetic variants, especially indels, are associated with STR regions. Not all human genes and their gene products, such as long non-coding RNAs (lncRNAs), are translated into protein. Machine learning methods and sequence recognition have improved considerably in recent years. In the last part of the thesis, the predictions of several machine learning-based lncRNA prediction tools were evaluated. The results suggest that tools based on deep learning identify lncRNAs best.
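The "different but equivalent representations" problem mentioned in the thesis abstract above can be illustrated with a short sketch. This is not VarSCAT's implementation. It is a minimal example, with hypothetical function names and 0-based coordinates, of how a deletion inside a tandem repeat can be shifted left or right without changing the resulting sequence.

```python
def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Return the leftmost 0-based start equivalent to deleting ref[pos:pos+length].

    Shifting the deleted window one base to the left leaves the resulting
    sequence unchanged exactly when the base entering the window on the left
    equals the base leaving it on the right.
    """
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


def equivalent_deletion_starts(ref: str, pos: int, length: int) -> list[int]:
    """List every start position that yields the same sequence after deletion."""
    start = left_align_deletion(ref, pos, length)
    starts = [start]
    # Walk right while the base leaving the window equals the base entering it.
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
        starts.append(start)
    return starts


def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position `pos`."""
    return ref[:pos] + ref[pos + length:]


# A 2-bp deletion inside a (CA)n repeat: every listed start gives the same result.
reference = "GGCACACAT"
starts = equivalent_deletion_starts(reference, pos=4, length=2)
print(starts)
print({apply_deletion(reference, s, 2) for s in starts})
```

Reporting the whole interval of equivalent positions, or always normalizing to its left end, is one common way to make indel calls from different tools comparable.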

    Integrative computational approaches to study protein-nucleic acid interactions

    Interactions between proteins and nucleic acid molecules are central to cellular regulation and homeostasis. To study them, I employ a wide range of computational analysis methods to integrate genomic data from many types of experiment. This thesis has three parts. In the first part, I explore the patterns of indels created by CRISPR-Cas9 genome editing. By thorough characterisation of the precision of editing at thousands of genomic target sites, we identify simple sequence rules that can help predict these outcomes. Furthermore, we examine the role of the structural chromatin context in fine-tuning Cas9-DNA interactions. In the second part, I explore methods to study protein-RNA interactions. I use comparative computational analyses to assess both the data quality of, and data analysis methods for, different crosslinking and immunoprecipitation (CLIP) technologies. I then develop new methods to analyse data generated by hybrid individual-nucleotide resolution CLIP (hiCLIP). By tailoring computational solutions to an understanding of experimental conditions, I improve the overall sensitivity of hiCLIP and ultimately feed back to drive ongoing experimental development. In the third part, I focus on the Staufen family of double-stranded RNA binding proteins and use hiCLIP data to define transcriptome-wide atlases of RNA duplexes bound by these proteins, both in a cell line and in rat brain tissue. Through integration with other data sets, both publicly available and newly generated, I derive insights into their function in RNA metabolism and into how these interactions change during the course of mammalian brain development, with putative roles in ribonucleoprotein complex formation. In summary, I present a range of tailored computational methods and analyses developed to understand interactions between proteins and nucleic acids, aiming to link these interactions to functional outcomes.