7,863 research outputs found

Spectral Analysis of Guanine and Cytosine Fluctuations of Mouse Genomic DNA

Author: Bernardi G.
Calladine C. R.
Clay O.
Daniell P. J.
DIRK HOLSTE
Gerton J. L.
Kong A.
Li W.
Li W.
Mansilla R.
Meyer A.
Press W.
Priestley M. B.
Waterston R. H.
WENTIAN LI
West B. J.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 03/11/2004

We study global fluctuations of the guanine and cytosine base content (GC%) in mouse genomic DNA using spectral analyses. Power spectra $S(f)$ of GC% fluctuations in all nineteen autosomal and two sex chromosomes are observed to have the universal functional form $S(f) \sim 1/f^{\alpha}$ ($\alpha \approx 1$) over several orders of magnitude in the frequency range $10^{-7} < f < 10^{-5}$ cycle/base, corresponding to long-ranging GC% correlations at distances between 100 kb and 10 Mb. $S(f)$ for higher frequencies ($f > 10^{-5}$ cycle/base) shows a flattened power-law function with $\alpha < 1$ across all twenty-one chromosomes. The substitution of about 38% interspersed repeats does not affect the functional form of $S(f)$, indicating that these are not predominantly responsible for the long-ranged multi-scale GC% fluctuations in mammalian genomes. Several biological implications of the large-scale GC% fluctuation are discussed, including neutral evolutionary history by DNA duplication, chromosomal bands, spatial distribution of transcription units (genes), replication timing, and recombination hot spots.

Comment: 15 pages (figures included), 2 figures
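The spectral analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the non-overlapping windowing, the plain FFT periodogram, and the helper names `gc_percent_series`, `power_spectrum` and `fit_alpha` are all assumptions made for illustration. It computes GC% along a sequence, estimates the power spectrum, and fits the exponent alpha of S(f) ~ 1/f^alpha by least squares on log-log axes.

```python
import numpy as np

def gc_percent_series(seq, window):
    """GC% in consecutive non-overlapping windows of `window` bases."""
    seq = seq.upper()
    n_windows = len(seq) // window
    codes = np.frombuffer(seq[: n_windows * window].encode("ascii"), dtype=np.uint8)
    is_gc = (codes == ord("G")) | (codes == ord("C"))
    return 100.0 * is_gc.reshape(n_windows, window).mean(axis=1)

def power_spectrum(series, window):
    """Periodogram of the mean-subtracted series; frequencies in cycles/base."""
    x = series - series.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freq = np.fft.rfftfreq(len(x), d=window)  # sample spacing is `window` bases
    return freq[1:], spec[1:]                 # drop the zero-frequency term

def fit_alpha(freq, spec, f_lo, f_hi):
    """Fit log10 S against log10 f on [f_lo, f_hi]; return alpha for S ~ 1/f^alpha."""
    mask = (freq >= f_lo) & (freq <= f_hi)
    slope, _ = np.polyfit(np.log10(freq[mask]), np.log10(spec[mask]), 1)
    return -slope
```

For an uncorrelated random sequence the fitted alpha should be close to 0 (a flat spectrum). The abstract reports alpha close to 1 for mouse chromosomes in the range 10^-7 < f < 10^-5 cycle/base, which would require chromosome-scale input and a window well below 100 kb.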

arXiv.org e-Print Archive

Crossref

Human Promoter Prediction Using DNA Numerical Representation

Author: Arniker Swarna Bai
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010

With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well biological properties are reflected in the numerical domain for detecting and identifying the characteristics of regions of interest within the DNA sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results and possible future developments are also included. Another focus of the research is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. Despite the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences, with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that the proposed system maintains a good balance between prediction accuracy and reliability.
To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87% and 99% and specificities of 92%, 94% and 99% for human, Drosophila and Arabidopsis sequences respectively, with reconfigurable capability compared to the existing system.
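As a small illustration of what a DNA numerical representation looks like, the sketch below maps a sequence to four binary indicator sequences, one per nucleotide (commonly called the Voss representation). The abstract does not say which representations DigiPromPred uses, so this choice is an assumption. The function names are hypothetical. The second helper computes sensitivity and specificity from confusion-matrix counts, which are the two metrics reported above.

```python
def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences, one per base."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative usage with made-up counts (not data from the dissertation).
indicators = voss_indicators("ACGTA")
sens, spec = sensitivity_specificity(tp=908, fn=92, tn=904, fp=96)
```

Each indicator sequence can be fed to standard signal-processing tools or stacked into a 4 x N input for a neural network classifier.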

Scholarship at UWindsor

Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma.

Author: Berman Benjamin P
Cai Yan
Chang Chen
Dinh Huy Q
Hao Jia-Jie
Jiang Yan-Yi
Jiang Ye
Koeffler H Phillip
Lin De-Chen
Lu Chen-Chen
Mayakonda Anand
Shi Zhi-Zhou
Wang Jin-Wu
Wang Ming-Rong
Wei Wen-Qiang
Xu Xin
Zhan Qi-Min
Zhang Yu
Publication venue: eScholarship, University of California
Publication date: 01/12/2016

Esophageal squamous cell carcinoma (ESCC) is among the most common malignancies, but little is known about its spatial intratumoral heterogeneity (ITH) and temporal clonal evolutionary processes. To address this, we performed multiregion whole-exome sequencing on 51 tumor regions from 13 ESCC cases and multiregion global methylation profiling for 3 of these 13 cases. We found an average of 35.8% heterogeneous somatic mutations with strong evidence of ITH. Half of the driver mutations located on the branches of tumor phylogenetic trees targeted oncogenes, including PIK3CA, NFE2L2 and MTOR, among others. By contrast, the majority of truncal and clonal driver mutations occurred in tumor-suppressor genes, including TP53, KMT2D and ZNF750, among others. Interestingly, phyloepigenetic trees robustly recapitulated the topological structures of the phylogenetic trees, indicating a possible relationship between genetic and epigenetic alterations. Our integrated investigations of spatial ITH and clonal evolution provide an important molecular foundation for enhanced understanding of tumorigenesis and progression in ESCC.

eScholarship - University of California

Deep Domain Adaptation Learning Framework for Associating Image Features to Tumour Gene Profile

Author: Xia Tian
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 30/03/2018

While medical imaging and general pathology are routine in cancer diagnosis, genetic sequencing is not always accessible due to the strong phenotypic and genetic heterogeneity of human cancers. Image-genomics integrates medical imaging and genetics to provide a complementary approach to optimise cancer diagnosis by associating tumour imaging traits with clinical data, and it has demonstrated its potential in identifying imaging surrogates for tumour biomarkers. However, existing image-genomics research has focused on quantifying tumour visual traits according to human understanding, which may not be optimal across different cancer types. The challenge hence lies in the extraction of optimised imaging representations in an objective, data-driven manner. Such an approach requires large volumes of annotated image data that are difficult to acquire. We propose a deep domain adaptation learning framework for associating image features with tumour genetic information, exploiting the ability of domain adaptation techniques to learn relevant image features from close knowledge domains. Our proposed framework leverages the current state of the art in image object recognition to provide image features that encode subtle variations of tumour phenotypic characteristics with domain adaptation techniques. The proposed framework was evaluated against the current state of the art in (i) tumour histopathology image classification and (ii) image-genomics associations. The proposed framework demonstrated improved accuracy of tumour classification, as well as providing additional data-derived representations of tumour phenotypic characteristics that exhibit strong image-genomics association. This thesis advances image-genomics research and indicates its potential to reveal additional imaging surrogates for genetic biomarkers, which could facilitate cancer diagnosis.

Sydney eScholarship

Modern Computing Techniques for Solving Genomic Problems

Author: Yu Ning
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/08/2016

With the advent of high-throughput genomics, biological big data brings challenges to scientists in handling, analyzing, processing and mining this massive data. In this new interdisciplinary field, diverse theories, methods, tools and knowledge are utilized to solve a wide variety of problems. As an exploration, this dissertation project is designed to combine concepts and principles from multiple areas, including signal processing, information-coding theory, artificial intelligence and cloud computing, in order to solve the following problems in computational biology: (1) comparative gene structure detection, (2) DNA sequence annotation, and (3) investigation of CpG islands (CGIs) for epigenetic studies. Briefly, in problem #1, sequences are transformed into signal series or binary codes. As in speech/voice recognition, similarity is calculated between two signal series, and the signals are subsequently stitched/matched into a temporal sequence. Because the operations are binary, all calculations/steps can be performed efficiently and accurately. Improving performance in terms of accuracy and specificity is the key for a comparative method. In problem #2, DNA sequences are encoded and transformed into numeric representations for deep learning methods. Encoding schemes greatly influence the performance of deep learning algorithms, so finding the best encoding scheme for a particular application of deep learning is significant. Three applications (detection of protein-coding splicing sites, detection of lincRNA splicing sites and improvement of comparative gene structure identification) are used to show the computing power of deep neural networks. In problem #3, CpG sites are assigned a certain energy and a Gaussian filter is applied to detect CpG islands. By using the CpG box and a Markov model, we investigate the properties of CGIs and redefine the CGIs using the emerging epigenetic data.
In summary, these three problems and their solutions are not isolated; they are linked to modern techniques in such diverse areas as signal processing, information-coding theory, artificial intelligence and cloud computing. These novel methods are expected to improve the efficiency and accuracy of computational tools and bridge the gap between biology and scientific computing.
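The idea in problem #3 (assign each CpG site an energy, then smooth with a Gaussian filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: every CpG site gets unit energy, and the kernel width `sigma`, the truncation at three standard deviations, the threshold, and all function names are hypothetical choices.

```python
import math

def cpg_signal(seq):
    """Unit 'energy' at every position where a CG dinucleotide starts."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] == "CG" else 0.0 for i in range(len(seq))]

def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at three standard deviations."""
    radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Direct convolution; positions outside the sequence contribute zero."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def candidate_regions(smoothed, threshold):
    """Half-open (start, end) intervals where the smoothed signal >= threshold."""
    regions, start = [], None
    for i, value in enumerate(smoothed):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(smoothed)))
    return regions
```

A wider kernel merges nearby CpG clusters into one candidate region, while a higher threshold keeps only the densest stretches. Both would need tuning against annotated CpG islands.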

ScholarWorks @ Georgia State University

Computational Methods for Delineating Multiple Nuclear Phenotypes from Different Imaging Modalities

Author: Khoshdeli Mina
Publication venue
Publication date: 18/06/2019

Characterizing histopathology or organoid models of breast cancer can provide fundamental knowledge that will lead to a better understanding of tumors, response to therapeutic agents, and discovery of new targeted therapies. To this end, the delineation of nuclei is of particular interest since it provides rich information about the aberrant microanatomy or colony formation. For example, (i) cancer cells tend to be larger and, if coupled with high chromatin content, may indicate aneuploidy; (ii) cellular density can be the result of rapid proliferation; (iii) nuclear micro-texture can be a surrogate for fluctuation of heterochromatin patterns, where epigenetic aberrations in cancers are sometimes correlated with alterations in heterochromatin distribution; and (iv) normalized colony formation of cancer cells, in 3D culture, can serve as a surrogate metric for tumor suppression. This evidence suggests that nuclear segmentation and profiling are a major step for subsequent bioinformatics analysis. However, there are two barriers: technical variation during sample preparation and biological heterogeneity, since no two patients or samples are alike. As a result of these complexities, extension of deep learning methodologies will have a significant impact on the robust characterization and profiling of pathology sections or organoid models. In this presentation, we demonstrate that integration of regional and contextual representations, within the framework of a deep encoder-decoder architecture, contributes to robust delineation of various nuclear phenotypes from both bright field and confocal microscopy. The deep encoder-decoder architecture can infer perceptual boundaries that are necessary to decompose clumps of nuclei. The method has been validated on pathology sections and organoid models of human mammary epithelial cells.

University of Nevada, Reno ScholarWorks Repository

Computational mapping of regulatory domains of human genes

Author: Patarčić Inga
Publication venue: Humboldt-Universität zu Berlin
Publication date: 02/11/2021

The human genome contains millions of regulatory elements (enhancers) that quantitatively regulate gene expression. Multiple experimental and computational approaches have been developed to associate enhancers with their gene targets. Despite tremendous progress in understanding how enhancers tune gene expression, the field still lacks an approach that is systematic, integrative and accessible for discovering and documenting cis-regulatory relationships across the genome. We developed a novel computational approach, reg2gene, that models and integrates gene expression ~ enhancer activity. reg2gene consists of three main steps: 1) data quantification, 2) data modelling and significance assessment, and 3) data integration, all gathered in the reg2gene R package. As a result, we identified two sets of enhancer-gene associations (EGAs): the flexible set of ~230K EGAs (flexibleC) and the stringent set of ~60K EGAs (stringentC). We identified major differences across previously published computational models of enhancer-gene associations, mostly in the location, number and properties of defined enhancer regions and EGAs. We performed detailed benchmarking of seven sets of computationally modelled EGAs, but showed that none of the currently available benchmark datasets could be used as a "gold-standard" benchmark dataset. To account for that observation, we defined an additional benchmark set of positive and negative EGAs, with which we showed that the stringentC model had the highest positive predictive value (PPV) across all analyzed computational models. We reviewed the influence of EGA sets on the functional analysis of risk SNPs and demonstrated the potential of EGAs to identify gene targets of non-coding SNP-gene associations.
Lastly, we performed a functional analysis to detect novel gene targets, enhancer pleiotropy, and mechanisms of enhancer activity. Altogether, this work advances our understanding of enhancer-mediated gene expression regulation in health and disease.
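To make the "modelling and significance assessment" and benchmarking steps concrete, here is a minimal sketch. It is not the reg2gene implementation, which is an R package whose internals the abstract does not describe. As an assumption, the association between one enhancer and one gene is modelled as a Pearson correlation across samples, significance is assessed by permutation, and predictions are scored by positive predictive value (PPV) against benchmark sets. All function names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(enhancer_activity, gene_expression, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the enhancer-gene correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(enhancer_activity, gene_expression))
    shuffled = list(gene_expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the sample pairing
        if abs(pearson(enhancer_activity, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def positive_predictive_value(predicted, positives, negatives):
    """PPV = TP / (TP + FP), counting only pairs present in the benchmark."""
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp / (tp + fp) if tp + fp else float("nan")
```

Pairs absent from both benchmark sets are ignored by the PPV calculation, so the score depends on how the positive and negative benchmark sets are defined, which is the issue the abstract raises about existing benchmarks.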

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data

Author: Aldinucci Marco
Drocco Maurizio
Liò
Merelli Ivan
Milanesi Luciano
Tordini Fabio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Institutional Research Information System University of Turin