120 research outputs found

    Multi Layer Analysis

    This thesis presents a new methodology for analyzing one-dimensional signals through an approach called Multi Layer Analysis, or MLA for short. It also provides new insights into the relationship between one-dimensional signals processed by MLA and tree kernels, tests of randomness, and signal processing techniques. The MLA approach has a wide range of applications in pattern discovery and matching, computational biology, and many other areas of computer science and signal processing. The thesis also includes applications of this approach to real problems in biology and seismology.

    A motif-independent metric for DNA sequence specificity

    Background: Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity. Results: We present a simple, unbiased quantitative measure for DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM can be used to detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences by using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim. Conclusions: Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for the development of sequence-based prediction models.

    A Multi-Layer Method to Study Genome-Scale Positions of Nucleosomes

    The basic unit of eukaryotic chromatin is the nucleosome, consisting of about 150 bp of DNA wrapped around a core of histone proteins. Nucleosome position is modulated in vivo to regulate fundamental nuclear processes. Both theoretical and experimental approaches for measuring nucleosome positions on a genomic scale have recently been reported. We have developed a new method, the Multi-Layer Model (MLM), for the analysis of nucleosome position data obtained with a microarray-based approach. The MLM is a feature extraction method in which the input data is processed by a classifier to distinguish between several kinds of patterns. We applied our method to simulated (synthetic) and experimental nucleosome position data and found that, besides achieving high nucleosome recognition and strong agreement with standard statistical methods, the MLM can identify distinct classes of nucleosomes, making it an important tool for the genome-wide analysis of nucleosome position and function. In conclusion, the MLM allows a better representation of nucleosome position data and a significant reduction in computational time.

    A new dissimilarity measure for clustering seismic signals

    The hypocenter and focal mechanism of an earthquake can be determined by analyzing signals, called waveforms, related to the wave field it produces and recorded by a seismic network. Assuming that waveform similarity implies similarity of focal parameters, the analysis of signals with very similar shapes can give important details about the physical phenomena that generated an earthquake. Recent works have shown the effectiveness of cross-correlation and/or cross-spectral dissimilarities in identifying clusters of seismic events. In this work we propose a new dissimilarity measure between seismic signals, whose reliability has been tested on real seismic data by computing external and internal validation indices on the resulting clustering. Results show its superior quality in terms of cluster homogeneity and computational time with respect to the widely adopted cross-correlation dissimilarity.
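For orientation, the cross-correlation dissimilarity used as the baseline above can be sketched as follows. This is a minimal illustration of one common form (one minus the maximum normalized cross-correlation over all integer lags), not the new measure proposed in the paper; the function name `cross_correlation_dissimilarity` and the choice of normalization are assumptions made for this example.

```python
import math


def cross_correlation_dissimilarity(x, y):
    """Return 1 minus the maximum normalized cross-correlation over all lags.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    # Remove the mean so a constant offset does not count as similarity.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]

    # Normalize by the product of the signal energies.
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0.0:
        raise ValueError("signals must not be constant")

    best = -1.0
    # Slide y against x over every integer lag with at least one sample of overlap.
    for lag in range(-(len(yc) - 1), len(xc)):
        total = 0.0
        for i, xv in enumerate(xc):
            j = i - lag
            if 0 <= j < len(yc):
                total += xv * yc[j]
        best = max(best, total / norm)

    # 0 means identical shape up to a time shift; larger values mean less similar.
    return 1.0 - best


if __name__ == "__main__":
    pulse = [0, 0, 1, 2, 1, 0, 0, 0]
    delayed = [0, 0, 0, 0, 1, 2, 1, 0]
    print(cross_correlation_dissimilarity(pulse, pulse))
    print(cross_correlation_dissimilarity(pulse, delayed))
```

Because the maximum is taken over lags, a delayed copy of a waveform still scores as close to the original, which is the property that makes this family of measures useful for grouping events by waveform shape. The double loop is quadratic in signal length, so this direct form is only practical for short traces.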

    Robust and Scalable Learning of Complex Dataset Topologies via ElPiGraph

    Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with recent single-cell transcriptomic studies of the developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most existing computational methods are based on exploring local data point neighbourhood relations, a step that can perform poorly on multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximating datasets with complex structures that does not require computing the complete data distance matrix or the data point neighbourhood graph. The method withstands high levels of noise and can approximate complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields, from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. Comment: 32 pages, 14 figures.

    Fault-Tolerant Distributed Deployment of Embedded Control Software

    ZNF410 represses fetal globin by devoted control of CHD4/NuRD [preprint]

    Major effectors of adult-stage fetal globin silencing include the transcription factors (TFs) BCL11A and ZBTB7A/LRF and the NuRD chromatin complex, although each has potential on-target liabilities for rational β-hemoglobinopathy therapeutic inhibition. Here, through CRISPR screening, we discover ZNF410 to be a novel fetal hemoglobin (HbF) repressing TF. ZNF410 does not bind directly to the γ-globin genes; rather, its chromatin occupancy is solely concentrated at CHD4, encoding the NuRD nucleosome remodeler, itself required for HbF repression. CHD4 has two ZNF410-bound regulatory elements with 27 combined ZNF410 binding motifs constituting unparalleled genomic clusters. These elements completely account for ZNF410’s effects on γ-globin repression. Knockout of ZNF410 reduces CHD4 by 60%, enough to substantially de-repress HbF while avoiding the cellular toxicity of complete CHD4 loss. Mice with constitutive deficiency of the homolog Zfp410 are born at expected Mendelian ratios with unremarkable hematology. Unlike known HbF repressors, ZNF410 is dispensable for human hematopoietic engraftment potential and erythroid maturation. These studies identify a new rational target for HbF induction for the β-hemoglobin disorders with a wide therapeutic index. More broadly, ZNF410 represents a special class of gene regulator, a conserved transcription factor with singular devotion to regulation of a chromatin subcomplex.

    Vet-ICD-O-Canine-1, a System for Coding Canine Neoplasms Based on the Human ICD-O-3.2

    Cancer registries are fundamental tools for collecting epidemiological cancer data and developing cancer prevention and control strategies. Cancer registration is common in the human medical field; many attempts to develop animal cancer registries have been launched over time, but most have been discontinued. A pivotal aspect of cancer registration is the availability of cancer coding systems, as provided by the International Classification of Diseases for Oncology (ICD-O). Within the Global Initiative for Veterinary Cancer Surveillance (GIVCS), established to foster and coordinate animal cancer registration worldwide, a group of veterinary pathologists and epidemiologists developed a comparative coding system for canine neoplasms. Vet-ICD-O-canine-1 is compatible with the human ICD-O-3.2 and is consistent with the currently recognized classification schemes for canine tumors. It comprises 335 topography codes and 534 morphology codes. The same code as in ICD-O-3.2 was used for the majority of canine tumors showing a high level of similarity to their human counterparts (n = 408). De novo codes (n = 152) were created for specific canine tumor entities (n = 126) and topographic sites (n = 26). The Vet-ICD-O-canine-1 coding system represents a user-friendly, easily accessible, and comprehensive resource for developing a canine cancer registration system that will enable studies within the One Health space.

    BRCA1 Recruitment to Transcriptional Pause Sites Is Required for R-Loop-Driven DNA Damage Repair

    The mechanisms contributing to transcription-associated genomic instability are both complex and incompletely understood. Although R-loops are normal transcriptional intermediates, they are also associated with genomic instability. Here, we show that BRCA1 is recruited to R-loops that form normally over a subset of transcription termination regions. There it mediates the recruitment of a specific, physiological binding partner, senataxin (SETX). Disruption of this complex led to R-loop-driven DNA damage at those loci as reflected by adjacent γ-H2AX accumulation and ssDNA breaks within the untranscribed strand of relevant R-loop structures. Genome-wide analysis revealed widespread BRCA1 binding enrichment at R-loop-rich termination regions (TRs) of actively transcribed genes. Strikingly, within some of these genes in BRCA1 null breast tumors, there are specific insertion/deletion mutations located close to R-loop-mediated BRCA1 binding sites within TRs. Thus, BRCA1/SETX complexes support a DNA repair mechanism that addresses R-loop-based DNA damage at transcriptional pause sites.