125 research outputs found

    Multi Layer Analysis

    Get PDF
    This thesis presents a new methodology to analyze one-dimensional signals through a new approach called Multi Layer Analysis, MLA for short. It also provides new insights into the relationship between one-dimensional signals processed by MLA and tree kernels, tests of randomness, and signal processing techniques. The MLA approach has a wide range of applications in pattern discovery and matching, computational biology, and many other areas of computer science and signal processing. This thesis also includes applications of the approach to real problems in biology and seismology.

    A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data analysis

    Get PDF
    Background: Clustering is one of the most well-known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following Handl et al., it can be summarized as a three-step process: (1) choice of a distance function; (2) choice of a clustering algorithm; (3) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention if inferences made from cluster analysis are to be of some relevance to biomedical research. Results: A procedure is proposed for the assessment of the discriminative ability of a distance function, that is, the evaluation of its ability to capture structure in a dataset. It is based on the introduction of a new external validation index, referred to as the Balanced Misclassification Index (BMI, for short), and of a nontrivial modification of the well-known Receiver Operating Curve (ROC, for short), which we refer to as the Corrected ROC (CROC, for short). The main results are: (a) a quantitative and qualitative method to describe the intrinsic separation ability of a distance; (b) a quantitative method to assess the performance of a clustering algorithm in conjunction with the intrinsic separation ability of a distance function. The proposed procedure is more informative than the ones available in the literature thanks to the adopted tools. Indeed, the first tool makes it possible to map distances and clustering solutions as graphical objects on a plane, and gives information about the bias of the clustering algorithm with respect to a distance. The second tool is a new external validity index whose performance is comparable to the state of the art, but with more flexibility, allowing for a broader spectrum of applications.
In fact, it allows one not only to quantify the merit of each clustering solution but also to quantify the agglomerative or divisive errors due to the algorithm. Conclusions: The new methodology has been used to experimentally study three popular distance functions, namely the Euclidean distance d2, the Pearson correlation distance dr, and the mutual information distance dMI. Based on the results of the experiments, the Euclidean and Pearson correlation distances have good intrinsic discrimination ability. Conversely, the mutual information distance does not seem to offer the same flexibility and versatility as the other two distances. Apparently, that is due to well-known problems in its estimation, since it requires that a dataset have a substantial number of features to be reliable. Nevertheless, taking into account such a fact, together with the results presented in Priness et al., one receives an indication that dMI may be superior to the other distances considered in this study only in conjunction with clustering algorithms specifically designed for its use. In addition, the K-means, Average Link, and Complete Link clustering algorithms are in most cases able to improve the discriminative ability of the distances considered in this study with respect to clustering. The methodology has a range of applicability that goes well beyond microarray data, since it is independent of the nature of the input data. The only requirement is that the input data have the same format as a "feature matrix". In particular, it can be used to cluster ChIP-seq data.
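The abstract names the three distances but does not give their formulas; the sketch below uses common conventions (dr as 1 minus the Pearson correlation, dMI as a histogram-based mutual information estimate mapped to a bounded dissimilarity) and is an illustration, not the paper's exact definitions. It also illustrates the estimation caveat: the MI estimate depends on having enough features per vector.

```python
import numpy as np

def euclidean_distance(x, y):
    """Plain Euclidean distance d2 between two feature vectors."""
    return float(np.linalg.norm(x - y))

def pearson_distance(x, y):
    """Correlation-based distance; 1 - r is one common convention for dr."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)

def mutual_information_distance(x, y, bins=10):
    """Histogram estimate of mutual information, mapped to a dissimilarity.
    The estimate is unreliable for short vectors, echoing the abstract's caveat."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    mi = float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
    # MI is at most log(bins) here, so this maps into [0, 1] (assumed convention).
    return 1.0 - mi / np.log(bins)

x = np.linspace(0, 1, 100)
y = 2 * x + 0.1            # a perfectly linear transform of x
print(euclidean_distance(x, y))
print(pearson_distance(x, y))          # near 0: perfectly correlated vectors
print(mutual_information_distance(x, y))
```

For linearly related vectors, dr is essentially zero while d2 is not, which is why the two distances can rank the same pair of profiles very differently.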

    A motif-independent metric for DNA sequence specificity

    Get PDF
    Background: Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity. Results: We present a simple, unbiased quantitative measure for DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM measure can be used to detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences by using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim. Conclusions: Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for the development of sequence-based prediction models.

    Development of a space exploration rover digital twin for damage detection

    Get PDF
    This study focuses on the creation of a digital twin of a space exploration rover to perform damage detection. The digital twin incorporates various subsystems of real rovers to accurately simulate the rover’s behaviour. Damage detection is performed by introducing damage into the digital twin and comparing the signals obtained in healthy and damaged conditions. By using the multiphysics model created by integrating different subsystems, the effect of damage can be observed in other subsystems of the rover. The study aims to demonstrate the potential of a digital twin for damage detection, reducing the risk of mission failure and data loss.
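The abstract does not detail how healthy and damaged signals are compared; a minimal residual-based sketch of that comparison, with a hypothetical `damage_indicator` helper and an arbitrary threshold, might look like this:

```python
import numpy as np

def damage_indicator(healthy, observed, threshold=0.1):
    """Hypothetical check: flag damage when the normalized residual between
    the twin's healthy baseline and an observed signal exceeds a threshold.
    Both the helper name and the threshold value are illustrative assumptions."""
    residual = float(np.linalg.norm(observed - healthy) / np.linalg.norm(healthy))
    return residual, residual > threshold

t = np.linspace(0, 10, 500)
baseline = np.sin(t)                       # simulated twin output, healthy condition
damaged = np.sin(t) + 0.3 * np.sin(3 * t)  # altered response after a simulated fault
res, flagged = damage_indicator(baseline, damaged)
print(res, flagged)
```

In a multiphysics twin the same comparison could be run on each subsystem's signals, so a fault injected in one subsystem can be detected through residuals appearing in another.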

    A Multi-Layer Method to Study Genome-Scale Positions of Nucleosomes

    Get PDF
    The basic unit of eukaryotic chromatin is the nucleosome, consisting of about 150 bp of DNA wrapped around a protein core made of histone proteins. Nucleosome positions are modulated in vivo to regulate fundamental nuclear processes. To measure nucleosome positions on a genomic scale, both theoretical and experimental approaches have recently been reported. We have developed a new method, the Multi-Layer Model (MLM), for the analysis of nucleosome position data obtained with a microarray-based approach. The MLM is a feature extraction method in which the input data is processed by a classifier to distinguish between several kinds of patterns. We applied our method to simulated-synthetic and experimental nucleosome position data and found that, besides high nucleosome recognition accuracy and strong agreement with standard statistical methods, the MLM can identify distinct classes of nucleosomes, making it an important tool for the genome-wide analysis of nucleosome position and function. In conclusion, the MLM allows a better representation of nucleosome position data and a significant reduction in computational time.

    A new dissimilarity measure for clustering seismic signals

    Get PDF
    The hypocenter and focal mechanism of an earthquake can be determined by the analysis of signals, called waveforms, related to the wave field produced and recorded by a seismic network. Assuming that waveform similarity implies the similarity of focal parameters, the analysis of signals characterized by very similar shapes can be used to give important details about the physical phenomena that generated an earthquake. Recent works have shown the effectiveness of cross-correlation and/or cross-spectral dissimilarities in identifying clusters of seismic events. In this work we propose a new dissimilarity measure between seismic signals whose reliability has been tested on real seismic data by computing external and internal validation indices on the obtained clustering. Results show its superior quality in terms of cluster homogeneity and computational time with respect to the widely adopted cross-correlation dissimilarity.
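The cross-correlation dissimilarity used as the baseline in this abstract is commonly defined as 1 minus the maximum normalized cross-correlation over all lags; the sketch below implements that standard baseline, not the paper's new measure, whose definition is not given here.

```python
import numpy as np

def xcorr_dissimilarity(u, v):
    """1 minus the maximum normalized cross-correlation over all lags,
    a standard waveform dissimilarity: 0 for identical (aligned) signals."""
    u = u - u.mean()
    v = v - v.mean()
    u = u / np.linalg.norm(u)   # unit-norm so the correlation is bounded by 1
    v = v / np.linalg.norm(v)
    cc = np.correlate(u, v, mode="full")   # correlation at every lag
    return float(1.0 - cc.max())

t = np.linspace(0, 1, 200)
sig = np.sin(2 * np.pi * 5 * t)            # a stand-in for a recorded waveform
noise = np.random.default_rng(0).normal(size=200)
print(xcorr_dissimilarity(sig, sig))    # near 0 for identical waveforms
print(xcorr_dissimilarity(sig, noise))  # larger for unrelated signals
```

Because the maximum is taken over all lags, two waveforms with the same shape but a small time shift still score as similar, which is the property that makes this baseline suitable for grouping events with similar focal parameters.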

    Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph

    Full text link
    Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of the developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures that does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. Comment: 32 pages, 14 figures

    Fault-Tolerant Distributed Deployment of Embedded Control Software

    Full text link

    ZNF410 represses fetal globin by devoted control of CHD4/NuRD [preprint]

    Get PDF
    Major effectors of adult-stage fetal globin silencing include the transcription factors (TFs) BCL11A and ZBTB7A/LRF and the NuRD chromatin complex, although each has potential on-target liabilities for rational β-hemoglobinopathy therapeutic inhibition. Here, through CRISPR screening, we discover ZNF410 to be a novel fetal hemoglobin (HbF) repressing TF. ZNF410 does not bind directly to the γ-globin genes; rather, its chromatin occupancy is solely concentrated at CHD4, encoding the NuRD nucleosome remodeler, itself required for HbF repression. CHD4 has two ZNF410-bound regulatory elements with 27 combined ZNF410 binding motifs constituting unparalleled genomic clusters. These elements completely account for ZNF410’s effects on γ-globin repression. Knockout of ZNF410 reduces CHD4 by 60%, enough to substantially de-repress HbF while avoiding the cellular toxicity of complete CHD4 loss. Mice with constitutive deficiency of the homolog Zfp410 are born at expected Mendelian ratios with unremarkable hematology. Unlike known HbF repressors, ZNF410 is dispensable for human hematopoietic engraftment potential and erythroid maturation. These studies identify a new rational target for HbF induction for the β-hemoglobin disorders with a wide therapeutic index. More broadly, ZNF410 represents a special class of gene regulator: a conserved transcription factor with singular devotion to the regulation of a chromatin subcomplex.