173 research outputs found

    Probabilistic K-mean with local alignment for clustering and motif discovery in functional data

    We develop a new method to locally cluster curves and discover functional motifs, i.e. typical "shapes" that may recur several times along and across the curves, capturing important local characteristics. In order to identify these shared curve portions, our method leverages ideas from functional data analysis (joint clustering and alignment of curves), bioinformatics (local alignment through the extension of high similarity seeds) and fuzzy clustering (curves belonging to more than one cluster, if they contain more than one typical "shape"). It can employ various dissimilarity measures and incorporate derivatives in the discovery process, thus exploiting complex facets of shapes. We demonstrate the performance of our method with an extensive simulation study, and show how it generalizes other clustering methods for functional data. Finally, we provide real data applications to Berkeley growth data, Italian Covid-19 death curves and "Omics" data related to mutagenesis. Comment: 22 pages, 6 figures. This work has been presented at various conference

    Enhancing Threat Model Validation: A White-Box Approach based on Statistical Model Checking and Process Mining

    Our method addresses the challenge of validating threat models by comparing actual behavior with expected behavior. Statistical Model Checking (SMC) is frequently the more appropriate technique for validating models, as it relies on statistically relevant samples to analyze systems with potentially infinite state spaces. In the case of black-box systems, where it is not possible to make complete assumptions about the transition structure, black-box SMC becomes necessary. However, the numeric results of the SMC analysis lack insights on the model's dynamics, prompting our proposal to enhance SMC analysis by incorporating visual information on the behavior that led to a given estimation. Our method improves traditional model validation using SMC by enriching its analyses with Process Mining (PM) techniques. Our approach takes simulated event logs as inputs and uses PM techniques to reconstruct an observed model, which is compared with the graphical representation of the original model to obtain a diff model highlighting discrepancies between expected and actual behavior. This allows the modeler to address unexpected or missing behaviors. In this paper we further customize the diff model for aspects specific to threat model analysis, incorporating features such as new colored edges to symbolize an attacker's initial assets and an automatic fix for simple classes of modeling errors that generate unexpected deadlocks in the simulated model. Our approach offers an effective and scalable solution for threat model validation, contributing to the evolving landscape of risk modeling and analysis.
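
The sampling idea behind SMC can be sketched roughly as follows. This is only an illustration, not the authors' tool: the toy attack model `simulate_attack`, its step probabilities, and the helper names are hypothetical. The sample size uses the standard Chernoff-Hoeffding bound N >= ln(2/delta) / (2 * epsilon^2), one common choice for SMC estimation; the abstract does not say which bound the paper uses.

```python
import math
import random


def required_samples(epsilon: float, delta: float) -> int:
    """Runs needed so the estimate is within epsilon of the true
    probability with confidence at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_attack(rng: random.Random) -> bool:
    """Hypothetical toy threat model: the attacker must succeed at two
    independent steps (probabilities 0.5 and 0.4 are made up)."""
    return rng.random() < 0.5 and rng.random() < 0.4


def estimate_success_probability(epsilon: float, delta: float, seed: int = 0):
    """Estimate the attack success probability by repeated simulation."""
    rng = random.Random(seed)
    n = required_samples(epsilon, delta)
    hits = sum(simulate_attack(rng) for _ in range(n))
    return hits / n, n


if __name__ == "__main__":
    p_hat, n = estimate_success_probability(epsilon=0.05, delta=0.05)
    print(f"estimated probability {p_hat:.3f} from {n} simulated runs")
```

A bare estimate like this is exactly the kind of numeric result the abstract describes as lacking insight into the model's dynamics, which is the gap the process-mining step is meant to fill.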

    The shapes of an epidemic: using Functional Data Analysis to characterize COVID-19 in Italy

    We investigate patterns of COVID-19 mortality across 20 Italian regions and their association with mobility, positivity, and socio-demographic, infrastructural and environmental covariates. Notwithstanding limitations in accuracy and resolution of the data available from public sources, we pinpoint significant trends exploiting information in curves and shapes with Functional Data Analysis techniques. These depict two starkly different epidemics: an "exponential" one unfolding in Lombardia and the worst hit areas of the north, and a milder, "flat(tened)" one in the rest of the country -- including Veneto, where cases appeared concurrently with Lombardia but aggressive testing was implemented early on. We find that mobility and positivity can predict COVID-19 mortality, also when controlling for relevant covariates. Among the latter, primary care appears to mitigate mortality, and contacts in hospitals, schools and workplaces to aggravate it. The techniques we describe could capture additional and potentially sharper signals if applied to richer data.

    Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis

    BACKGROUND: Carbon (C) and nitrogen (N) metabolites can regulate gene expression in Arabidopsis thaliana. Here, we use multinetwork analysis of microarray data to identify molecular networks regulated by C and N in the Arabidopsis root system. RESULTS: We used the Arabidopsis whole genome Affymetrix gene chip to explore global gene expression responses in plants exposed transiently to a matrix of C and N treatments. We used ANOVA to define quantitative models of regulation for all detected genes. Our results suggest that about half of the Arabidopsis transcriptome is regulated by C, N or CN interactions. We found ample evidence for interactions between C and N that include genes involved in metabolic pathways, protein degradation and auxin signaling. To provide a global, yet detailed, view of how the cell molecular network is adjusted in response to the CN treatments, we constructed a qualitative multinetwork model of the Arabidopsis metabolic and regulatory molecular network, including 6,176 genes, 1,459 metabolites and 230,900 interactions among them. We integrated the quantitative models of CN gene regulation with the wiring diagram in the multinetwork, and identified specific interacting genes in biological modules that respond to C, N or CN treatments. CONCLUSION: Our results indicate that CN regulation occurs at multiple levels, including potential post-transcriptional control by microRNAs. The network analysis of our systematic dataset of CN treatments indicates that CN sensing is a mechanism that coordinates the global regulation of specific sets of molecular machines in the plant cell.

    On the compatibility of benevolence and self-interest: philanthropy and entrepreneurial orientation

    This article explores the philanthropy of owner–managers of small- and medium-sized enterprises (SMEs), investigating whether and why more entrepreneurially oriented SMEs are also more likely to engage in philanthropic activities. We find support for a positive link between entrepreneurial orientation (EO) and philanthropy in a representative sample of 270 Lithuanian SMEs, controlling for alternative explanations. We highlight that philanthropy is relatively common among SME owner–managers and thus complement existing research which views philanthropy as sequentially following wealth generation. In line with our theorizing, further qualitative findings point to drivers of philanthropy beyond those considered in the dominant strategic-instrumental perspective. Building on social-psychological theories of motivation, we argue and confirm that philanthropy can also be an expression of owner–managers' altruistic values; these values can be compatible and even mutually reinforcing with entrepreneurship. Our study is set in a transition economy, Lithuania, facilitating the analysis of heterogeneity in attitudes toward philanthropy.

    Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA

    The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother–child pairs of European ancestry (a total of 156 samples, each sequenced at ∼20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ∼30–35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at $1.3 \times 10^{-8}$ (interquartile range from $4.2 \times 10^{-9}$ to $4.1 \times 10^{-8}$) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.

    Support vector machines categorize the scaling of human grip configurations

    In previous work (Cesari & Newell, 2002), we used a graphical dimensional analysis to show that grip transitions obey the body-scaled relation $K = \ln L_o + \ln M_o/(a + bM_h + cL_h)$, where $L_o$ and $M_o$ are the object's length and mass, and $L_h$ and $M_h$ the length and mass of the grasper's hand. However, the generality of the equation was limited by the ad hoc graphical method that defined the lines for grip separation and by the assumption that these lines be negatively sloped and parallel to one another. This article reports an independent test of this relation by the geometrical and statistical categorization of body-scaled invariants for the transition of human grip configurations through support vector machines (SVMs). The SVM analysis confirmed the fit of linear, negatively sloped, and approximately parallel transition boundaries in the scaling of human grip configuration within a single hand. The SVM analysis has provided a theoretical refinement to the scaling model of human grip configurations.

    Gentle Masking of Low-Complexity Sequences Improves Homology Search

    Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
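
A rough sketch of the scoring rule may make the difference between gentle and harsh masking concrete. It assumes the gentle-masking score is min(0, S), with S the unmasked score, and that masked letters are written in lowercase (a common convention that the abstract does not state). The +1/-1 scoring scheme, the particular form of harsh masking shown, and all function names are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical match/mismatch scores, not taken from the abstract.
MATCH, MISMATCH = 1, -1


def unmasked_score(a: str, b: str) -> int:
    """Score S for aligning two letters, ignoring any masking."""
    return MATCH if a.upper() == b.upper() else MISMATCH


def is_masked(a: str, b: str) -> bool:
    """Assumption: a lowercase letter marks a masked position."""
    return a.islower() or b.islower()


def gentle_score(a: str, b: str) -> int:
    """Gentle masking: a masked letter can never contribute a positive
    score, but still keeps its penalty: min(0, S)."""
    s = unmasked_score(a, b)
    return min(0, s) if is_masked(a, b) else s


def harsh_score(a: str, b: str) -> int:
    """One simple form of harsh masking (an assumption): any pair that
    involves a masked letter is scored as a mismatch."""
    return MISMATCH if is_masked(a, b) else unmasked_score(a, b)


def ungapped_score(x: str, y: str, score) -> int:
    """Total score of an ungapped alignment of two equal-length strings."""
    return sum(score(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    x = "ACGTatatatACGT"  # low-complexity tract masked in lowercase
    y = "ACGTatatatACGT"
    print(ungapped_score(x, y, gentle_score))
    print(ungapped_score(x, y, harsh_score))
```

Under these assumptions the masked tract adds nothing to the gentle score, so an alignment can extend through it to the flanking unmasked matches, whereas the harsh variant penalizes every masked position.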