499 research outputs found

    An introduction to spectral distances in networks (extended version)

    Many functions have recently been defined to assess the similarity among networks as tools for quantitative comparison. They stem from very different frameworks and are tuned to deal with different situations. Here we give an overview of the spectral distances, highlighting their behavior in some basic cases of static and dynamic synthetic and real networks.
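The abstract does not say which spectral distances are covered. As a minimal sketch of the general idea, the snippet below assumes one common choice: the Euclidean distance between the sorted eigenvalues of the combinatorial graph Laplacian. The function names and the toy graphs are illustrative assumptions, and NumPy is assumed to be installed.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Sorted eigenvalues of L = D - A for an undirected graph (assumed choice of matrix)."""
    a = np.asarray(adj, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    return np.sort(np.linalg.eigvalsh(lap))


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between two Laplacian spectra (graphs with equal node counts)."""
    spec_a = laplacian_spectrum(adj_a)
    spec_b = laplacian_spectrum(adj_b)
    if spec_a.shape != spec_b.shape:
        raise ValueError("this sketch only compares graphs with the same number of nodes")
    return float(np.linalg.norm(spec_a - spec_b))


# Toy graphs on three nodes: a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# The same path with its nodes relabelled (centre node listed first).
relabelled_path = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

print(spectral_distance(path, triangle))         # approximately 2.0
print(spectral_distance(path, relabelled_path))  # approximately 0.0
```

The path has Laplacian spectrum (0, 1, 3) and the triangle (0, 3, 3), so the distance is close to 2. Relabelling nodes leaves the spectrum unchanged, which is why a distance of this kind does not depend on node order.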

    Dyslexic children's reading pattern as input for ASR: Data, analysis, and pronunciation model

    To realize an automatic speech recognition (ASR) model that is able to recognize the Bahasa Melayu reading difficulties of dyslexic children, the language corpus has to be generated beforehand. For this purpose, data collection was performed in two public schools involving ten dyslexic children aged between seven and fourteen years. A total of 114 Bahasa Melayu words, representing 23 consonant-vowel patterns in the spelling system of the language, served as the stimuli. The patterns range from simple to somewhat complex formations of consonant-vowel pairs in words listed in a level one primary school syllabus. An analysis was performed to identify the most frequent errors made by these dyslexic children when reading aloud and to describe the emerging reading pattern of dyslexic children in general. This paper hence provides an overview of the entire process, from data collection to analysis to modeling the pronunciations of words that will serve as the active lexicon for the ASR model. It also highlights the challenges of collecting data from dyslexic children reading aloud, and other factors that contribute to the complex nature of the data collected.

    Hunting for Pirated Software Using Metamorphic Analysis

    In this paper, we consider the problem of detecting software that has been pirated and modified. We analyze a variety of detection techniques that have been previously studied in the context of malware detection. For each technique, we empirically determine the detection rate as a function of the degree of modification of the original code. We show that the code must be greatly modified before we fail to reliably distinguish it, and we show that our results offer a significant improvement over previous related work. Our approach can be applied retroactively to any existing software and hence is both practical and effective.

    Gene Family Histories: Theory and Algorithms

    Detailed gene family histories and reconciliations with species trees are a prerequisite for studying associations between genetic and phenotypic innovations. Even though the true evolutionary scenarios are usually unknown, they impose certain constraints on the mathematical structure of data obtained from simple yes/no questions in pairwise comparisons of gene sequences. Recent advances in this field have led to the development of methods for reconstructing (aspects of) the scenarios on the basis of such relation data, which can most naturally be represented by graphs on the set of considered genes. We provide here novel characterizations of best match graphs (BMGs), which capture the notion of (reciprocal) best hits based on sequence similarities. BMGs provide the basis for the detection of orthologous genes (genes that diverged after a speciation event). There are two main sources of error in pipelines for orthology inference based on BMGs. Firstly, measurement errors in the estimation of best matches from sequence similarity in general lead to violations of the characteristic properties of BMGs. The second issue concerns the reconstruction of the orthology relation from a BMG. We show how to correct estimated BMGs to mathematically valid ones and how much information about orthologs is contained in BMGs. We then discuss implicit methods for horizontal gene transfer (HGT) inference that focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of an undirected graph, the later-divergence-time (LDT) graph. We explore the mathematical structure of LDT graphs and show how much information about all HGT events is contained in such LDT graphs.
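To make the notion of (reciprocal) best hits concrete, here is a minimal sketch that estimates best-hit arcs from pairwise similarity scores. It is an illustration under stated assumptions, not the method of the thesis: gene names, species labels, scores, and function names are all hypothetical, and a gene's best hits in another species are simply taken to be the highest-scoring genes of that species.

```python
def best_hit_arcs(species, score):
    """Return arcs (x, y) where y scores highest for x among the genes of y's species.

    species: dict mapping gene name -> species label (the vertex "color").
    score:   function (gene, gene) -> similarity, higher meaning more similar.
    """
    arcs = set()
    for x, species_x in species.items():
        # Best score that x reaches within each other species.
        top = {}
        for y, species_y in species.items():
            if species_y != species_x:
                top[species_y] = max(top.get(species_y, float("-inf")), score(x, y))
        # Every gene attaining that best score counts as a best hit (ties are kept).
        for y, species_y in species.items():
            if species_y != species_x and score(x, y) == top[species_y]:
                arcs.add((x, y))
    return arcs


def reciprocal_best_hits(arcs):
    """Unordered gene pairs that are best hits of each other."""
    return {frozenset((x, y)) for (x, y) in arcs if (y, x) in arcs}


# Hypothetical toy data: one gene in species A, two genes in species B.
species = {"a1": "A", "b1": "B", "b2": "B"}
similarity = {frozenset(("a1", "b1")): 0.9, frozenset(("a1", "b2")): 0.6}


def score(x, y):
    return similarity[frozenset((x, y))]


arcs = best_hit_arcs(species, score)
print(sorted(arcs))
print(reciprocal_best_hits(arcs))
```

In this toy case `b2` points to `a1`, but `a1` prefers `b1`, so only the pair `a1`/`b1` is reciprocal. Noisy scores can flip such arcs, which is the kind of estimation error the abstract refers to.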

    Scalable string reconciliation by recursive content-dependent shingling

    We consider the problem of reconciling similar strings in a distributed system. Specifically, we are interested in performing this reconciliation efficiently, minimizing the communication cost. Our problem applies to several types of large-scale distributed networks, file synchronization utilities, and any system that manages the consistency of string-encoded ordered data. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol, which can handle large strings and has communication complexity that scales with the edit distance between the reconciling strings. We also provide analysis, experimental results, and comparisons between an implementation of our protocol and existing synchronization software such as the Rsync utility.
    2019-12-03T00:00:00
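The abstract does not describe the protocol itself, so the sketch below covers only the quantity its cost is stated to scale with. It assumes that "edit distance" means the standard Levenshtein distance (insertions, deletions, substitutions), which the abstract does not confirm. The function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if char_a == char_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Two long strings that differ in only a few characters have a small edit distance, which is the regime where a protocol with this scaling would send little data.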

    Alternative Ranking-Based Clustering and Reliability Index-Based Consensus Reaching Process for Hesitant Fuzzy Large Scale Group Decision Making

    The paper addresses the growing importance of Large Scale Group Decision Making (LSGDM) problems, focusing on hesitant fuzzy LSGDM. It introduces a Reliability Index-based Consensus Reaching Process (RI-CRP) to enhance efficiency. The proposed method assesses the ordinal consistency of decision makers' (DMs) information, measures deviation, and assigns a reliability index to DMs' opinions. A method for managing unreliable DMs is presented to filter out unreliable information. Additionally, an Alternative Ranking-based Clustering (ARC) method with hesitant fuzzy reciprocal preference relations is proposed to improve the efficiency of RI-CRP. A numerical example demonstrates the feasibility and effectiveness of the ARC method and RI-CRP for hesitant fuzzy LSGDM problems.
    Instituto Interuniversitario de Investigación en Data Science and Computational Intelligence (DaSCI)

    Heuristic Algorithms for Best Match Graph Editing

    Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics and can be approximated with the help of similarity measures between gene sequences, albeit not without errors. The corresponding graph editing problem can be used as a means of error correction. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Since BMGs have a characterization in terms of consistency of a certain set of rooted triples, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing.

    Novel Method for Measuring Structure and Semantic Similarity of XML Documents Based on Extended Adjacency Matrix

    Similarity measurement of XML documents is crucial to meet various needs of approximate searches and document classifications in XML-oriented applications. Some methods have been proposed for this purpose. Nevertheless, few methods can be elegantly exploited to depict structure and semantic information and hence to effectively measure the similarity of XML documents. In this paper, we present a new method of computing the structure and semantic similarity of XML documents based on the extended adjacency matrix (EAM). Unlike a general adjacency matrix, an EAM stores the structure information of not only the adjacent layers but also the ancestor-descendant layers. To measure the similarity of two XML documents, the proposed method first stores the structure and semantic information in two extended adjacency matrices (M1, M2). It then computes the similarity of the two documents through cos(M1, M2). Experimental results on benchmark data show that the method achieves high efficiency and accuracy.
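The abstract writes cos(M1, M2) without defining it for matrices. A minimal sketch follows, assuming the usual extension of cosine similarity to matrices: the entrywise (Frobenius) inner product divided by the product of the Frobenius norms. The function name and the two small 0/1 matrices are illustrative assumptions and do not reproduce the paper's EAM construction.

```python
import math


def matrix_cosine(m1, m2):
    """Cosine of the angle between two equally shaped matrices, viewed as flat vectors."""
    if len(m1) != len(m2) or any(len(r1) != len(r2) for r1, r2 in zip(m1, m2)):
        raise ValueError("matrices must have the same shape")
    dot = sum(a * b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    norm1 = math.sqrt(sum(a * a for row in m1 for a in row))
    norm2 = math.sqrt(sum(b * b for row in m2 for b in row))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine is undefined for an all-zero matrix")
    return dot / (norm1 * norm2)


# Hypothetical matrices over three element names shared by two documents.
# m_doc1 additionally marks an ancestor-descendant relation (row 0, column 2).
m_doc1 = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
m_doc2 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]

print(matrix_cosine(m_doc1, m_doc2))
```

Here the matrices share two of three nonzero entries, giving 2 / sqrt(6), roughly 0.816. Identical matrices give 1, and matrices with no overlapping nonzero entries give 0.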

    The Circular Variance as a Visual Summary of Synchronized Voltage Angle Measurements

    Phasor measurement units (PMUs) allow voltage angle differences across power grids to be monitored to identify sudden shifts associated with system disturbances. The Eastern Interconnection Situational Awareness and Monitoring System (ESAMS) was developed to identify such wide-area disturbances and summarize them in reports released the following day. Demonstration of ESAMS in North America's Eastern Interconnection revealed the need for an effective visual summary of a disturbance's impact on voltage angle pairs. This paper proposes the use of the circular variance, a measure of dispersion applicable to angular data, for this purpose. Results based on PMU data from North America's Eastern and Western interconnections indicate that the circular variance provides useful summaries of wide-area voltage angle measurements. They also show that the circular variance may have potential uses when applied to historical data to identify unusual grid conditions.
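As a minimal sketch of the statistic named above, the snippet below uses the textbook definition of circular variance: one minus the length of the mean of the unit vectors that represent the angles. Whether the paper uses exactly this normalization is an assumption. The function name and sample angles are illustrative and are not PMU data.

```python
import cmath
import math


def circular_variance(angles_deg):
    """Circular variance in [0, 1]: 0 for identical angles, 1 when the directions cancel out."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    # Represent each angle as a point on the unit circle and average the points.
    mean_vector = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg) / len(angles_deg)
    return 1.0 - abs(mean_vector)


print(circular_variance([10, 10, 10]))       # close to 0: no dispersion
print(circular_variance([350, 10]))          # small: the angles sit 20 degrees apart across 0
print(circular_variance([0, 90, 180, 270]))  # close to 1: evenly spread around the circle
```

The second case is why an angular statistic is needed: an ordinary variance of 350 and 10 would treat two nearby angles as far apart, while the circular variance handles the wrap-around at 360 degrees.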