    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
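    The last stage of the pipeline described above (vector data, then a distance matrix, then UPGMA) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the ICF, so plain binary vectors and the Hamming distance stand in for the ICF-derived descriptors, and the names `hamming` and `upgma` are hypothetical, not part of the BOOL-AN software.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def upgma(labels, vectors):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples.

    Assumption: `vectors` are toy binary vectors standing in for the
    ICF-derived descriptors, which the abstract does not specify.
    """
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Pairwise distances, keyed by (smaller id, larger id).
    dist = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            dist[(i, j)] = float(hamming(vectors[i], vectors[j]))
    next_id = len(labels)
    while len(clusters) > 1:
        # Pick the closest pair; ties are broken by cluster id.
        i, j = min(dist, key=lambda p: (dist[p], p))
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


tree = upgma(["A", "B", "C"], [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]])
print(tree)
```

    In this toy input, A and B differ in one position and are joined first; C is attached afterwards. Neighbor-joining or principal coordinates analysis would consume the same distance matrix.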

    Pattern vectors from algebraic graph theory

    Graph structures have proven computationally cumbersome for pattern analysis. The reason for this is that, before graphs can be converted to pattern vectors, correspondences must be established between the nodes of structures which are potentially of different size. To overcome this problem, in this paper, we turn to the spectral decomposition of the Laplacian matrix. We show how the elements of the spectral matrix for the Laplacian can be used to construct symmetric polynomials that are permutation invariants. The coefficients of these polynomials can be used as graph features which can be encoded in a vectorial manner. We extend this representation to graphs in which there are unary attributes on the nodes and binary attributes on the edges by using the spectral decomposition of a Hermitian property matrix that can be viewed as a complex analogue of the Laplacian. To embed the graphs in a pattern space, we explore whether the vectors of invariants can be embedded in a low-dimensional space using a number of alternative strategies, including principal components analysis (PCA), multidimensional scaling (MDS), and locality preserving projection (LPP). Experimentally, we demonstrate that the embeddings result in well-defined graph clusters. Our experiments with the spectral representation involve both synthetic and real-world data. The experiments with synthetic data demonstrate that the distances between spectral feature vectors can be used to discriminate between graphs on the basis of their structure. The real-world experiments show that the method can be used to locate clusters of graphs.
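    The idea of permutation-invariant features from the Laplacian can be illustrated with a deliberately simplified sketch. It is not the paper's construction: the abstract builds symmetric polynomials from the elements of the spectral matrix (which involves the eigenvectors), whereas the code below swaps in the elementary symmetric polynomials of the Laplacian eigenvalues, read off the characteristic polynomial with the Faddeev-LeVerrier recurrence. The function names are hypothetical.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected simple graph."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def charpoly_coeffs(A):
    """Faddeev-LeVerrier: coefficients [c_n = 1, c_{n-1}, ..., c_0] of det(xI - A)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A * M_{k-1} + c_{n-k+1} * I
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        # c_{n-k} = -trace(A * M_k) / k
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def spectral_invariants(n, edges):
    """Elementary symmetric polynomials e_1..e_n of the Laplacian eigenvalues.

    Simplifying assumption: eigenvalues only, not the spectral-matrix
    elements used in the abstract.
    """
    coeffs = charpoly_coeffs(laplacian(n, edges))
    # e_k = (-1)^k * c_{n-k}; the values are exact integers for integer matrices.
    return [int((-1) ** k * c) for k, c in enumerate(coeffs)][1:]


path = spectral_invariants(3, [(0, 1), (1, 2)])
relabelled_path = spectral_invariants(3, [(0, 2), (2, 1)])
triangle = spectral_invariants(3, [(0, 1), (1, 2), (0, 2)])
print(path, relabelled_path, triangle)
```

    The two labellings of the three-node path give the same vector, while the triangle gives a different one, so no node correspondence is needed to compare the graphs. Eigenvalues alone cannot separate cospectral graphs, which is one reason the eigenvector-based features in the abstract carry more information.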

    Exploring the randomness of Directed Acyclic Networks

    The feed-forward relationship naturally observed in time-dependent processes and in a diverse range of real systems, such as some food webs and electronic and neural wiring, can be described in terms of so-called directed acyclic graphs (DAGs). An important ingredient of the analysis of such networks is a proper comparison of their observed architecture against an ensemble of randomized graphs, thereby quantifying the randomness of the real systems with respect to suitable null models. This approximation is particularly relevant when the finite size and/or large connectivity of real systems make a comparison with the predictions obtained from the so-called configuration model inadequate. In this paper we analyze four methods of DAG randomization, each defined by the combination of topological invariants (directed and undirected degree sequence and component distributions) that it is meant to preserve. A highly ordered DAG, called the snake graph, and an Erdős-Rényi DAG were used to validate the performance of the algorithms. Finally, three real case studies, namely the C. elegans cell lineage network, a PhD student-advisor network and Milgram's citation network, were analyzed using each randomization method. Results show how the interpretation of degree-degree relations in DAGs with respect to their randomized ensembles depends on the topological invariants imposed. In general, real DAGs provide disordered values, lower than those expected by chance when the directedness of the links is not preserved in the randomization process. Conversely, if the direction of the links is conserved throughout the randomization process, disorder indicators are close to those obtained from the null-model ensemble, although some deviations are observed. Comment: 13 pages, 5 figures and 5 tables
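    One way a degree-preserving DAG randomization might look is sketched below. The abstract does not spell out its four methods, so this is an assumed, generic edge-swap scheme rather than any of them: nodes are taken to be numbered in a topological order (every edge goes from a lower to a higher number), and a swap is accepted only if it keeps that order, which guarantees acyclicity and leaves every in-degree and out-degree unchanged. The names `randomize_dag` and `degree_sequences` are hypothetical.

```python
import random


def degree_sequences(edges, n):
    """Return (out_degrees, in_degrees) for a directed graph on nodes 0..n-1."""
    out_deg, in_deg = [0] * n, [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return out_deg, in_deg


def randomize_dag(edges, n_swaps, seed=None):
    """Randomize a DAG by target swaps that preserve in/out degrees.

    Assumptions: edges are distinct pairs (u, v) with u < v, i.e. the node
    numbering is already a topological order. Keeping that order fixed is
    stricter than acyclicity alone requires.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed rewiring: a->b, c->d becomes a->d, c->b.
        if not (a < d and c < b):
            continue  # would break the topological order
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge (or change nothing)
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


original = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 4), (3, 4), (2, 5), (4, 5)]
shuffled = randomize_dag(original, n_swaps=200, seed=1)
print(sorted(shuffled))
```

    Repeating the call with different seeds yields an ensemble against which an observed quantity, such as a degree-degree correlation, can be compared. Dropping the order check would correspond to a null model that ignores link direction, which is the contrast the abstract draws.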