    Discovering Patterns of Interest in IP Traffic Using Cliques in Bipartite Link Streams

    Studying IP traffic is crucial for many applications. We focus here on the detection of (structurally and temporally) dense sequences of interactions, which may indicate botnets or coordinated network scans. More precisely, we model a MAWI capture of IP traffic as a link stream, i.e. a sequence of interactions $(t_1, t_2, u, v)$ meaning that devices $u$ and $v$ exchanged packets from time $t_1$ to time $t_2$. This traffic is captured on a single router and therefore has a bipartite structure: links occur only between nodes in two disjoint sets. We design a method for finding interesting bipartite cliques in such link streams, i.e. two sets of nodes and a time interval such that all nodes in the first set are linked to all nodes in the second set throughout the time interval. We then explore the bipartite cliques present in the considered trace. Comparison with the MAWILab classification of anomalous IP addresses shows that the cliques found succeed in detecting anomalous network activity.
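
    The clique condition above can be checked directly from its definition. The Python sketch below is a minimal illustration, not the authors' clique-finding method: it only verifies whether a given candidate (node sets X and Y, time interval [b, e]) is a bipartite clique, assuming closed intervals and treating each interaction (t1, t2, u, v) as undirected. The function names and the toy stream are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching closed intervals [a, b]."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return merged


def covers(intervals, b, e):
    """True if the union of the intervals contains all of [b, e]."""
    # Merged intervals are disjoint, so [b, e] must fit inside one of them.
    return any(lo <= b and e <= hi for lo, hi in merge_intervals(intervals))


def is_bipartite_clique(stream, X, Y, b, e):
    """Check that every node of X is linked to every node of Y throughout [b, e].

    stream: iterable of interactions (t1, t2, u, v), read as "u and v
    exchanged packets from t1 to t2"; links are treated as undirected.
    """
    by_pair = defaultdict(list)
    for t1, t2, u, v in stream:
        by_pair[(u, v)].append((t1, t2))
        by_pair[(v, u)].append((t1, t2))
    return all(covers(by_pair[(x, y)], b, e) for x in X for y in Y)


# Hypothetical toy stream: a, b on one side of the router; x, y on the other.
example_stream = [
    (0, 10, "a", "x"),
    (2, 8, "a", "y"),
    (0, 5, "b", "x"),
    (5, 9, "b", "x"),
    (1, 9, "b", "y"),
]
print(is_bipartite_clique(example_stream, {"a", "b"}, {"x", "y"}, 2, 8))  # True
print(is_bipartite_clique(example_stream, {"a", "b"}, {"x", "y"}, 1, 8))  # False
```

    Finding the interesting cliques in the first place requires a search over candidate node sets and intervals, which this check does not attempt.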

    Function vs. Taxonomy: The Case of Fungi Mitochondria ATP Synthase Genes

    We studied the relations between the triplet composition of the mitochondrial atp6, atp8 and atp9 gene family, the function of these genes, and the taxonomy of their bearers. Each gene was represented as a point in a 64-dimensional metric space, and these points were clustered. The points were found to separate into three clusters corresponding to the three genes. A total of 223 mitochondrial genomes were included in the database.
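
    A minimal sketch of the 64-dimensional representation, assuming triplets are counted with an overlapping sliding window of step 1 and that the metric is the Euclidean distance between frequency vectors; the abstract states neither choice, nor which clustering algorithm was used. The names triplet_frequencies, TRIPLETS and INDEX, and the toy sequences, are hypothetical.

```python
import math
from itertools import product

# All 64 nucleotide triplets in a fixed order (AAA, AAC, ..., TTT).
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(seq):
    """Return the 64-dimensional vector of overlapping triplet frequencies.

    Assumption: a sliding window of width 3 and step 1 over the sequence;
    triplets containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(TRIPLETS)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [c / total for c in counts]


# Two toy sequences (hypothetical, not real atp genes).
f1 = triplet_frequencies("ATGATGATG")
f2 = triplet_frequencies("ATGCCGTTA")
# Euclidean distance between the two points; a clustering routine could use it.
print(math.dist(f1, f2))
```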

    Triplet Frequencies Implementation in Total Transcriptome Analysis

    We studied the structure of the total transcriptome of Siberian larch. To do this, the contigs of the total transcriptome were labeled with the reads making up the tissue-specific transcriptomes, and the distribution of these contigs was constructed with respect to the mutual entropy of the frequencies of occurrence of reads from the tissue-specific transcriptomes. A number of contigs were found to contain comparable amounts of reads from different tissues, which suggests that chimeric transcripts are extremely abundant. Conversely, transcripts with high tissue specificity do not yield a reliable clustering that reveals tissue specificity. This makes the use of the total transcriptome for differential expression analysis questionable.
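
    The abstract does not define "mutual entropy"; the sketch below assumes it denotes a relative entropy (Kullback-Leibler divergence) and, as a further assumption, measures each contig's distribution of reads over tissues against a uniform reference. Under this reading, a value near zero marks a contig with comparable read counts from different tissues. Tissue names, counts and function names are hypothetical.

```python
import math


def tissue_frequencies(read_counts):
    """Normalize one contig's per-tissue read counts into frequencies."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("contig has no reads")
    return {tissue: n / total for tissue, n in read_counts.items()}


def relative_entropy_vs_uniform(read_counts):
    """Kullback-Leibler divergence (nats) of a contig's tissue frequencies
    from the uniform distribution over the k listed tissues.

    Assumption: "mutual entropy" is read as this relative entropy.
    Returns 0 when reads are spread evenly and log(k) when all reads
    come from a single tissue.
    """
    freqs = tissue_frequencies(read_counts)
    k = len(freqs)
    # KL(p || uniform) = sum_i p_i * log(p_i / (1/k)); terms with p_i = 0 vanish.
    return sum(p * math.log(p * k) for p in freqs.values() if p > 0)


# Hypothetical contigs with read counts from three hypothetical tissues.
mixed = {"needles": 40, "roots": 35, "buds": 45}
specific = {"needles": 120, "roots": 0, "buds": 0}
print(relative_entropy_vs_uniform(mixed))     # close to 0
print(relative_entropy_vs_uniform(specific))  # log(3), about 1.0986
```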

    Improving interpretability of complex predictive models

    This work is part of a collaboration between the LARCA Research Group and the Hospital de Sant Pau i de la Santa Creu, which kindly agreed to provide us with real clinical data of its patients. In this work we attempt to build interpretations for several datasets and, more generally, to implement a system that helps explain the results of machine learning algorithms to people without specialized knowledge of the subject. Drawing on the best of current approaches while trying to avoid their drawbacks, we constructed an explanation system for decision support that improves on existing solutions. This was achieved by using multiple thresholds instead of a single one for numerical values to enhance their interpretation, by adding possible trajectories with frequent itemset predictions, and by making it possible to explain the whole dataset intelligently through several methods of selecting explanation instances.
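
    The multiple-threshold idea can be illustrated by mapping a numerical feature to one of several labelled intervals instead of a single above/below split. This is a minimal sketch, not the system described in the work; the function name, feature, cut points and labels are hypothetical and not taken from the hospital's clinical data.

```python
from bisect import bisect_right


def discretize(value, thresholds, labels):
    """Map a numeric value to one of several labelled intervals.

    thresholds: k sorted cut points; labels: k + 1 interval names, lowest first.
    A value equal to a cut point falls into the interval above it.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted")
    return labels[bisect_right(thresholds, value)]


# Hypothetical feature, cut points and labels (not from the clinical data).
AGE_CUTS = [40, 65]
AGE_LABELS = ["age < 40", "40 <= age < 65", "age >= 65"]

for age in (30, 40, 70):
    print(age, "->", discretize(age, AGE_CUTS, AGE_LABELS))
```

    With a single threshold, say age >= 65, an explanation can only say "above" or "below"; several cut points let it name the interval a value falls in.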