Discovering Patterns of Interest in IP Traffic Using Cliques in Bipartite Link Streams
Studying IP traffic is crucial for many applications. We focus here on the
detection of (structurally and temporally) dense sequences of interactions,
that may indicate botnets or coordinated network scans. More precisely, we
model a MAWI capture of IP traffic as a link stream, i.e. a sequence of
interactions $(t_1, t_2, u, v)$ meaning that devices $u$ and $v$ exchanged
packets from time $t_1$ to time $t_2$. This traffic is captured on a single
router and so has a bipartite structure: links occur only between nodes in two
disjoint sets. We design a method for finding interesting bipartite cliques in
such link streams, i.e. two sets of nodes and a time interval such that all
nodes in the first set are linked to all nodes in the second set throughout the
time interval. We then explore the bipartite cliques present in the considered
trace. Comparison with the MAWILab classification of anomalous IP addresses
shows that the cliques found succeed in detecting anomalous network activity.
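The bipartite clique definition above can be illustrated with a minimal Python sketch. It only checks whether a given candidate is a clique; it is not the authors' search method. The function name `is_bipartite_clique`, the tuple layout `(t1, t2, u, v)`, and the sample node names are assumptions made for illustration. The sketch also assumes that a pair counts as linked throughout the interval only when a single link covers it entirely.

```python
from itertools import product


def is_bipartite_clique(links, left, right, start, end):
    """Check whether (left, right, [start, end]) is a bipartite clique.

    links: iterable of (t1, t2, u, v), meaning u and v interact from t1 to t2.
    Assumption: a pair is linked throughout [start, end] only if one single
    link covers the whole interval; overlapping links are not merged.
    """
    links = list(links)
    for u, v in product(left, right):
        covered = any(
            t1 <= start and end <= t2 and {a, b} == {u, v}
            for t1, t2, a, b in links
        )
        if not covered:
            return False
    return True


# Hypothetical toy link stream with two nodes on each side.
links = [
    (0, 10, "a1", "b1"),
    (2, 9, "a1", "b2"),
    (1, 8, "a2", "b1"),
    (3, 12, "a2", "b2"),
]

print(is_bipartite_clique(links, {"a1", "a2"}, {"b1", "b2"}, 3, 8))
print(is_bipartite_clique(links, {"a1", "a2"}, {"b1", "b2"}, 2, 8))
```

In this toy stream the interval from 3 to 8 is covered by all four links, while starting at 2 fails because the link between `a2` and `b2` only begins at time 3.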
Function vs. Taxonomy: The Case of Fungi Mitochondria ATP Synthase Genes
We studied the relations between the triplet composition of the family of mitochondrial atp6, atp8 and atp9 genes, their function, and
the taxonomy of their bearers. The points in the 64-dimensional metric space corresponding to the genes were clustered. The points were found to separate into three clusters corresponding to those genes. The database comprises 223 mitochondrial genomes.
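As a rough illustration of how a gene can be mapped to a point in a 64-dimensional space, the following Python sketch computes the frequencies of the 64 nucleotide triplets in a sequence. The name `triplet_frequencies`, the `step` parameter, and the choice of an overlapping window are assumptions; the abstract does not state how the triplets were counted.

```python
from itertools import product

# The 64 possible triplets over the nucleotide alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(seq, step=1):
    """Return a 64-element list of triplet frequencies for a sequence.

    Assumption: triplets are read with a sliding window moving by `step`
    (step=1 gives overlapping triplets, step=3 gives a codon-like reading).
    Windows containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    total = 0
    for i in range(0, len(seq) - 2, step):
        triplet = seq[i:i + 3]
        if triplet in counts:
            counts[triplet] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


vector = triplet_frequencies("ATGATG")
print(len(vector), vector[TRIPLETS.index("ATG")])
```

Vectors built this way could then be passed to any standard clustering routine; which metric and clustering method the study used is not specified here.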
Triplet Frequencies Implementation in Total Transcriptome Analysis
We studied the structuredness of the total transcriptome of Siberian larch. To do that, the contigs from the total transcriptome were labeled with the reads comprising the tissue-specific transcriptomes, and the distribution of the contigs from the total transcriptome was built with respect to the mutual entropy of the frequencies of occurrence of reads from the tissue-specific transcriptomes. It was found that a number of contigs contain comparable amounts of reads from different tissues, so chimeric transcripts appear to be extremely abundant. On the contrary, the transcripts with high tissue specificity do not yield a reliable
clustering revealing the tissue specificity. This fact makes the use of the total transcriptome for differential expression analysis questionable.
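To make the idea of scoring a contig by the spread of its reads across tissues more concrete, here is a small Python sketch. It uses plain Shannon entropy of the per-tissue read shares as a simplified stand-in; it is not the mutual entropy measure used in the study, whose exact definition is not given in the abstract. The function name `tissue_entropy` and the tissue labels are hypothetical.

```python
import math


def tissue_entropy(read_counts):
    """Shannon entropy (in bits) of a contig's read shares across tissues.

    read_counts: mapping from tissue name to number of reads on the contig.
    Low entropy suggests tissue specificity; high entropy suggests reads
    from several tissues in comparable amounts. This is a simplified
    stand-in, not the mutual entropy from the original work.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("contig has no reads")
    entropy = 0.0
    for count in read_counts.values():
        if count > 0:
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


# Hypothetical contigs: one evenly mixed, one tissue specific.
mixed = tissue_entropy({"needle": 10, "bud": 10})
specific = tissue_entropy({"needle": 20, "bud": 0})
print(mixed, specific)
```

Under this stand-in, a contig with reads split evenly between two tissues scores one bit, and a contig with reads from a single tissue scores zero.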
Improving interpretability of complex predictive models
This work is part of a collaboration between the LARCA Research Group and the Hospital de Sant Pau i
de la Santa Creu, which kindly agreed to provide us with real clinical data from its patients. In this
work we attempt to build interpretations for several datasets and, more generally, to
implement a system that helps explain the results of a machine learning algorithm to people with no
special knowledge of the subject. Drawing on the strengths of current approaches while avoiding their
drawbacks, we attempt to construct an explanation system for decision support that
improves on existing solutions. This was achieved by applying multiple thresholds, instead of a single one,
to numerical values to enhance their interpretation, by adding possible trajectories with frequent
itemset predictions, and by making it possible to explain the whole data set in an intelligent way through
several methods of selecting the explanation instances.
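The idea of applying multiple thresholds to a numerical value can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system described above: the function name `describe_value`, the threshold values, and the labels are hypothetical and carry no clinical meaning.

```python
from bisect import bisect_right


def describe_value(value, thresholds, labels):
    """Map a numerical value to a label using several sorted thresholds.

    thresholds: ascending cut points; labels: one more label than thresholds.
    A value equal to a threshold falls into the upper range.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return labels[bisect_right(thresholds, value)]


# Hypothetical cut points and labels, chosen only for illustration.
thresholds = [70, 100, 126]
labels = ["low", "normal", "elevated", "high"]

for measurement in (50, 100, 110, 200):
    print(measurement, describe_value(measurement, thresholds, labels))
```

With several cut points, an explanation can say that a value is "elevated" rather than only whether it is above or below one boundary, which is the kind of finer reading the multiple-threshold approach aims for.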