63 research outputs found

Combinatorial algorithm for counting small induced graphs and orbits

Author: Demšar Janez
Hočevar Tomaž
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 25/01/2016

Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The algorithm requires the existence of a vertex with certain properties; we show that such a vertex exists for graphlets of arbitrary size, except for complete graphs and C_4, which are treated separately. Empirical analysis of running time agrees with the theoretical results.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central
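
The idea in the abstract above, relating orbit counts through linear equations instead of enumerating every graphlet, can be illustrated on the smallest case. The following is a minimal sketch for 3-node graphlets only, not the authors' algorithm or implementation: the function name `three_node_orbits` and the adjacency-dictionary input are assumptions made for illustration. Only triangles are enumerated directly; the two path orbits then follow from simple counting identities.

```python
from itertools import combinations


def three_node_orbits(adj):
    """Return {node: (o0, o1, o2, o3)} for an undirected simple graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    o0 = degree, o1 = end of an induced 3-node path,
    o2 = middle of an induced 3-node path, o3 = member of a triangle.
    """
    orbits = {}
    for x, nbrs in adj.items():
        deg = len(nbrs)
        # Direct enumeration is used only for triangles through x.
        o3 = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        # Relation 1: every pair of neighbours of x is either adjacent
        # (a triangle) or not (x is the middle of a path), so
        # o2 + o3 = C(deg, 2).
        o2 = deg * (deg - 1) // 2 - o3
        # Relation 2: every walk x - u - w with w != x is either a path
        # ending at x or part of a triangle; each triangle is reached
        # twice (once per ordering of u and w), so
        # o1 + 2 * o3 = sum over neighbours u of (deg(u) - 1).
        o1 = sum(len(adj[u]) - 1 for u in nbrs) - 2 * o3
        orbits[x] = (deg, o1, o2, o3)
    return orbits


# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
result = three_node_orbits(toy)
```

For larger graphlets the relations become a larger linear system, which is where the approach described in the abstract differs from this toy case.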

Computation of Graphlet Orbits for Nodes and Edges in Sparse Graphs

Author: Demšar Janez
Hočevar Tomaž
Publication venue: 'Foundation for Open Access Statistics'
Publication date: 01/07/2016

Graphlet analysis is a useful tool for describing local network topology around individual nodes or edges. A node or an edge can be described by a vector containing the counts of different kinds of graphlets (small induced subgraphs) in which it appears, or the "roles" (orbits) it has within these graphlets. We implemented an R package with functions for fast computation of such counts on sparse graphs. Instead of enumerating all induced graphlets, our algorithm is based on the derived relations between the counts, which decreases the time complexity by an order of magnitude in comparison with past approaches.

Directory of Open Access Journals

Journal of Statistical Software

Machine learning for content based image retrieving

Author: Demšar Janez
Solina Franc

ePrints.FRI

Attribute Interactions in Medical Data Analysis

Author: Bratko Ivan
Demšar Janez
Jakulin Aleks
Smrke Dragica
Zupan Blaz
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

There is much empirical evidence about the success of naive Bayesian classification (NBC) in medical applications of attribute-based machine learning. NBC assumes conditional independence between attributes. In classification, such classifiers sum up the pieces of class-related evidence from individual attributes, independently of other attributes. The performance, however, deteriorates significantly when the “interactions” between attributes become critical. We propose an approach to handling attribute interactions within the framework of “voting” classifiers, such as NBC. We propose an operational test for detecting interactions in learning data and a procedure that takes the detected interactions into account while learning. This approach induces a structuring of the domain of attributes; it may lead to improved classifier performance and may provide useful novel information for the domain expert when interpreting the results of learning. We report on its application in data analysis and model construction for the prediction of clinical outcome in hip arthroplasty.

CiteSeerX

ePrints.FRI
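
The "voting" behaviour described in the abstract above, where each attribute contributes its own piece of class-related evidence, can be made concrete with a small sketch. This is a generic naive Bayes scorer with Laplace smoothing, not the interaction-handling method the paper proposes; the function names `train_nb` and `log_scores` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Collect class counts and per-attribute, per-class value counts."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)     # attribute index -> observed values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, cond, values


def log_scores(model, row):
    """Return {class: log prior + sum of per-attribute log evidence}."""
    class_counts, cond, values = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for i, v in enumerate(row):
            # Each attribute adds its term independently of the others;
            # the +1 and +len(values[i]) terms are Laplace smoothing.
            p = (cond[(i, y)][v] + 1) / (n_y + len(values[i]))
            score += math.log(p)
        scores[y] = score
    return scores


# Hypothetical toy data: two categorical attributes, two classes.
rows = [("hi", "old"), ("hi", "young"), ("lo", "old"), ("lo", "young")]
labels = ["bad", "bad", "good", "good"]
model = train_nb(rows, labels)
scores = log_scores(model, ("hi", "old"))
predicted = max(scores, key=scores.get)
```

Because every attribute is scored on its own, a class that depends only on a combination of attributes (an XOR-like pattern, for example) yields the same sum for every class, which is the kind of interaction the abstract refers to.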

FragViz: visualization of fragmented networks

Author: Demšar Janez
Mramor Minca
Zupan Blaž
Štajdohar Miha
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Researchers in systems biology use network visualization to summarize the results of their analysis. Such networks often include unconnected components, which popular network alignment algorithms place arbitrarily with respect to the rest of the network. This can lead to misinterpretations due to the proximity of otherwise unrelated elements. Results: We propose a new network layout optimization technique called FragViz which can incorporate additional information on relations between unconnected network components. It uses a two-step approach by first arranging the nodes within each of the components and then placing the components so that their proximity in the network corresponds to their relatedness. In the experimental study with the leukemia gene networks we demonstrate that FragViz can obtain network layouts which are more interpretable and hold additional information that could not be exposed using classical network layout optimization algorithms. Conclusions: Network visualization relies on computational techniques for proper placement of objects under consideration. These algorithms need to be fast so that they can be incorporated in responsive interfaces required by the explorative data analysis environments. Our layout optimization technique FragViz meets these requirements and specifically addresses the visualization of fragmented networks, for which standard algorithms do not consider similarities between unconnected components. The experiments confirmed the claims on speed and accuracy of the proposed solution.

Springer - Publisher Connector

PubMed Central

ePrints.FRI

Improving generalisation of AutoML systems with dynamic fitness evaluations

Author: Bengio Yoshua
Demšar Janez
Feurer Matthias
Vanwinckelen Gitte
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/01/2020

A common problem machine learning developers face is overfitting, that is, fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but this overfitting problem can persist. In fact, this can become more of a problem as we look to iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show that we can still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations, which approximate repeated k-fold cross-validation at little extra cost over single k-fold and at far lower cost than typical repeated k-fold. The results show that, when time equated, the proposed fitness function results in significant improvement over the current state-of-the-art baseline method, which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide essentially a free boost in generalisation/testing performance. Comment: 19 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Interactive Network Exploration with Orange

Author: Janez Demšar
Miha Štajdohar
Publication venue: Foundation for Open Access Statistics
Publication date: 01/01/2013

Network analysis is one of the most widely used techniques in many areas of modern science. Most existing tools for that purpose are limited to drawing networks and computing their basic general characteristics. The user is not able to interactively and graphically manipulate the networks, select and explore subgraphs using other statistical and data mining techniques, add and plot various other data within the graph, and so on. In this paper we present a tool that addresses these challenges, an add-on for exploration of networks within the general component-based environment Orange.

Crossref

Directory of Open Access Journals

Journal of Statistical Software

GenePath: a System for Automated Construction of Genetic Networks from Mutant Data

Author: Bratko Ivan
Demšar Janez
Halter John
Juvan Peter
Kuspa Adam
Shaulsky Gad
Zupan Blaz
Publication date: 01/01/2003

Motivation: Genetic pathways are often used in the analysis of biological phenomena. In classical genetics, they are constructed manually from experimental data on mutants. The field lacks formalism to guide such analysis, and accounting for all the data becomes complicated when large amounts of data are considered. Results: We have developed GenePath, an intelligent assistant that mimics expert geneticists in the analysis of genetic data. GenePath employs expert-defined patterns to uncover gene relations from the data, and uses these relations as constraints that guide the search for a plausible genetic network. GenePath provides formalism to genetic data analysis, facilitates the consideration of all the available data in a consistent and systematic manner, and aids in the examination of the large number of possible consequences of a planned experiment. It also provides an explanation mechanism that traces back every finding to the pertinent data. GenePath was successfully tested on several genetic problems. Availability: GenePath can be accessed at http://genepath.org. Supplementary information: Supplementary material is available at http://genepath.org/bi-supp

ePrints.FRI

Web-enabled knowledge-based analysis of genetic data

Author: Bratko Ivan
Demšar Janez
Halter John A.
Juvan Peter
Kuspa Adam
Shaulsky Gad
Zupan Blaz
Publication venue: Springer-Verlag Heidelberg
Publication date: 01/01/2001

We present a web-based implementation of GenePath, an intelligent assistant tool for data analysis in functional genomics. GenePath considers mutant data and uses expert-defined patterns to find gene-to-gene or gene-to-outcome relations. It presents the results of analysis as genetic networks, in which a set of genes have various influences on one another and on a biological outcome. In the paper, we particularly focus on its web-based interface and explanation mechanisms.

CiteSeerX

ePrints.FRI