63 research outputs found

DALE: Differential Accumulated Local Effects for efficient and accurate global explanations

Author: Dalamagas Theodore
Diou Christos
Gkolemis Vasilis
Publication date: 10/10/2022

Accumulated Local Effects (ALE) is a method for accurately estimating feature effects, overcoming fundamental failure modes of earlier methods such as Partial Dependence Plots. However, ALE's approximation, i.e. the method for estimating ALE from the limited samples of the training set, has two weaknesses. First, it does not scale well when the input has high dimensionality, and, second, it is vulnerable to out-of-distribution (OOD) sampling when the training set is relatively small. In this paper, we propose a novel ALE approximation, called Differential Accumulated Local Effects (DALE), which can be used in cases where the ML model is differentiable and an auto-differentiable framework is accessible. Our proposal has significant computational advantages, making feature effect estimation applicable to high-dimensional Machine Learning scenarios with near-zero computational overhead. Furthermore, DALE does not create artificial points for calculating the feature effect, resolving misleading estimations due to OOD sampling. Finally, we formally prove that, under some hypotheses, DALE is an unbiased estimator of ALE and we present a method for quantifying the standard error of the explanation. Experiments using both synthetic and real datasets demonstrate the value of the proposed approach. Comment: 16 pages, to be published in Asian Conference of Machine Learning (ACML) 202
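
The idea summarized in this abstract (averaging model derivatives at the observed data points inside each bin, then accumulating them along the feature axis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dale_sketch`, the equal-width binning, and the toy model are assumptions made for illustration, the derivative is supplied analytically rather than through an automatic differentiation framework, and the resulting curve is left uncentered.

```python
import numpy as np

def dale_sketch(x, grad, n_bins=10):
    """Hypothetical helper: accumulate bin-averaged derivatives of one feature.

    x    : 1-D array with the observed values of the feature of interest
    grad : 1-D array with the model's partial derivative w.r.t. that feature,
           evaluated at the same observed instances (no artificial points)
    """
    # Equal-width bin edges over the observed range (an assumption of this sketch).
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Assign each instance to a bin index in 0..n_bins-1 using the interior edges.
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

    # Average local effect (derivative) per bin; empty bins contribute zero.
    bin_mean = np.zeros(n_bins)
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            bin_mean[k] = grad[mask].mean()

    # Accumulate: effect at each edge is the sum of (mean derivative * bin width).
    widths = np.diff(edges)
    effect = np.concatenate([[0.0], np.cumsum(bin_mean * widths)])
    return edges, effect

# Toy model f(x1, x2) = x1**2 + x2, whose derivative w.r.t. x1 is 2 * x1.
x1 = np.linspace(0.0, 1.0, 1001)
edges, effect = dale_sketch(x1, 2.0 * x1, n_bins=10)
```

For this toy model the accumulated effect at each bin edge should stay close to the square of the edge value, since the feature effect of `x1` is `x1**2` up to a constant.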

arXiv.org e-Print Archive

RHALE: Robust and Heterogeneity-aware Accumulated Local Effects

Author: Dalamagas Theodore
Diou Christos
Gkolemis Vasilis
Ntoutsi Eirini
Publication date: 20/09/2023

Accumulated Local Effects (ALE) is a widely used explainability method for isolating the average effect of a feature on the output, because it handles cases with correlated features well. However, it has two limitations. First, it does not quantify the deviation of instance-level (local) effects from the average (global) effect, known as heterogeneity. Second, for estimating the average effect, it partitions the feature domain into user-defined, fixed-size bins, where different bin sizes may lead to inconsistent ALE estimations. To address these limitations, we propose Robust and Heterogeneity-aware ALE (RHALE). RHALE quantifies the heterogeneity by considering the standard deviation of the local effects and automatically determines an optimal variable-size bin-splitting. In this paper, we prove that to achieve an unbiased approximation of the standard deviation of local effects within each bin, bin splitting must follow a set of sufficient conditions. Based on these conditions, we propose an algorithm that automatically determines the optimal partitioning, balancing the estimation bias and variance. Through evaluations on synthetic and real datasets, we demonstrate the superiority of RHALE compared to other methods, including the advantages of automatic bin splitting, especially in cases with correlated features. Comment: Accepted at ECAI 2023 (European Conference on Artificial Intelligence)

arXiv.org e-Print Archive

Efficient evaluation of generalized path pattern queries on XML data

Author: Dalamagas Theodore
Sellis Timos
Souldatos Stefanos
Theodoratos Dimitri
Wu Xiaoying
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path-patterns or tree-patterns. Requirements in flexible querying of XML data have recently motivated the introduction of query languages that allow a partial specification of path-patterns in a query. In this paper, we focus on the efficient evaluation of partial path queries, a generalization of path pattern queries. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries under the indexed streaming evaluation model. The first one exploits a structural summary of data to generate a set of path-patterns that together are equivalent to a partial path query. To evaluate these path-patterns, we extend PathStack so that it can work on path-patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. An analysis of the algorithms and an extensive experimental evaluation show that the holistic algorithm outperforms the other ones.

Crossref

RMIT Research Repository

DSpace at NTUA

Swinburne Research Bank

Indexing views to route queries in a PDMS

Author: George Kokkinidis
Lefteris Sidirourgos
P. Boncz
Theodore Dalamagas
Timos Sellis
Vassilis Christophides
Publication venue: 'Springer Science and Business Media LLC'

Crossref

TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support

Author: Alexiou Panagiotis
Dalamagas Theodore
Georgakilas George
Gerangelos Stefanos
Hatzigeorgiou Artemis G.
Koziris Nectarios
Maragkakis Manolis
Reczko Martin
Vergoulis Thanasis
Vlachos Ioannis S.
Publication venue: Oxford University Press
Publication date: 01/01/2012

As the relevant literature and the number of experiments increase at a superlinear rate, databases that curate and collect experimentally verified microRNA (miRNA) targets have gradually emerged. These databases attempt to provide efficient access to this wealth of experimental data, which is scattered across thousands of manuscripts. The aim of TarBase 6.0 (http://www.microrna.gr/tarbase) is to meet this challenge by providing a significant increase in available miRNA targets derived from all contemporary experimental techniques (gene specific and high-throughput), while incorporating a powerful set of tools in a user-friendly interface. TarBase 6.0 hosts detailed information for each miRNA–gene interaction, ranging from miRNA- and gene-related facts to information specific to their interaction, the experimental validation methodologies and their outcomes. All database entries are enriched with function-related data, as well as general information derived from external databases such as UniProt, Ensembl and RefSeq. DIANA microT miRNA target prediction scores and the relevant prediction details are available for each interaction. TarBase 6.0 hosts the largest collection of manually curated experimentally validated miRNA–gene interactions (more than 65 000 targets), presenting a 16.5–175-fold increase over other available manually curated databases.

CiteSeerX

PubMed Central

miRGen 2.0: a database of microRNA genomic information and regulation

Author: Artemis G. Hatzigeorgiou
Aslam
Brookes
Corcoran
Fabbri
Fernandez
Gartel
George Prekas
Griffiths-Jones
Ivo Grosse
Karin
Karolchik
Kel
Landgraf
Latronico
Maiese
Marson
Martin Gleditzsch
Megraw
Molly Megraw
Nikiforova
Ozsolak
Panagiotis Alexiou
Papadopoulos
Smigielski
Thanasis Vergoulis
Theodore Dalamagas
Timos Sellis
Publication venue: Oxford University Press
Publication date: 01/01/2010

MicroRNAs are small, non-protein coding RNA molecules known to regulate the expression of genes by binding to the 3′UTR region of mRNAs. MicroRNAs are produced from longer transcripts which can code for more than one mature miRNA. miRGen 2.0 is a database that aims to provide comprehensive information about the position of human and mouse microRNA coding transcripts and their regulation by transcription factors, including a unique compilation of both predicted and experimentally supported data. Expression profiles of microRNAs in several tissues and cell lines, single nucleotide polymorphism locations, microRNA target prediction on protein coding genes and mapping of miRNA targets of co-regulated miRNAs on biological pathways are also integrated into the database and user interface. The miRGen database will be continuously maintained and freely available at http://www.microrna.gr/mirgen/