128 research outputs found

Inferring Causal Direction from Observational Data: A Complexity Approach

Author: Nikolaou N
Sechidis K
Publication venue: PharML 2020
Publication date: 01/01/2020

At the heart of causal structure learning from observational data lies a deceptively simple question: given two statistically dependent random variables, which one has a causal effect on the other? This is impossible to answer using statistical dependence testing alone and requires that we make additional assumptions. We propose several fast and simple criteria for distinguishing cause and effect in pairs of discrete or continuous random variables. The intuition behind them is that predicting the effect variable using the cause variable should be ‘simpler’ than the reverse, with different notions of ‘simplicity’ giving rise to different criteria. We demonstrate the accuracy of the criteria on synthetic data generated under a broad family of causal mechanisms and types of noise.

UCL Discovery

Statistical Hypothesis Testing in Positive Unlabelled Data

Author: Brown Gavin
Calvo Borja
Sechidis Konstantinos
Publication date: 01/09/2014

We propose a set of novel methodologies which enable valid statistical hypothesis testing when we have only positive and unlabelled (PU) examples. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis activities; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and finally, (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will additionally be useful for information theoretic feature selection and Bayesian Network structure learning.

The University of Manchester - Institutional Repository

Information theoretic feature selection in multi-label data through composite likelihood

Author: Brown Gavin
Nikolaou Nikolaos
Sechidis Konstantinos
Publication date: 01/08/2014

In this paper we present a framework to unify information theoretic feature selection criteria for multi-label data. Our framework combines two different ideas: expressing multi-label decomposition methods as composite likelihoods, and then showing how feature selection criteria can be derived by maximizing these likelihood expressions. Many existing criteria, until now proposed as heuristics, can be reproduced from a single basis under the proposed framework. Furthermore, we can derive new problem-specific criteria by making different independence assumptions over the feature and label spaces. One such derived criterion is shown experimentally to outperform other approaches proposed in the literature on real-world datasets.

The University of Manchester - Institutional Repository
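
For readers unfamiliar with information theoretic feature selection, the following is a minimal sketch of the simplest such criterion: ranking discrete features by a plug-in estimate of their mutual information with a single label. It is not the composite likelihood framework of the abstract above, which covers multi-label data and richer criteria. The function names and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), with counts substituted for probabilities
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(columns, labels):
    """Return feature names sorted by decreasing mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (made up): f1 copies the label, f2 is independent of it.
labels = [0, 0, 1, 1]
columns = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
ranking = rank_features(columns, labels)
```

On this toy data the feature that copies the label is ranked first. Scoring each feature on its own ignores redundancy between features, which is one of the gaps that more elaborate criteria aim to close.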

Simultaneous prediction of four ATP-binding cassette transporters' substrates using multi-label QSAR

Author: Bentz
Bolón-Canedo
Broccatelli
Demel
Desai
Dragos
Eriksson
Ganta
Gibaja
Gupta
Iyer
Krein
Legrand
Lu-Emerson
Luaces
Maaten
Maggiora
Mak
Marquez
Matsson
Mittal
Montanari
Newby
Oberoi
Pinto
Read
Saeys
Sechidis
Sedykh
Shahlaei
Spolaôr
Stumpfe
Sushko
Sushko
Szakács
Tetko
Tetko
Tsaioun
Tsoumakas
Tsoumakas
Vastag
Wassermann
Wind
Zhang
Publication venue: Wiley
Publication date: 01/09/2016

Efflux by the ATP-binding cassette (ABC) transporters affects the pharmacokinetic profile of drugs and has been implicated in drug-drug interactions, in addition to playing a major role in multi-drug resistance in cancer. It is therefore important for the pharmaceutical industry to be able to understand what phenomena rule ABC substrate recognition. Considering the high degree of substrate overlap between various members of the ABC transporter family, it is advantageous to employ a multi-label classification approach where predictions made for one transporter can be used for modeling of the other ABC transporters. Here, we present decision tree-based QSAR classification models able to simultaneously predict substrates and non-substrates for BCRP1, P-gp/MDR1, MRP1 and MRP2, using a dataset of 1493 compounds. To this end, two multi-label classification QSAR modelling approaches were adopted: Binary Relevance (BR) and Classifier Chain (CC). Even though both multi-label models yielded similar predictive performances in terms of overall accuracies (close to 70), the CC model overcame the problem of skewed performance towards identifying substrates compared with non-substrates, which is a common problem in the literature. The models were thoroughly validated by using external testing, applicability domain and activity cliffs characterization. In conclusion, a multi-label classification approach is an appropriate alternative for the prediction of ABC efflux. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Crossref

Kent Academic Repository

Sussex Research Online

Information theoretic feature selection in multi-label data through composite likelihood

Author: Brown Gavin
Nikolaou Nikolaos
Sechidis Konstantinos
Publication date: 01/01/2014

Crossref

The University of Manchester - Institutional Repository

Insights into distributed feature ranking

Author: Alonso-Betanzos Amparo
Bolon-Canedo Veronica
Brown Gavin
Sanchez-Marono Noelia
Sechidis Konstantinos
Publication date: 01/01/2018

This version of the article: Bolón-Canedo, V., Sechidis, K., Sánchez-Maroño, N., Alonso-Betanzos, A., & Brown, G. (2019). ‘Insights into distributed feature ranking’ has been accepted for publication in: Information Sciences, 496, 378–398. The Version of Record is available online at https://doi.org/10.1016/j.ins.2018.09.045. [Abstract]: In an era in which the volume and complexity of datasets is continuously growing, feature selection techniques have become indispensable to extract useful information from huge amounts of data. However, existing algorithms may not scale well when dealing with huge datasets, and a possible solution is to distribute the data in several nodes. In this work we explore the different ways of distributing the data (by features and by samples) and we evaluate to what extent it is possible to obtain similar results as those obtained with the whole dataset. Trying to deal with the challenge of distributing the feature ranking process, we have performed experiments with different aggregation methods and feature rankers, and also evaluated the effect of distributing the feature ranking process in the subsequent classification performance. This research has been economically supported in part by the Spanish Ministerio de Economía y Competitividad and FEDER funds of the European Union through the research project TIN2015-65069-C2-1-R; and by the Consellería de Industria of the Xunta de Galicia through the research project GRC2014/035. Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016-2019) and the European Union (European Regional Development Fund - ERDF), is gratefully acknowledged (research project ED431G/01). V. Bolón-Canedo acknowledges support of the Xunta de Galicia under postdoctoral Grant code ED481B 2014/164-0.

Repositorio da Universidade da Coruña

The University of Manchester - Institutional Repository
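
The idea of distributing a feature ranking by samples and then aggregating the per-node rankings can be sketched as follows. This is a minimal illustration only: the ranker (absolute difference of class-conditional means), the round-robin split, and the mean-position aggregation are all assumptions chosen for brevity, not the specific rankers or aggregation methods evaluated in the article.

```python
def score_feature(column, labels):
    """Hypothetical ranker: absolute difference between class-conditional means."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def rank_on_node(rows, labels):
    """Rank feature indices on one node's samples, best first."""
    n_features = len(rows[0])
    scores = [score_feature([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])

def distributed_ranking(rows, labels, n_nodes):
    """Split samples round-robin across nodes, rank locally, aggregate by summed position."""
    n_features = len(rows[0])
    position_sums = [0] * n_features
    for node in range(n_nodes):
        node_rows = rows[node::n_nodes]
        node_labels = labels[node::n_nodes]
        for position, j in enumerate(rank_on_node(node_rows, node_labels)):
            position_sums[j] += position
    # A lower summed position means the feature was ranked higher on average.
    return sorted(range(n_features), key=lambda j: position_sums[j])

# Toy data (made up): feature 0 is strongly informative, feature 2 weakly, feature 1 not at all.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
rows = [
    [0.0, 5.0, 0.0], [1.0, 4.0, 0.5], [10.0, 5.0, 1.0], [11.0, 4.0, 1.5],
    [0.0, 4.0, 0.0], [1.0, 5.0, 0.5], [10.0, 4.0, 1.0], [11.0, 5.0, 1.5],
]
centralised = rank_on_node(rows, labels)
distributed = distributed_ranking(rows, labels, n_nodes=2)
```

Comparing `centralised` with `distributed` mirrors, on a very small scale, the question of whether a distributed ranking reproduces the one obtained from the whole dataset.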

Feature selection with limited bit depth mutual information for portable embedded systems

Author: Alonso-Betanzos Amparo
Bolón-Canedo Verónica
Brown Gavin
Morán-Fernández Laura
Sechidis Konstantinos
Publication venue: Elsevier
Publication date: 01/06/2020

This version of the article: Morán-Fernández, L., Sechidis, K., Bolón-Canedo, V., Alonso-Betanzos, A., & Brown, G. (2020). ‘Feature selection with limited bit depth mutual information for portable embedded systems’ has been accepted for publication in: Knowledge-Based Systems, 197, 105885. The Version of Record is available online at https://doi.org/10.1016/j.knosys.2020.105885. [Abstract]: Since wearable computing systems have grown in importance in recent years, there is an increased interest in implementing machine learning algorithms with reduced precision parameters/computations. Not only learning but also feature selection, most of the time a mandatory preprocessing step in machine learning, is often constrained by the available computational resources. This work considers mutual information, one of the most common measures of dependence used in feature selection algorithms, with a limited number of bits. In order to test the procedure designed, we have implemented it in several well-known feature selection algorithms. Experimental results over several synthetic and real datasets demonstrate that low bit representations are sufficient to achieve performances close to that of double precision parameters and thus open the door for the use of feature selection in embedded platforms that minimize the energy consumption and carbon emissions. This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (research project TIN2015-65069-C2-1-R), by European Union FEDER funds and by the Consellería de Industria of the Xunta de Galicia (research project GRC2014/035). Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016-2019) and the European Union (European Regional Development Fund - ERDF), is gratefully acknowledged (research project ED431G/01). Project supported by a 2018 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. Laura Morán-Fernández acknowledges predoctoral stay grant by INDITEX-UDC 2015.

Repositorio da Universidade da Coruña

WATCH: A Workflow to Assess Treatment Effect Heterogeneity in Drug Development for Clinical Trial Sponsors

Author: Baillie Mark
Bornkamp Björn
Chen Yao
Hemmings Rob
Lu Jiarui
Ohlssen David
Ruberg Stephen
Sechidis Konstantinos
Sun Sophie
Vandemeulebroecke Marc
Zang Cong
Publication date: 01/05/2024

This paper proposes a Workflow for Assessing Treatment effeCt Heterogeneity (WATCH) in clinical drug development, targeted at clinical trial sponsors. The workflow is designed to address the challenges of investigating treatment effect heterogeneity (TEH) in randomized clinical trials, where sample size and multiplicity limit the reliability of findings. The proposed workflow includes four steps: Analysis Planning, Initial Data Analysis and Analysis Dataset Creation, TEH Exploration, and Multidisciplinary Assessment. The workflow aims to provide a systematic approach to exploring treatment effect heterogeneity in the exploratory setting, taking into account external evidence and best scientific understanding.

arXiv.org e-Print Archive

E-Learning & Environmental Policy: The case of a politico-administrative GIS

Author: Hasanagas Nikolaos D.
Papadopoulou Eleni I.
Sechidis Lazaros A.
Styliadis Athanasios D.
Publication venue: Agora University Press
Publication date: 01/11/2010

Is effective knowledge exchange and cooperation between the academic community and practitioners possible? Implementation of e-learning in specialized policy fields is among the most challenging priorities of ICTs and software engineering. In multidisciplinary academic areas which combine environmental policy studies with positivist subjects (such as environmental issues, forest policy, rural development, Landscape Architecture etc.), the use of e-learning systems in analyzing policy issues is steadily gaining in importance and is a method which connects the academic community and researchers with practitioners and field experts. Such initiatives incorporate a number of politometrics-relevant algorithms embedded in a context of political geography (i.e. visualized hierarchies in different region-related policy issues). This is the case addressed in this paper. The GIS learning management system introduced in this paper is based on certain criteria concerning organizational models and region-specific politico-administrative hierarchies. Scenarios of politico-administrative metadata achieving optimal power synergy are extracted through a sequencing technique combining vector-algebra software and statistics, and can be used for both teaching and research purposes.

Agora University Editing House: Journals

Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming

Author: AGC Sá de
F Pedregosa
G Tsoumakas
I Witten
J Demšar
J Read
J Read
K Sechidis
R Mckay
T Křen
Publication venue: Springer Science and Business Media LLC
Publication date: 21/08/2018

This paper proposes Auto-MEKAGGP, an Automated Machine Learning (Auto-ML) method for Multi-Label Classification (MLC) based on the MEKA tool, which offers a number of MLC algorithms. In MLC, each example can be associated with one or more class labels, making MLC problems harder than conventional (single-label) classification problems. Hence, it is essential to select an MLC algorithm and its configuration tailored (optimized) for the input dataset. Auto-MEKAGGP addresses this problem with two key ideas. First, a large number of choices of MLC algorithms and configurations from MEKA are represented in a grammar. Second, our proposed Grammar-based Genetic Programming (GGP) method uses that grammar to search for the best MLC algorithm and configuration for the input dataset. Auto-MEKAGGP was tested on 10 datasets and compared to two well-known MLC methods, namely Binary Relevance and Classifier Chain, and also to GA-Auto-MLC, a genetic algorithm we recently proposed for the same task. Two versions of Auto-MEKAGGP were tested: a full version with the proposed grammar, and a simplified version where the grammar includes only the algorithmic components used by GA-Auto-MLC. Overall, the full version of Auto-MEKAGGP achieved the best predictive accuracy among all five evaluated methods, being the winner in six out of the 10 datasets.

Crossref

Kent Academic Repository