
143 research outputs found

Mathematical programming for piecewise linear regression analysis

Author: Liu S
Papageorgiou LG
Tsoka S
Yang L
Publication venue: PERGAMON-ELSEVIER SCIENCE LTD
Publication date: 01/02/2016

In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables by approximating their complex inner relationship. A large number of methods have been proposed, based on various methodologies, including linear regression, support vector regression, neural networks, and piece-wise regression. The existing piece-wise regression methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piece-wise linear regression method is introduced, based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming that both the single partition feature and the number of regions are known, a mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains have been used to demonstrate the efficiency of the proposed method. It is shown that the proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and that it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of if-then rules that are easily interpretable. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
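The idea of choosing a break-point and per-segment coefficients to minimise total absolute error can be illustrated with a minimal sketch. It is not the paper's mixed integer linear model: it assumes a single input variable, exactly one break-point, and replaces the optimisation model with exhaustive search, which is only practical for small samples. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def fit_lad_line(points):
    """Least-absolute-deviation line y = a + b*x for a list of (x, y) points.

    Some optimal LAD line passes through two data points with distinct x,
    so trying every such pair is exact (but only practical for small samples).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical pair defines no line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        error = sum(abs(y - (intercept + slope * x)) for x, y in points)
        if best is None or error < best[0]:
            best = (error, intercept, slope)
    return best  # (total absolute error, intercept, slope), or None


def fit_two_segments(points, min_size=2):
    """Try every break-point between consecutive x values; keep the best."""
    pts = sorted(points)
    best = None
    for i in range(min_size, len(pts) - min_size + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between equal x values
        left, right = fit_lad_line(pts[:i]), fit_lad_line(pts[i:])
        if left is None or right is None:
            continue
        total = left[0] + right[0]
        if best is None or total < best["error"]:
            best = {
                "error": total,
                "break_point": (pts[i - 1][0] + pts[i][0]) / 2,
                "left": left[1:],    # (intercept, slope) for x <= break_point
                "right": right[1:],  # (intercept, slope) for x > break_point
            }
    return best


def predict(model, x):
    """The fitted model read as an if-then rule."""
    if x <= model["break_point"]:
        intercept, slope = model["left"]
    else:
        intercept, slope = model["right"]
    return intercept + slope * x


# Toy data: y = 1 + 2x for x in 0..4, then y = 20 - x for x in 5..9.
data = [(x, 1 + 2 * x) for x in range(5)] + [(x, 20 - x) for x in range(5, 10)]
model = fit_two_segments(data)
```

On this toy data the search places the break-point between x = 4 and x = 5 and recovers both lines with zero error. The `predict` function shows how the result reads as two if-then rules.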

Crossref

UCL Discovery

King's Research Portal

Link prediction methods and their accuracy for different social networks and network metrics

Author: Cooper C.
Gao F.
Musial Katarzyna
Tsoka S.
Publication venue: Hindawi Limited
Publication date: 01/01/2015

We are currently experiencing rapid growth in the number of social-based online systems. The vast amounts of data gathered in those systems bring new challenges when we try to analyse them. One intensively researched topic is the prediction of social connections between users. Although a lot of effort has gone into developing new approaches that could provide better prediction accuracy in social network structures extracted from large-scale data about people and their activities and interactions, the existing methods have not been comprehensively analysed. The research presented in this paper focuses on the link prediction problem: we systematically investigate the correlation between network metrics and the accuracy of different prediction methods. For this study we selected six time-stamped real-world social networks and the ten most widely used link prediction methods. The results of our experiments show that the performance of some methods has a strong correlation with certain network metrics. We managed to distinguish 'prediction friendly' networks, for which most of the prediction methods perform well, from 'prediction unfriendly' networks, for which most of the methods result in high prediction error. The results of the study are a valuable input for the development of a new prediction approach, which may for example be based on a combination of several existing methods. Correlation analysis between network metrics and the prediction accuracy of different methods may form the basis of a meta-learning system that, based on network characteristics and prior knowledge, can recommend the right prediction method for a given network.
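To make the notion of a link prediction method concrete, here is a minimal sketch of three neighbourhood-based scores. The abstract does not name the ten methods studied, so the choice of common neighbours, Jaccard and Adamic-Adar is an assumption, and the graph, function names and node labels are hypothetical.

```python
from itertools import combinations
from math import log


def build_adjacency(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours as a fraction of all neighbours of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbours with few links of their own count for more.

    A shared neighbour is linked to both u and v, so its degree is at
    least 2 and log(degree) is never zero.
    """
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v])


def rank_candidates(adj, score):
    """Score every unlinked pair and sort from most to least likely."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)


# Hypothetical snapshot of a small friendship network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranking = rank_candidates(adj, common_neighbours)
```

In an evaluation on time-stamped data, a ranking like this would be computed on an earlier snapshot and compared with the links that actually appear later.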

Crossref

Directory of Open Access Journals

Bournemouth University Research Online

King's Research Portal

Paying for Happiness: Experimental Results from a Large Cash Transfer Program in Malawi

Author: Angeles G.
Handa S.
Kilburn K.
Mvula P.
Tsoka M.
Publication date: 01/01/2018

This study analyzes the short-term impact of an exogenous, positive income shock on caregivers’ subjective well-being (SWB) in Malawi using panel data from 3,365 households targeted to receive Malawi’s Social Cash Transfer Program, which provides unconditional cash to ultra-poor, labor-constrained households. The study uses a cluster-randomized, longitudinal design. After the baseline survey, half of the village clusters were randomly selected to receive the transfer, and a follow-up was conducted 17 months later. We find that the short-term increase in household income from the cash transfer leads to substantial SWB gains among caregivers. After a year’s worth of transfers, caregivers in beneficiary households have higher life satisfaction and are more likely to believe in a better future. We examine whether program impacts on consumption, food security, resilience, and hopefulness could explain the increase in SWB but do not find that any of these mechanisms individually mediates our results.

Crossref

Carolina Digital Repository

Short-term impacts of an unconditional cash transfer program on child schooling: Experimental evidence from Malawi.

Author: Angeles G.
Handa S.
Kilburn K.
Mvula P.
Tsoka M.
Publication date: 01/01/2017

This study analyzes the impact of a positive income shock on child schooling outcomes using experimental data from an unconditional cash transfer program in Malawi. Since households receive the cash and parents are responsible for making spending decisions, we also examine the intervening pathways between cash transfers and child schooling. Data come from a cluster-randomized study of Malawi’s Social Cash Transfer Program (SCTP). After a baseline survey, households in village clusters were randomly assigned to treatment and control arms, with treatment villages receiving transfers immediately and control villages assigned a later entry. We test for treatment impacts on a panel of school-aged children (6–17) using a differences-in-differences model. After a year’s worth of transfers, we find that the Malawi SCTP both improves enrollment rates and decreases dropouts. The main intervening pathway between the program and schooling is education expenditures, suggesting that the cash improves the demand for education by reducing financial constraints.
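The core of a differences-in-differences estimate is the change in the treatment arm minus the change in the control arm. The sketch below shows only that arithmetic. The abstract does not give the study's regression specification, so covariates and clustered standard errors are left out, and the enrolment indicators are made up for illustration, not taken from the study.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment arm minus change in the control arm."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Made-up enrolment indicators (1 = enrolled), NOT data from the study.
treat_pre = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]     # 60% enrolled at baseline
treat_post = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]    # 80% enrolled at follow-up
control_pre = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]   # 60% enrolled at baseline
control_post = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]  # 70% enrolled at follow-up

effect = diff_in_diff(treat_pre, treat_post, control_pre, control_post)
```

With these illustrative numbers the treatment arm gains 20 percentage points and the control arm gains 10, so the estimated effect is about 0.10, the part of the change not explained by the common trend.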

Carolina Digital Repository

Cancer Grade Model: a multi-gene machine learning-based risk classification for improving prognosis in breast cancer

Author: Amiri Souri E
Chenoweth A
Cheung A
Karagiannis S N
Tsoka S
Publication venue: Springer Science and Business Media LLC
Publication date: 31/08/2021

King's Research Portal

Novel drug-target interactions via link prediction and network embedding

Author: Amiri Souri E
Karagiannis S N
Laddach R
Papageorgiou Lazaros G.
Tsoka S
Publication venue: Springer Science and Business Media LLC
Publication date: 04/04/2022

BACKGROUND: As many interactions between the chemical and genomic space remain undiscovered, computational methods able to identify potential drug-target interactions (DTIs) are employed to accelerate drug discovery and reduce the required cost. Predicting new DTIs can leverage drug repurposing by identifying new targets for approved drugs. However, developing an accurate computational framework that can efficiently incorporate chemical and genomic spaces remains extremely demanding. A key issue is that most DTI predictions suffer from the lack of experimentally validated negative interactions or limited availability of target 3D structures. RESULTS: We report DT2Vec, a pipeline for DTI prediction based on graph embedding and gradient boosted tree classification. It maps drug-drug and protein–protein similarity networks to low-dimensional features and the DTI prediction is formulated as binary classification based on a strategy of concatenating the drug and target embedding vectors as input features. DT2Vec was compared with three top-performing graph similarity-based algorithms on a standard benchmark dataset and achieved competitive results. In order to explore credible novel DTIs, the model was applied to data from the ChEMBL repository that contain experimentally validated positive and negative interactions which yield a strong predictive model. Then, the developed model was applied to all possible unknown DTIs to predict new interactions. The applicability of DT2Vec as an effective method for drug repurposing is discussed through case studies and evaluation of some novel DTI predictions is undertaken using molecular docking. CONCLUSIONS: The proposed method was able to integrate and map chemical and genomic space into low-dimensional dense vectors and showed promising results in predicting novel DTIs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04650-w
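The feature construction described in the RESULTS, concatenating a drug embedding with a target embedding to form one input row per drug-target pair, can be sketched as follows. This is not the DT2Vec code: the embeddings, identifiers and labels are hypothetical, and the graph embedding and gradient boosted tree steps are left out.

```python
def pair_features(drug_vec, target_vec):
    """Concatenate a drug embedding and a target embedding into one row."""
    return list(drug_vec) + list(target_vec)


def build_dataset(drug_emb, target_emb, labelled_pairs):
    """Turn (drug, target, label) triples into a feature matrix X and labels y."""
    X, y = [], []
    for drug, target, label in labelled_pairs:
        X.append(pair_features(drug_emb[drug], target_emb[target]))
        y.append(label)
    return X, y


# Hypothetical 3-dimensional embeddings; real ones would be learned from the
# drug-drug and protein-protein similarity networks.
drug_emb = {"drug_A": [0.1, 0.9, 0.3], "drug_B": [0.8, 0.2, 0.5]}
target_emb = {"prot_X": [0.4, 0.4, 0.7], "prot_Y": [0.9, 0.1, 0.2]}

# 1 = validated interaction, 0 = validated non-interaction (made-up labels).
labelled_pairs = [
    ("drug_A", "prot_X", 1),
    ("drug_A", "prot_Y", 0),
    ("drug_B", "prot_Y", 1),
]

X, y = build_dataset(drug_emb, target_emb, labelled_pairs)
```

The resulting `X` and `y` have the shape a binary classifier expects. Scoring an unknown pair would mean building its concatenated row in the same way and passing it to the trained model.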

PubMed Central

UCL Discovery

King's Research Portal

Detection of Composite Communities in Multiplex Biological Networks

Author: Bennett L
Kittas A
Muirhead G
Papageorgiou LG
Tsoka S
Publication venue: NATURE PUBLISHING GROUP
Publication date: 27/05/2015

The detection of community structure is a widely accepted means of investigating the principles governing biological systems. Recent efforts are exploring ways in which multiple data sources can be integrated to generate a more comprehensive model of cellular interactions, leading to the detection of more biologically relevant communities. In this work, we propose a mathematical programming model to cluster multiplex biological networks, i.e. multiple network slices, each with a different interaction type, to determine a single representative partition of composite communities. Our method, known as SimMod, is evaluated through its application to yeast networks of physical, genetic and co-expression interactions. A comparative analysis involving partitions of the individual networks, partitions of aggregated networks and partitions generated by similar methods from the literature highlights the ability of SimMod to identify functionally enriched modules. It is further shown that SimMod offers enhanced results when compared to existing approaches without the need to train on known cellular interactions
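One way to picture a single partition being judged against several network slices at once is to sum a per-slice quality score for the same node-to-community assignment. The sketch below does this with standard Newman modularity. The abstract does not state SimMod's objective or how it is optimised, so the summed-modularity score is an assumption for illustration, and the slice names, edges and function names are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity of one undirected, unweighted network slice."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges that fall inside a community ...
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # ... minus the fraction expected if edges were placed at random
    # while keeping every node's degree.
    totals = {}
    for node, k in degree.items():
        totals[community[node]] = totals.get(community[node], 0) + k
    expected = sum((t / (2 * m)) ** 2 for t in totals.values())
    return inside - expected


def composite_modularity(slices, community):
    """Sum of per-slice modularities for one shared partition."""
    return sum(modularity(edges, community) for edges in slices.values())


# Two hypothetical slices over the same six nodes.
slices = {
    "physical": [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)],
    "genetic": [(1, 2), (2, 3), (4, 5), (5, 6)],
}
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {node: 0 for node in range(1, 7)}

score_two = composite_modularity(slices, two_groups)
score_one = composite_modularity(slices, one_group)
```

Here the two-group partition scores higher than placing every node in one community, because both slices agree on the same grouping. An optimisation method would search over partitions for the highest composite score.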

UCL Discovery

Genome sequences and great expectations

Author: Andrade M.A.
Audit B.
Iliopoulos I.
Janssen P.
Leroy C.
Ouzounis C.A.
Sander C.
Tramontano A.
Tsoka S.
Valencia A.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2001

To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function

MDC Repository

A systems model for immune cell interactions unravels the mechanism of inflammation in human skin

Author: Ainali C.
Clop A.
Hundhausen C.
Kotov N.
Nestle F.
Ouzounis C.
Tsoka S.
Umezawa Y.
Valeyev N.
Williams G.
Publication date: 01/01/2010

Inflammation is characterized by altered cytokine levels produced by cell populations in a highly interdependent manner. To elucidate the mechanism of an inflammatory reaction, we have developed a mathematical model for immune cell interactions via the specific, dose-dependent cytokine production rates of cell populations. The model describes the criteria required for normal and pathological immune system responses and suggests that alterations in the cytokine production rates can lead to various stable levels which manifest themselves in different disease phenotypes. The model predicts that pairs of interacting immune cell populations can maintain homeostatic and elevated extracellular cytokine concentration levels, enabling them to operate as an immune system switch. The concept described here is developed in the context of psoriasis, an immune-mediated disease, but it can also offer mechanistic insights into other inflammatory pathologies as it explains how interactions between immune cell populations can lead to disease phenotypes. © 2010 Valeyev et al

Kazan Federal University Digital Repository

Stratification of co-evolving genomic groups using ranked phylogenetic profiles

Author: A Karimpour-Fard
A Muller
A Tsirigos
AC McHardy
AJ Enright
Assaf Gottlieb
C Nieto
C Ouzounis
CA Ouzounis
Christos A Ouzounis
CM Fraser
DC Krakauer
DP Kreil
Eric Blanc
ES Snitkin
EV Koonin
GS Chang
H Teeling
I Cases
J Reidl
J Wu
L Goldovsky
LB Koski
Leon Goldovsky
M Pellegrini
MA Ragan
MR Graham
P Hugenholtz
R Chenna
RJ Case
RL Tatusov
S Cokus
S Freilich
S Garcia-Vallve
S Karlin
S Podell
SA Shelburne
SF Altschul
SG Tringe
Shiri Freilich
Sophia Tsoka
T Abe
TZ DeSantis
V Kunin
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, which have a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and classifies protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and for the detection of species groups in metagenomics samples.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

King's Research Portal