89 research outputs found

    Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for differential gene expression. Both lists of genes provided by CASh and the t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways.
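
For reference, the Shapley value mentioned in this abstract has a standard textbook definition, reproduced here as a reading aid. The notation is an assumption of this note and is not taken from the paper: $(N, v)$ is a coalitional game with player set $N$ (here, genes), $n = |N|$, and $v(S)$ is the worth of coalition $S$. The specific microarray game used by CASh is not specified in the abstract.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \bigl( v(S \cup \{i\}) - v(S) \bigr),
\qquad i \in N .
```

Each term weights the marginal contribution of player $i$ to a coalition $S$ by the probability that exactly the members of $S$ precede $i$ in a uniformly random ordering of the players.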

    Game Theory applied to gene expression analysis.

    This is a summary of the author’s Ph.D. thesis supervised by Fioravante Patrone and Stefano Bonassi and defended on 25 May 2006 at the Università degli Studi di Genova. The thesis is written in English and a copy is available from the author upon request. This work deals with the discussion and the application of a methodology based on Game Theory for the analysis of gene expression data. Nowadays, microarray technology is available for taking “pictures” of gene expressions. Within a single experiment with this technology, the level of expression of thousands of genes can be estimated in a sample of cells under given conditions. Roughly speaking, the starting point is the observation of a “picture” of gene expressions in a sample of cells under a biological condition of interest, for example a tumor. Then, Game Theory plays a primary role in quantitatively evaluating the relevance of each gene in regulating or provoking the condition of interest, taking into account the observed relationships in all subgroups of genes. Keywords: Coalitional game; Shapley value; Power index; Gene expression; Microarray.

    Using coalitional games on biological networks to measure centrality and power of genes

    Motivation: The interpretation of gene interaction in biological networks generates the need for a meaningful ranking of network elements. Classical centrality analysis ranks network elements according to their importance but may fail to reflect the power of each gene in interaction with the others. Results: We introduce a new approach using coalitional games to evaluate the centrality of genes in networks, taking into account genes' interactions. The Shapley value for coalitional games is used to express the power of each gene in interaction with the others and to stress the centrality of certain hub genes in the regulation of biological pathways of interest. The main improvement of this contribution, with respect to previous applications of game theory to gene expression analysis, consists in a finer resolution of the gene interaction investigated in the model, which is based on pairwise relationships of genes in the network. In addition, the new approach allows for the integration of a priori knowledge about genes playing a key function in a certain biological process. An approximation method for practical computation on large biological networks, together with a comparison with other centrality measures, is also presented.
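
To make the idea of approximating a Shapley value on a network concrete, here is a minimal sketch. It is not the authors' game or their approximation method, neither of which is specified in the abstract. It assumes a hypothetical toy game in which the worth of a set of genes is the number of network edges with both endpoints in the set, and it uses plain permutation sampling. The names `EDGES`, `GENES`, `edges_inside`, `exact_shapley` and `sampled_shapley` are all illustrative.

```python
import random
from itertools import permutations
from math import factorial

# Hypothetical toy network: a hub gene g1 linked to three other genes.
EDGES = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
GENES = ["g1", "g2", "g3", "g4"]


def edges_inside(coalition):
    """Assumed characteristic function: edges with both endpoints in the coalition."""
    return sum(1 for a, b in EDGES if a in coalition and b in coalition)


def _add_marginals(order, v, totals):
    """Add each player's marginal contribution along one ordering to `totals`."""
    coalition = frozenset()
    for p in order:
        with_p = coalition | {p}
        totals[p] += v(with_p) - v(coalition)
        coalition = with_p


def exact_shapley(players, v):
    """Exact Shapley value: average marginal contribution over all n! orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        _add_marginals(order, v, totals)
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def sampled_shapley(players, v, n_samples=2000, seed=0):
    """Monte Carlo estimate: average marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        _add_marginals(order, v, totals)
    return {p: t / n_samples for p, t in totals.items()}


exact = exact_shapley(GENES, edges_inside)
approx = sampled_shapley(GENES, edges_inside)
print(exact)
print(approx)
```

In this particular toy game each edge's unit of worth is split equally between its two endpoints, so the exact Shapley value of a gene is half its degree and the hub ranks first. Exact enumeration costs n! orderings, which is why sampling of some kind is needed for large networks.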

    Feature Selection via Coalitional Game Theory

    We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the multiperturbation Shapley analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.

    Identification of low intratumoral gene expression heterogeneity in neuroblastic tumors by genome-wide expression analysis and Game Theory

    BACKGROUND. Neuroblastic tumors (NTs) are largely comprised of neuroblastic (Nb) cells with various quantities of Schwannian stromal (SS) cells. NTs show a variable genetic heterogeneity. NT gene expression profiles reported so far have not taken into account the cellular components. The authors reported the genome-wide expression analysis of whole tumors and microdissected Nb and SS cells. METHODS. The authors analyzed gene expression profiles of 10 stroma-poor NTs (NTs-SP) and 9 stroma-rich NTs (NTs-SR) by microarray technology. Nb and SS cells were isolated by laser microdissection from NTs-SP and NTs-SR and probed with microarrays. Gene expression data were analyzed by the Significance Analysis of Microarrays (SAM) and Game Theory (GT) methods, the latter applied for the first time to microarray data evaluation. RESULTS. SAM identified 84 genes differentially expressed between NTs-SP and NTs-SR, whereas 50 were found by GT. NTs-SP mainly express genes associated with cell replication, nervous system development, and antiapoptotic pathways, whereas NTs-SR express genes of cell-cell communication and apoptosis. Combining SAM and GT, the authors found 16 common genes driving the separation between NTs-SP and NTs-SR. Five genes overexpressed in NTs-SP encode for nuclear proteins (CENPE, EYA1, PBK, TOP2A, TFAP2B), whereas only 1 of 11 highly expressed genes in NTs-SR encodes for a nuclear receptor (NR4A2). CONCLUSIONS. The results showed that NTs-SP and NTs-SR gene signatures differ for a set of genes involved in distinct pathways, and the authors demonstrated a low intratumoral heterogeneity at the mRNA level in both NTs-SP and NTs-SR. The combination of SAM and GT methods may help to better identify gene expression profiling in NTs.

    On stratified sampling for estimating coalitional values

    This paper addresses two sampling methodologies to estimate, respectively, the Owen value and the Banzhaf–Owen value for TU-games with a priori unions. Both proposals are based on stratified sampling, according to their cardinalities, on the set of those coalitions that are compatible with the system of unions. These sampling methodologies are analysed in terms of their theoretical properties and of the establishment of bounds for the absolute error from a statistical point of view. Finally, we evaluate the performance of these tools on several well-known real examples in the literature. Funding: The author acknowledges the financial support of Ministerio de Economía y Competitividad of the Spanish government under grants MTM2017-87197-C3-2-P and PID2021-124030NB-C32, and of Xunta de Galicia through the ERDF (Grupos de Referencia Competitiva) ED431C 2021/24. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

    Risk analysis sampling methods in terrorist networks based on the Banzhaf value

    This article introduces the Banzhaf and the Banzhaf–Owen values as novel measures of risk analysis of a terrorist attack, determining the most dangerous terrorists in a network. This new approach has the advantage of integrating at the same time the complete topology (i.e., nodes and edges) of the network and a coalitional structure on the nodes of the network. More precisely, it integrates the characteristics of the nodes (e.g., terrorists) of the network and their possible relationships (e.g., types of communication links), as well as coalitional information (e.g., level of hierarchies) independent of the network. First, for these two new measures of risk analysis, we provide and implement approximation algorithms. Second, as illustration, we rank the members of the Zerkani network, responsible for the attacks in Paris (2015) and Brussels (2016). Finally, we give a comparison between the rankings established by the Banzhaf and the Banzhaf–Owen values as measures of risk analysis. Funding: Ministerio de Ciencia e Innovación, Grant/Award Numbers: PGC2018-097965-B-I00, PID2021-124030NB-C32; Xunta de Galicia, Grant/Award Number: ED431C 2021/24; Ministerio de Ciencia, Innovación y Universidades, Grant/Award Number: MTM2017-87197-C3-3-P.
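
As a reading aid, the sketch below shows the plain Banzhaf value and a simple Monte Carlo estimate of it. It is not the article's approximation algorithm and it ignores both the network topology and the coalitional structure that the Banzhaf–Owen value uses. The three-player weighted voting game (`WEIGHTS`, `QUOTA`, `wins`) is a hypothetical example chosen only because its values are easy to check by hand.

```python
import random
from itertools import combinations

# Hypothetical weighted voting game: a coalition wins if its weight reaches the quota.
WEIGHTS = {"A": 2, "B": 1, "C": 1}
QUOTA = 3
PLAYERS = list(WEIGHTS)


def wins(coalition):
    """Assumed characteristic function: 1 for a winning coalition, else 0."""
    return 1 if sum(WEIGHTS[p] for p in coalition) >= QUOTA else 0


def exact_banzhaf(players, v):
    """Average marginal contribution of each player over all 2^(n-1) coalitions of the others."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += v(s | {i}) - v(s)
        values[i] = total / 2 ** (n - 1)
    return values


def sampled_banzhaf(players, v, n_samples=4000, seed=0):
    """Estimate by drawing coalitions uniformly: each other player joins with probability 1/2."""
    rng = random.Random(seed)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for _ in range(n_samples):
            s = frozenset(p for p in others if rng.random() < 0.5)
            total += v(s | {i}) - v(s)
        values[i] = total / n_samples
    return values


banzhaf_exact = exact_banzhaf(PLAYERS, wins)
banzhaf_approx = sampled_banzhaf(PLAYERS, wins)
print(banzhaf_exact)   # {'A': 0.75, 'B': 0.25, 'C': 0.25}
print(banzhaf_approx)
```

Including each of the other players independently with probability 1/2 makes every coalition that excludes the player under study equally likely, which is exactly the weighting in the Banzhaf definition.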

    Interpretable machine learning for genomics

    High-throughput technologies such as next-generation sequencing allow biologists to observe cell function with unprecedented resolution, but the resulting datasets are too large and complicated for humans to understand without the aid of advanced statistical methods. Machine learning (ML) algorithms, which are designed to automatically find patterns in data, are well suited to this task. Yet these models are often so complex as to be opaque, leaving researchers with few clues about underlying mechanisms. Interpretable machine learning (iML) is a burgeoning subdiscipline of computational statistics devoted to making the predictions of ML models more intelligible to end users. This article is a gentle and critical introduction to iML, with an emphasis on genomic applications. I define relevant concepts, motivate leading methodologies, and provide a simple typology of existing approaches. I survey recent examples of iML in genomics, demonstrating how such techniques are increasingly integrated into research workflows. I argue that iML solutions are required to realize the promise of precision medicine. However, several open challenges remain. I examine the limitations of current state-of-the-art tools and propose a number of directions for future research. While the horizon for iML in genomics is wide and bright, continued progress requires close collaboration across disciplines.

    Optimization and Allocation in Some Decision Problems with Several Agents or with Stochastic Elements

    Official Doctoral Programme in Statistics and Operations Research (Programa Oficial de Doutoramento en Estatística e Investigación Operativa), 5017V01. Abstract: This dissertation addresses some decision problems that arise in project management, cooperative game theory and vehicle route optimization. We start with the problem of allocating the delay costs of a project. In a stochastic context in which we assume that activity durations are random variables, we propose and study an allocation rule based on the Shapley value. In addition, we present an R package that allows a comprehensive control of the project, including the new rule. We propose and characterize new egalitarian solutions in the context of cooperative games with a coalitional structure. Also, using a necessary player property we introduce a new value for cooperative games, which we later extend and characterize within the framework of cooperative games with a coalitional structure. Finally, we present a two-step algorithm for solving multi-compartment vehicle route problems with stochastic demands. This algorithm obtains an initial solution through a constructive heuristic and then uses a tabu search to improve the solution. Using real data, we evaluate the performance of the algorithm.