
    dbstats: an R package implementing distance-based statistical methods

    Distance-based statistical methods are an alternative or complement to the most common classical techniques in statistics, such as the linear model, the generalized linear model, or their nonparametric versions. This work explains in detail what these methods consist of and develops an R package, called "dbstats", that implements all of these techniques.
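    As a rough illustration of what a distance-based linear model computes, the following Python/NumPy sketch follows the classical construction: double-centre the squared distance matrix to obtain Gower's inner-product matrix G = -1/2 J D^2 J, then project the centred response onto the column space of G (the principal coordinates). It is not the dbstats code, which is written in R, and the names `gower_matrix` and `db_lm_fit` are hypothetical. With Euclidean distances computed from a numeric design matrix, the fitted values should coincide with ordinary least squares.

```python
import numpy as np


def gower_matrix(D):
    """Double-centred inner-product matrix G = -1/2 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return -0.5 * J @ (D ** 2) @ J


def db_lm_fit(D, y, tol=1e-8):
    """Fitted values of a distance-based linear model (illustrative sketch)."""
    G = gower_matrix(D)
    eigval, eigvec = np.linalg.eigh(G)
    # Keep only principal coordinates with non-negligible positive eigenvalues
    keep = eigval > tol * eigval.max()
    V = eigvec[:, keep]
    H = V @ V.T  # orthogonal projection onto the column space of G
    ybar = y.mean()
    return ybar + H @ (y - ybar)


# Usage: with Euclidean distances, the db-lm fit reproduces OLS with an intercept
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=30)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(db_lm_fit(D, y), A @ beta))
```

    The eigenvalue cut-off `tol` is an arbitrary illustrative choice; with non-Euclidean distances G can have negative eigenvalues, which this sketch simply discards.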

    Do electoral polls get it right?

    There is a perception that the electoral polls published by newspapers are wrong, that they never predict what actually happens on election day. This work sets out to test that perception through data analysis. The polls published by the country's highest-circulation newspapers 7 days before election day were compared with the results one week later, for elections to both the Parliament of Catalonia (Parlament de Catalunya) and the Congress of Deputies (Congrés dels Diputats). A distance had to be defined in order to interpret prediction errors. By constructing a theoretical poll, each poll could be evaluated and assigned to one of several indicative groups. In addition, we examined how the polls have evolved over time, whether some newspapers make better forecasts than others, and whether there is a tendency to overestimate a particular political leaning or party. The question is clear: are electoral polls really wrong? The general impression seems to be that electoral polls are wrong and never predict what ends up happening. However, this perception is probably due to the fact that when polls are wrong it is widely discussed, whereas when they are right it goes unnoticed. The aim of the project is to compare what the polls published in the highest-circulation newspapers said with what actually happened. This requires defining a distance between the poll forecasts and reality, examining how this distance evolves, and exploring whatever other questions arise.
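    The abstract does not say which distance was used. Purely as an illustration, a simple candidate is the total absolute deviation (L1 distance) between forecast and actual seats, together with signed per-party errors that show which parties a poll overestimates. The Python sketch below assumes this choice; the function names and the seat counts for parties "A", "B" and "C" are hypothetical and are not data from the study.

```python
def poll_errors(forecast, result):
    """Signed error per party: positive means the poll overestimated it."""
    return {party: forecast[party] - result[party] for party in result}


def l1_distance(forecast, result):
    """Total absolute deviation between a poll and the actual result."""
    return sum(abs(e) for e in poll_errors(forecast, result).values())


# Made-up seat counts for three hypothetical parties (not data from the study)
forecast = {"A": 40, "B": 30, "C": 15}
result = {"A": 35, "B": 34, "C": 16}
print(poll_errors(forecast, result))  # {'A': 5, 'B': -4, 'C': -1}
print(l1_distance(forecast, result))  # 10
```

    Averaging the signed errors of one party across many polls would be one way to check for a systematic overestimation of that party.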

    Statistical methods for the testing and estimation of linear dependence structures on paired high-dimensional data: application to genomic data

    This thesis provides novel methodology for the statistical analysis of paired high-dimensional genomic data, with the aim of identifying gene interactions specific to each group of samples as well as the gene connections that change between the two classes of observations. An example of such groups is patients under two medical conditions, in which case estimating gene interaction networks helps biologists discern the gene regulatory mechanisms that control a disease process such as cancer. We construct these interaction networks from data by considering the non-zero structure of correlation matrices, which measure linear dependence between random variables, and of their inverse matrices, commonly known as precision matrices, which instead determine linear conditional dependence. In this regard, we study three statistical problems related to the testing, single estimation and joint estimation of (conditional) dependence structures. Firstly, we develop hypothesis testing methods to assess the equality of two correlation matrices, and of two correlation sub-matrices, corresponding to two classes of samples, and hence the equality of the underlying gene interaction networks. We consider statistics based on the average of squares, the maximum and the sum of exceedances of sample correlations, which are suitable for both independent and paired observations. We derive the limiting distributions of the test statistics where possible and, for practical use, present a permutation-based approach to obtain their corresponding non-parametric distributions. Cases where such hypothesis testing presents enough evidence against the null hypothesis of equality of two correlation matrices give rise to the problem of estimating two correlation (or precision) matrices.
    However, before that we address the statistical problem of estimating conditional dependence between random variables in a single class of samples when data are high-dimensional, which is the second topic of the thesis. We study the graphical lasso method, which employs an L1-penalized likelihood to estimate the precision matrix and its underlying non-zero graph structure. The lasso penalty term is given by the L1 norm of the precision matrix elements scaled by a regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data; its selection is our main focus of investigation. We propose several procedures to select the regularization parameter in the graphical lasso optimization problem that rely on network characteristics such as clustering or connectivity of the graph. Thirdly, we address the more general problem of estimating two precision matrices that are expected to be similar when the datasets are dependent, focusing on the particular case of paired observations. We propose a new method to estimate these precision matrices simultaneously, a weighted fused graphical lasso estimator. The analogous joint estimation method for two regression coefficient matrices, which we call weighted fused regression lasso, is also developed in this thesis under the same paired and high-dimensional setting. The two joint estimators maximize penalized marginal log-likelihood functions that encourage both sparsity and similarity in the estimated matrices, and they are solved using an alternating direction method of multipliers (ADMM) algorithm. Sparsity and similarity of the matrices are determined by two tuning parameters, which we propose to choose by controlling the corresponding average error rates related to the expected number of false positive edges in the estimated conditional dependence networks.
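    To make the role of the regularization parameter concrete, here is a minimal Python/NumPy sketch of a single graphical lasso solved with the standard ADMM splitting (Theta = Z). It is not the ldstatsHD implementation: it assumes that only off-diagonal entries are penalized, uses a fixed step size `rho` and iteration count, and merely reports how the number of edges shrinks as the regularization parameter `lam` grows, rather than implementing the clustering- or connectivity-based selection rules proposed in the thesis. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(a, k):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - k, 0.0)


def graphical_lasso_admm(S, lam, rho=1.0, n_iter=500):
    """Sparse precision matrix minimising
    -log det(Theta) + tr(S Theta) + lam * sum_{i != j} |Theta_ij|  (sketch)."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    off = ~np.eye(p, dtype=bool)
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - Theta^{-1} = rho*(Z - U) - S
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta) @ Q.T
        # Z-update: soft-threshold off-diagonal entries, keep the diagonal
        A = Theta + U
        Z = A.copy()
        Z[off] = soft_threshold(A[off], lam / rho)
        # Scaled dual update
        U = U + Theta - Z
    return Z


def n_edges(precision):
    """Number of non-zero off-diagonal pairs, i.e. edges of the graph."""
    iu = np.triu_indices_from(precision, k=1)
    return int(np.count_nonzero(precision[iu]))


# Usage: edge counts along a small grid of regularization parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
S = np.corrcoef(X, rowvar=False)
for lam in (0.01, 0.05, 0.1, 0.5):
    print(lam, n_edges(graphical_lasso_admm(S, lam)))
```

    Once `lam` is at least the largest off-diagonal entry of S in absolute value, the solution is diagonal and the graph has no edges; a selection rule then amounts to picking one point on this sparsity path.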
    These testing and estimation methods are implemented in the R package ldstatsHD and are applied to a comprehensive range of simulated data sets as well as to high-dimensional real case studies of genomic data. We use the testing approaches to discover pathway lists of genes that present significantly different correlation matrices in healthy and unhealthy (e.g., tumor) samples. In addition, we use hypothesis tests on correlation sub-matrices to reduce the number of genes considered for estimation. The proposed joint estimation methods are then used to find gene interactions that are common between medical conditions as well as interactions that vary in the presence of unhealthy tissue.
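    The permutation-based testing idea in this abstract can be sketched for paired observations as follows. This Python/NumPy example is a simplified illustration, not ldstatsHD: it uses only the average-of-squares statistic (not the maximum or sum-of-exceedances statistics), assumes that under the null hypothesis the two members of each pair are exchangeable so that a permutation swaps X[i] and Y[i] with probability 1/2, and uses hypothetical function names and simulated data.

```python
import numpy as np


def avg_sq_stat(X, Y):
    """Average squared difference between the two sample correlation matrices
    (upper triangle only)."""
    R1 = np.corrcoef(X, rowvar=False)
    R2 = np.corrcoef(Y, rowvar=False)
    iu = np.triu_indices_from(R1, k=1)
    return float(np.mean((R1[iu] - R2[iu]) ** 2))


def paired_permutation_test(X, Y, n_perm=999, seed=0):
    """Permutation p-value for H0: corr(X) == corr(Y) with paired rows.
    Each permutation swaps X[i] and Y[i] with probability 1/2."""
    rng = np.random.default_rng(seed)
    observed = avg_sq_stat(X, Y)
    exceed = 0
    for _ in range(n_perm):
        swap = rng.random(X.shape[0]) < 0.5
        Xp = np.where(swap[:, None], Y, X)
        Yp = np.where(swap[:, None], X, Y)
        exceed += int(avg_sq_stat(Xp, Yp) >= observed)
    return (exceed + 1) / (n_perm + 1)


# Usage: X has roughly uncorrelated variables, Y strongly correlated ones
rng = np.random.default_rng(2)
n, p = 50, 5
X = rng.normal(size=(n, p))
factor = rng.normal(size=(n, 1))
Y = factor + 0.3 * rng.normal(size=(n, p))
print(paired_permutation_test(X, Y))
```

    Adding one to both numerator and denominator keeps the p-value strictly positive, the usual convention for Monte Carlo permutation tests.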

    Global and local distance-based generalized linear models

    This paper introduces local distance-based generalized linear models. These models first extend (weighted) distance-based linear models to the generalized linear model framework. A nonparametric version of these models is then proposed by means of local fitting. Distances between individuals are the only predictor information needed to fit these models. Therefore, they are applicable, among other settings, to mixed (qualitative and quantitative) explanatory variables or when the regressor is of functional type. An implementation is provided by the R package dbstats, which also implements other distance-based prediction methods. Supplementary material, available online, reproduces all the results of this article.
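    Because these models only need distances, mixed qualitative and quantitative predictors can be handled by choosing a suitable dissimilarity. One common choice, assumed here purely for illustration and not claimed to be what dbstats uses by default, is Gower's dissimilarity: range-scaled absolute differences for quantitative variables, simple matching for categorical ones, averaged over variables. The function name and the records below are hypothetical.

```python
def gower_distance(a, b, is_numeric, ranges):
    """Gower dissimilarity between two records with mixed variable types."""
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            total += abs(x - y) / r  # range-scaled absolute difference
        else:
            total += 0.0 if x == y else 1.0  # simple matching for categories
    return total / len(a)


# Hypothetical records: (age, height_cm, smoker, blood_type)
records = [(30, 170.0, "no", "A"), (50, 160.0, "yes", "A"), (40, 180.0, "no", "B")]
is_numeric = [True, True, False, False]
ranges = [
    max(r[k] for r in records) - min(r[k] for r in records) if num else None
    for k, num in enumerate(is_numeric)
]

# Full distance matrix, the only predictor information a distance-based model needs
D = [[gower_distance(a, b, is_numeric, ranges) for b in records] for a in records]
for row in D:
    print(row)
```

    The resulting matrix D can then be passed to any distance-based fitting routine in place of the raw mixed-type predictors.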

    Local Distance-Based Generalized Linear Models using the dbstats package for R

    This paper introduces local distance-based generalized linear models. These models extend (weighted) distance-based linear models first with the generalized linear model concept and then by localizing. Distances between individuals are the only predictor information needed to fit these models. Therefore, they are applicable to mixed (qualitative and quantitative) explanatory variables or when the regressor is of functional type. Models can be fitted and analysed with the R package dbstats, which implements several distance-based prediction methods.
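    One simple way to see how a distance-based model extends to the GLM setting is to fit an ordinary GLM on principal coordinates computed from the distance matrix. The Python/NumPy sketch below does this for a logistic model fitted by iteratively reweighted least squares (IRLS). It is an illustration under simplifying assumptions (no observation weights, no localization, all positive-eigenvalue coordinates kept), not the dbstats implementation, and the function names are hypothetical. With Euclidean distances the fitted probabilities should match a logistic regression on the original variables.

```python
import numpy as np


def principal_coordinates(D, tol=1e-8):
    """Classical MDS coordinates from a distance matrix (positive eigenvalues only)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    eigval, eigvec = np.linalg.eigh(G)
    keep = eigval > tol * eigval.max()
    return eigvec[:, keep] * np.sqrt(eigval[keep])


def logistic_irls(Z, y, n_iter=50):
    """Logistic regression with intercept fitted by IRLS; returns fitted probabilities."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)        # IRLS weights
        z = eta + (y - mu) / w     # working response
        beta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * z))
    return 1.0 / (1.0 + np.exp(-(A @ beta)))


# Usage: Euclidean distances, so the db-GLM fit should equal the ordinary fit
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
prob = 1.0 / (1.0 + np.exp(-(0.5 + X @ np.array([1.0, -1.0]))))
y = (rng.random(200) < prob).astype(float)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

p_db = logistic_irls(principal_coordinates(D), y)
p_x = logistic_irls(X, y)
print(np.allclose(p_db, p_x))
```

    The equivalence holds because the principal coordinates span the same column space as the centred predictors, and the maximum-likelihood fitted values of a GLM depend only on that space.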

    Metabolic rewiring induced by ranolazine improves melanoma responses to targeted therapy and immunotherapy

    Resistance of melanoma to targeted therapy and immunotherapy is linked to metabolic rewiring. Here, we show that increased fatty acid oxidation (FAO) during prolonged BRAF inhibitor (BRAFi) treatment contributes to acquired therapy resistance in mice. Targeting FAO using the US Food and Drug Administration-approved and European Medicines Agency-approved anti-anginal drug ranolazine (RANO) delays tumour recurrence with acquired BRAFi resistance. Single-cell RNA-sequencing analysis reveals that RANO diminishes the abundance of the therapy-resistant NGFRhi neural crest stem cell subpopulation. Moreover, by rewiring the methionine salvage pathway, RANO enhances melanoma immunogenicity through increased antigen presentation and interferon signalling. Combining RANO with anti-PD-L1 antibodies strongly improves survival by increasing antitumour immune responses. Altogether, we show that RANO increases the efficacy of targeted melanoma therapy through its effects on FAO and the methionine salvage pathway. Importantly, our study suggests that RANO could sensitize BRAFi-resistant tumours to immunotherapy. Since RANO has very mild side-effects, it might constitute a therapeutic option to improve the two main strategies currently used to treat metastatic melanoma.

    Roastgsa: a comparison of rotation-based scores for gene set enrichment analysis

    Background: Gene-wise differential expression is usually the first major step in the statistical analysis of high-throughput data obtained from techniques such as microarrays or RNA-sequencing. The gene-level analysis is often complemented by interrogating the data in a broader biological context, using as the unit of measure groups of genes that may share a common function or biological trait. Among the vast number of publications on gene set analysis (GSA), the rotation test for gene set analysis, also referred to as roast, is a general sample randomization approach that preserves the intra-gene set correlation structure when defining the null distribution of the test.
    Results: We present roastgsa, an R package that contains several enrichment score functions that feed the roast algorithm for hypothesis testing. These methods are evaluated using both simulated and benchmarking data in microarray and RNA-seq datasets. We find that computationally intensive measures based on Kolmogorov-Smirnov (KS) statistics fail to improve on the rates of simpler GSA measures such as the mean and maxmean scores. We also show the importance of accounting for the gene linear dependence structure of the testing set, which is linked to the loss of effective signature size. A complete graphical representation of the results, including an approximation of the effective signature size, can be obtained as part of the roastgsa output.
    Conclusions: We encourage the use of the absmean (non-directional), mean (directional) and maxmean (directional) scores for roast GSA, as these are simple measures of enrichment that showed dominant results in all the analyses provided, compared with the more complex KS measures.
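    The recommended scores are simple to compute from gene-level statistics, and a rotation test can be illustrated in its simplest setting. The Python/NumPy sketch below assumes a one-sample design (H0: every gene in the set has mean zero, samples i.i.d. normal), where rotating the sample space with a random orthogonal matrix leaves the null distribution unchanged while preserving the intra-set gene correlation. It is not the roastgsa or limma roast code, which work with general design matrices; the maxmean definition follows the usual positive-part/negative-part construction, and all function names and data are hypothetical.

```python
import numpy as np


def set_scores(z):
    """Gene-set enrichment scores computed from gene-level statistics z."""
    z = np.asarray(z, dtype=float)
    s_plus = np.mean(np.maximum(z, 0.0))
    s_minus = np.mean(np.maximum(-z, 0.0))
    return {
        "mean": float(np.mean(z)),             # directional
        "absmean": float(np.mean(np.abs(z))),  # non-directional
        "maxmean": float(s_plus if s_plus > s_minus else -s_minus),  # directional
    }


def random_rotation(n, rng):
    """Haar-distributed random orthogonal n x n matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))


def t_stats(Y):
    """One-sample t statistic for each gene (column) of Y."""
    n = Y.shape[0]
    return Y.mean(axis=0) / (Y.std(axis=0, ddof=1) / np.sqrt(n))


def rotation_test(Y, score="absmean", n_rot=999, seed=0):
    """Two-sided rotation p-value; rows of Y are samples, columns are the set's genes."""
    rng = np.random.default_rng(seed)
    observed = abs(set_scores(t_stats(Y))[score])
    exceed = 0
    for _ in range(n_rot):
        Yr = random_rotation(Y.shape[0], rng) @ Y  # rotate the samples
        exceed += int(abs(set_scores(t_stats(Yr))[score]) >= observed)
    return (exceed + 1) / (n_rot + 1)


# Usage: 10 samples, 20 genes, all with a shifted mean
rng = np.random.default_rng(4)
Y = rng.normal(size=(10, 20)) + 1.0
print(set_scores([1.0, -2.0, 3.0]))
print(rotation_test(Y, "mean"))
```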