52 research outputs found

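    Gestión de volúmenes masivos de datos genéticos y análisis de la influencia de su interacción en el desarrollo de cáncer [Management of massive volumes of genetic data and analysis of the influence of their interaction on cancer development]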

    ABSTRACT: The presence of a model or pattern in the association between a genetic variant (or a variant–variant interaction) and a disease has been ignored in almost all genome-wide association studies, despite the wealth of information it provides. Although such models do not underlie every variant–disease association, let alone every interaction–disease association, the working hypothesis of this doctoral dissertation was that they are more abundant than they appear, so that studying them could generate biological hypotheses that can be tested experimentally. To confirm this, (i) we developed a framework to evaluate and compare the levels of fit and uncertainty of different patterns across massive numbers of variables analyzed simultaneously; (ii) we designed and implemented, within that framework, a statistical test to decide which genetic model corresponds to each genetic variant and interaction; and (iii) we composed a protocol for building interaction networks, which we used to analyze the data from the MCC-Spain study. The associations found are supported by scientific discoveries from the past 5 years, which demonstrates both the viability of the method and its potential to reveal the information hidden in the variant–variant interaction networks that lead to the development of common diseases.

    Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction

    This study evaluates several feature ranking techniques together with machine learning classifiers in order to identify the factors most relevant to the probability of developing breast cancer and to improve the performance of breast cancer risk prediction models in a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Breast cancer is a major public health problem. Our aim is to analyze which factors in the cancer risk prediction model are the most important for breast cancer prediction. Quantifying the stability of feature selection methods is likewise essential before trying to gain insight into the data. This paper assesses several feature selection algorithms in terms of the performance of a set of predictive models. Their robustness is also quantified, analyzing both the similarity between the feature selection rankings and the stability of each ranking. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). The top 47 ranked features obtained with this approach, fed to a Logistic Regression classifier, achieve an AUC of 0.616, an improvement of 5.8% over the full feature set. Furthermore, the SVM-RFE ranking technique turned out to be highly stable (as did Random Forest), whereas Relief and the wrapper approaches are quite unstable. This study demonstrates that the stability and the performance of the model should be studied together: Random Forest and SVM-RFE were the most stable algorithms, but in terms of model performance SVM-RFE outperforms Random Forest.
The study was partially funded by the “Accion Transversal del Cancer”, approved by the Spanish Ministry Council on 11 October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI08/0533, PI08/1359, PS09/00773, PS09/01286, PS09/01903, PS09/02078, PS09/01662, PI11/01403, PI11/01889, PI11/00226, PI11/01810, PI11/02213, PI12/00488, PI12/00265, PI12/01270, PI12/00715, PI12/00150), by the Fundación Marqués de Valdecilla (API 10/09), by the ICGC International Cancer Genome Consortium CLL, by the Junta de Castilla y León (LE22A10-2), by the Consejería de Salud of the Junta de Andalucía (PI-0571), by the Conselleria de Sanitat of the Generalitat Valenciana (AP 061/10), by the Recercaixa (2010ACUP 00310), by the Regional Government of the Basque Country, by European Commission grants FOOD-CT-2006-036224-HIWATE, by the Spanish Association Against Cancer (AECC) Scientific Foundation, and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona), which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036), and also at the Public Health Laboratory of Gipuzkoa and the Basque Biobank. Sample collection was also supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d’Oncologia de Catalunya (XBTC). Biological samples were stored at the “Biobanco La Fe”, which is supported by Instituto de Salud Carlos III (RD 09 0076/00021), and at FISABIO biobanking, which is supported by Instituto de Salud Carlos III (RD09 0076/00058).
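
The abstract above stresses that the stability of a feature ranking should be quantified alongside model performance. As a minimal sketch of one common way to do this (an assumption for illustration, not necessarily the measure used in the study), the mean pairwise Jaccard index of the top-k features can be computed over rankings obtained from different resamples of the data. The feature names and rankings below are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ranking_stability(rankings, k):
    """Mean pairwise Jaccard index over the top-k features of each ranking.

    `rankings` is a list of feature lists, each ordered from most to least
    relevant. A value of 1.0 means every resample selects the same top-k set.
    """
    if len(rankings) < 2:
        raise ValueError("at least two rankings are needed")
    tops = [ranking[:k] for ranking in rankings]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings from three resamples (feature names are invented).
rankings = [
    ["age", "bmi", "snp1", "alcohol"],
    ["age", "snp1", "bmi", "smoking"],
    ["bmi", "age", "smoking", "snp1"],
]
stability = ranking_stability(rankings, k=3)
```

A ranking method whose top-k sets change substantially between resamples scores close to 0, which is the kind of instability the abstract reports for some of the methods it compares.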

    Activity in the field of Human-Computer Interaction of a work team integrated in the MCFLAI research group

    This work presents the Human-Computer Interaction activity of a work team integrated in the MCFLAI research group (Mathematics & Computation: Foundations, Learning, Artificial Intelligence) of the University of Cantabria.

    QRISK3 performance in the assessment of cardiovascular risk in patients with inflammatory bowel disease

    Inflammatory bowel disease (IBD) has been described as an independent risk factor for the development of cardiovascular (CV) disease. Since the QRESEARCH risk estimator version 3 (QRISK3) calculator was recently proposed to assess CV risk in the general population, our objective was to compare the predictive ability of QRISK3 with that of a well-established European CV risk calculator, the Systematic Coronary Risk Assessment (SCORE), to identify the presence of subclinical carotid atherosclerosis in patients with IBD. In all, 186 patients with IBD and 178 controls were recruited. The presence of subclinical atherosclerosis was evaluated by carotid ultrasound to identify carotid plaque and to measure the carotid intima-media thickness (cIMT). QRISK3 and SCORE were calculated. The relationship of QRISK3 and SCORE with each other and with the presence of subclinical carotid atherosclerosis (both carotid plaque and cIMT) was studied in patients and controls. SCORE (0.2 (interquartile range 0.1-0.9) vs. 0.4 (0.1-1.4), p = 0.55) and QRISK3 (1.7 (0.6-4.6) vs. 3.0 (1.0-7.8), p = 0.16) absolute values did not differ between patients and controls. QRISK3 and SCORE correlated equally with cIMT within both populations. However, the correlation of SCORE with cIMT was found to be significantly lower in patients with IBD when compared to controls (Spearman's Rho 0.715 vs. 0.587, p = 0.034). Discrimination analysis of both calculators with carotid plaque was similar within both populations. Nevertheless, in patients with IBD, QRISK3 showed a trend toward higher discrimination (QRISK3 area under the curve 0.812 (95% CI 0.748-0.875) vs. SCORE 0.790 (95% CI 0.723-0.856), p = 0.051). In conclusion, QRISK3 discrimination for subclinical atherosclerosis is optimal and equivalent to that of SCORE in IBD patients. Our findings highlight the role of QRISK3 as an appropriate tool for the assessment of CV risk in patients with IBD.
Funding: This work was supported by a grant to I.F-A. from the Spanish Ministry of Health, Subdirección General de Evaluación y Fomento de la Investigación, Plan Estatal de Investigación Científica y Técnica y de Innovación 2013–2016, and by Fondo Europeo de Desarrollo Regional-FEDER-(Fondo de Investigaciones Sanitarias, FIS PI14/00394, PI17/00083).
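
The discrimination figures quoted above are areas under the ROC curve. As a minimal sketch of what that number measures, the AUC can be computed directly as the probability that a randomly chosen subject with carotid plaque receives a higher risk score than a randomly chosen subject without it, with ties counted as one half. The scores below are invented for illustration and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    subject has the higher score; tied pairs contribute 0.5.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores (invented values, for illustration only).
with_plaque = [8.0, 5.0, 3.0, 2.0]
without_plaque = [3.0, 1.0, 0.5]
example_auc = auc(with_plaque, without_plaque)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups, which is the scale on which the 0.812 and 0.790 values above should be read.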

    Risk model for prostate cancer using environmental and genetic factors in the Spanish multi-case-control (MCC) study

    Prostate cancer (PCa) is the second most common cancer among men worldwide. Its etiology remains largely unknown compared to other common cancers. We have developed a risk stratification model combining environmental factors with family history and genetic susceptibility. A total of 818 PCa cases and 1,006 healthy controls were compared. Subjects were interviewed on major lifestyle factors and family history. Fifty-six PCa susceptibility SNPs were genotyped. Risk models based on logistic regression were developed to combine environmental factors, family history and a genetic risk score. In the full model, compared with subjects at low risk (reference category, decile 1), those at intermediate risk (decile 5) had a 265% increase in PCa risk (OR = 3.65, 95% CI 2.26 to 5.91). The genetic risk score had an area under the ROC curve (AUROC) of 0.66 (95% CI 0.63 to 0.68). When the environmental score and family history were added to the genetic risk score, the AUROC increased by 0.05, reaching 0.71 (95% CI 0.69 to 0.74). Genetic susceptibility contributes more to risk prediction than modifiable risk factors. While the added value of each SNP is small, the combination of 56 SNPs adds to the predictive ability of the risk model.
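
Two pieces of arithmetic in the abstract above can be made explicit. The 265% figure follows from the odds ratio as (3.65 - 1) x 100. A genetic risk score is often built as a weighted sum of risk-allele counts, with the log of each SNP's per-allele odds ratio as its weight. The sketch below assumes that weighted construction, which the abstract does not specify, and the allele counts and odds ratios in it are invented.

```python
import math


def percent_increase(odds_ratio):
    """Percent increase in odds implied by an odds ratio, e.g. 3.65 -> 265%."""
    return (odds_ratio - 1.0) * 100.0


def genetic_risk_score(allele_counts, odds_ratios):
    """Weighted genetic risk score (assumed construction, not from the study).

    `allele_counts` holds the number of risk alleles (0, 1 or 2) per SNP and
    `odds_ratios` the per-allele odds ratio of each SNP; the score is the sum
    of count * log(odds ratio).
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    return sum(c * math.log(o) for c, o in zip(allele_counts, odds_ratios))


increase = percent_increase(3.65)  # odds ratio reported in the abstract

# Hypothetical subject with three SNPs (counts and odds ratios are invented).
score = genetic_risk_score([2, 0, 1], [1.2, 1.1, 1.5])
```

Because each per-SNP weight is small, a single SNP shifts the score very little, which is consistent with the abstract's remark that the predictive value comes from combining all 56 SNPs.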

    The Use of Antihypertensive Medication and the Risk of Breast Cancer in a Case-Control Study in a Spanish Population: The MCC-Spain Study

    The evidence on the relationship between breast cancer and different types of antihypertensive drugs taken for at least 5 years is limited and inconsistent. Furthermore, the debate has recently been fueled again by new data reporting an increased risk of breast cancer among women with a long history of antihypertensive drug use compared with nonusers.

    Detection of Overlapping Communities in Directed and Weighted Social Networks

    ABSTRACT: With the recent increasing popularity of social networking services, such as Facebook or Twitter, community detection has become a problem of considerable interest. Although there are more than a hundred algorithms that find communities in social networks, only a few are able to detect overlapping communities, and an even smaller number of them do so in directed and/or weighted networks. For this reason, this Master’s Thesis presents an algorithm that detects overlapping communities in directed and/or weighted social networks and that, based on the ideas of friendship and leadership in these networks, not only reveals the communities identified but also specifies who their leaders are. The algorithm is described in detail and its results are compared with those obtained by prominent overlapping community detection algorithms found in the scientific literature. Máster en Matemáticas y Computación

    Design and implementation of a tool for the comparison of community detection algorithms in graphs

    ABSTRACT: In the last decade, the appearance of services such as Facebook or Twitter has led to a renewed interest in social network analysis, with community detection being one of the main problems addressed. Community detection consists of organizing the vertices of a graph into groups whose members are densely connected to each other. Although many algorithms have been proposed, along with several graph generators to test their effectiveness, that testing has not received much attention in the literature: it is usually limited to applying the proposed algorithm to a set of graphs whose structure is known in advance, or to selecting graph generator parameters that yield structurally simple networks. This is a serious problem because it cannot be established which method is best, so in practice the choice of algorithm is determined by factors that have nothing to do with its efficiency (e.g., its popularity or the reputation of its author). For this reason, this Final Degree Project designed and implemented an application that allows community detection algorithms in graphs to be compared impartially. It was designed so that users can add algorithms, graph generators and measures for evaluating results, for which a general-purpose, multiplatform programming language was used, in this case Java. The application obtains graphs from the supplied generators and sends them to the algorithms, which return the community structure they detect. Then, with the appropriate evaluation measures, it can determine which algorithm performs best. In addition, a mechanism was implemented to ensure that the code of the supplied components executes safely, so that a malicious user cannot run code capable of compromising the security of the machine. The design and implementation of this application were carried out following the MÉTRICA methodology for object-oriented development. Ingeniería en Informática
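
The pipeline described above (generators produce graphs, algorithms return communities, measures score the result) can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than the Java used by the project, every function name is hypothetical, the two "algorithms" are trivial baselines, the Rand index is a swapped-in measure that only handles non-overlapping communities, and the sandboxing mechanism mentioned in the abstract is omitted.

```python
from itertools import combinations


def two_cliques(n):
    """Hypothetical generator: two n-cliques joined by one edge, plus the truth."""
    graph = {v: set() for v in range(2 * n)}
    for block in (range(n), range(n, 2 * n)):
        for u, v in combinations(block, 2):
            graph[u].add(v)
            graph[v].add(u)
    graph[n - 1].add(n)
    graph[n].add(n - 1)
    return graph, [set(range(n)), set(range(n, 2 * n))]


def connected_components(graph):
    """Baseline algorithm: every connected component is one community."""
    seen, communities = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(graph[v] - component)
        seen |= component
        communities.append(component)
    return communities


def singletons(graph):
    """Baseline algorithm: every vertex is its own community."""
    return [{v} for v in graph]


def rand_index(detected, truth):
    """Fraction of vertex pairs grouped consistently by both partitions."""
    def labels(partition):
        return {v: i for i, group in enumerate(partition) for v in group}
    a, b = labels(detected), labels(truth)
    pairs = list(combinations(sorted(b), 2))
    agree = sum((a[u] == a[v]) == (b[u] == b[v]) for u, v in pairs)
    return agree / len(pairs)


def compare(generators, algorithms, measure):
    """Run every algorithm on every generated graph and score the output."""
    results = {}
    for gen_name, generate in generators.items():
        graph, truth = generate()
        for alg_name, algorithm in algorithms.items():
            results[(gen_name, alg_name)] = measure(algorithm(graph), truth)
    return results


results = compare(
    {"two_cliques_4": lambda: two_cliques(4)},
    {"components": connected_components, "singletons": singletons},
    rand_index,
)
```

Keeping generators, algorithms and measures as interchangeable callables mirrors the extensibility the abstract describes, where users can supply their own components without changing the comparison loop.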