56 research outputs found

    Study of digital signal processing tools to infer gene regulatory networks from microarrays

    Since the mid-1990s, the field of genomic signal processing has grown rapidly thanks to the development of DNA microarray technology, which made it possible to measure the mRNA expression of thousands of genes in parallel. Researchers have developed a vast body of knowledge in classification methods. The scientific community has also built a broad knowledge of the individual parts involved in the operation of a cell, but we still do not understand how these parts interact. For this reason, a new type of microarray data analysis, called pathway analysis, has been developed. This approach considers that genes work together in cascades and do not act in isolation in a biological system. The activity of the genes in a cell is controlled by gene regulatory networks, which consist of the union and interconnection of the various pathways. This thesis lies in the field of computer systems and signal processing applied to biology. It aims to study and develop methods to infer the relationships between genes in a large-scale gene network whose regulation is unknown and must be inferred from experimental data. First, we review and compare the state-of-the-art methods that have addressed this challenge with different approaches: co-expression networks, information-theoretic approaches, Bayesian networks, and methods based on differential equations. Second, we present an exhaustive study of two selected techniques, the Z-score and Zavlanos algorithms, in order to analyze their strengths and drawbacks. The chosen methods have been tested on two public datasets: the SOS pathway and a computer-simulated synthetic dataset. The proposed approach obtains good identification results, which supports its validity. Finally, we analyze the ability of the inferred network to predict the response of the system to an external perturbation. A new approach to boost identification performance, based on an ensemble decision paradigm, is also presented. Although it is a preliminary idea, we have found some promising results that demonstrate its potential.

    Study of gene expression representation with Treelets and hierarchical clustering algorithms

    Since the mid-1990s, the field of genomic signal processing has grown rapidly thanks to the development of DNA microarray technology, which made it possible to measure the mRNA expression of thousands of genes in parallel. Researchers have developed a vast body of knowledge in classification methods. However, microarray data are characterized by extremely high dimensionality and a comparatively small number of data points, which makes microarray data analysis quite unique. In this work we have developed various hierarchical clustering algorithms in order to improve the microarray classification task. First, the original feature set of gene expression values is enriched with new features that are linear combinations of the original ones. These new features, called metagenes, are produced by the different proposed hierarchical clustering algorithms. To demonstrate the utility of this methodology for classifying microarray datasets, we introduce the construction of a reliable classifier via a feature selection process. The methodology has been tested on three public cancer datasets: Colon, Leukemia and Lymphoma. The proposed method obtains better classification results than when this enrichment is not performed, confirming the utility of metagene generation for improving the final classifier. Second, a new technique has been developed that uses hierarchical clustering to reduce the huge microarray datasets by removing the genes that are not relevant for the cancer classification task. The experimental results obtained when this method is applied to one public database are also presented and analyzed, demonstrating the utility of this new approach.
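The metagene idea described in this abstract can be illustrated with a single merge step of a bottom-up clustering. The Python sketch below is not the thesis's algorithm. It assumes that the two most correlated genes are merged and that the metagene is their plain average (one possible linear combination). The names `pearson` and `add_metagene` and the toy expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def add_metagene(features):
    """Return a copy of `features` enriched with one metagene.

    `features` maps a gene name to its expression values (one per sample).
    Assumption: the metagene is the average of the most correlated gene pair.
    """
    names = sorted(features)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    # Pick the pair with the largest absolute correlation.
    a, b = max(pairs, key=lambda p: abs(pearson(features[p[0]], features[p[1]])))
    enriched = dict(features)
    # The original genes are kept; the metagene is added as an extra feature.
    enriched[f"meta({a},{b})"] = [(u + v) / 2 for u, v in zip(features[a], features[b])]
    return enriched

# Toy data: three genes measured on four samples (made-up values).
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
enriched = add_metagene(genes)
```

Repeating the step on the enriched set would build a hierarchy of metagenes, and a feature selection stage could then choose among original genes and metagenes.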

    Study of gene regulatory networks inference methods from gene expression data

    A cell is the basic structural and functional unit of every living thing; it is a protein-based system that regulates itself. The cell eats to stay alive, grows and develops, reacts to the environment, and is subject to evolution. It also makes copies of itself. These processes are governed by chains of chemical reactions, creating a complex system. The scientific community has proposed to model the whole process with Gene Regulatory Networks (GRNs). Understanding these networks provides a systems-level understanding of biological organisms and of genetically related diseases. This thesis, focused on network inference from gene expression data, contributes to this field of knowledge by studying different techniques that allow a better reconstruction of GRNs. Gene expression datasets are characterised by thousands of noisy variables measured with only tens of samples. Moreover, these variables present non-linear dependencies between them. Therefore, recovering a model that is capable of capturing the relationships contained in these data constitutes a major challenge. The main contribution of this thesis is a set of fair and sound studies of different GRN inference methods and post-processing algorithms. First, we present a novel approach for inferring gene networks and compare it with other methods. It is inspired by the concept of "variable importance" in feature selection. However, many algorithms can be proposed to infer GRNs, so there is a need to assess their quality. Second, motivated by the fact that the previous comparison was not informative enough, we introduce a new framework for in silico performance assessment of GRN inference methods. This work has led to an open-source R/Bioconductor package called NetBenchmark. Finally, thanks to this tool, we have corroborated that inferring gene regulatory networks from expression data is a tough problem. The different algorithms have particular biases and strengths, and none of them is the best across all types of data and datasets. Therefore, we present a framework for evaluating and standardising network consensus methods to aggregate various network inferences. Postprint (published version).

    Study of meta-analysis strategies for network inference using information-theoretic approaches

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene expression data has accumulated in public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than traditional approaches focused on individual datasets, which typically suffer from experimental bias and a small number of samples. To date, there are two main strategies for this problem: the first ("data merging") merges all datasets together and then infers a GRN, whereas the second ("networks ensemble") infers a GRN from every dataset separately and then aggregates them using ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking. In this paper, we evaluate the performance of the meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating the matrices of pairwise measures from every dataset, then extracting the network from the resulting meta-matrix. Peer reviewed. Postprint (author's final draft).
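The two-step approach described in this abstract (aggregate the pairwise matrices, then extract a network from the meta-matrix) can be sketched as follows. This Python sketch is illustrative and not the paper's implementation. It assumes absolute Pearson correlation as the pairwise measure, a plain mean as the aggregation rule and a fixed threshold as the extraction step. All function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def pairwise_matrix(dataset):
    """Absolute correlation for every gene pair of one dataset.

    `dataset` maps a gene name to its expression values (one per sample).
    """
    genes = sorted(dataset)
    return {(a, b): abs(pearson(dataset[a], dataset[b]))
            for i, a in enumerate(genes) for b in genes[i + 1:]}

def meta_matrix(matrices):
    """Step 1: aggregate the per-dataset matrices (assumed rule: the mean)."""
    return {pair: sum(m[pair] for m in matrices) / len(matrices)
            for pair in matrices[0]}

def extract_network(meta, threshold):
    """Step 2: keep the pairs whose aggregated score reaches the threshold."""
    return sorted(pair for pair, score in meta.items() if score >= threshold)

# Two toy studies measuring the same three genes (made-up values).
study_1 = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.0], "g3": [1.0, -1.0, 1.0, -1.0]}
study_2 = {"g1": [0.5, 1.5, 1.0, 2.0], "g2": [1.0, 3.1, 2.0, 4.2], "g3": [3.0, 3.5, 2.0, 2.5]}

meta = meta_matrix([pairwise_matrix(study_1), pairwise_matrix(study_2)])
network = extract_network(meta, threshold=0.8)
```

With these toy values only the pair `("g1", "g2")` is strongly correlated in both studies, so it is the only edge that survives the threshold. The "networks ensemble" strategy would instead threshold or rank each study separately and combine the resulting networks.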

    Validation of the Maslach Burnout Inventory-Human Services Survey for Estimating Burnout in Dental Students.

    The aim of this study was to examine the validity and reliability of the Maslach Burnout Inventory-Human Services Survey (MBI-HSS) as a tool for assessing the prevalence and level of burnout in dental students in Spanish universities. The survey was adapted from English to Spanish. A sample of 533 dental students from 15 Spanish universities and a control group of 188 medical students self-administered the survey online, using the Google Drive service. The test-retest reliability or reproducibility showed an Intraclass Correlation Coefficient of 0.95. The internal consistency of the survey was 0.922. Testing the construct validity showed two components with an eigenvalue greater than 1.5, which explained 51.2% of the total variance. Factor I (36.6% of the variance) comprised the items that estimated emotional exhaustion and depersonalization. Factor II (14.6% of the variance) contained the items that estimated personal accomplishment. The cut-off point for the existence of burnout achieved a sensitivity of 92.2%, a specificity of 92.1%, and an area under the curve of 0.96. Comparison of the total dental students sample and the control group of medical students showed significantly higher burnout levels for the dental students (50.3% vs. 40.4%). In this study, the MBI-HSS was found to be viable, valid, and reliable for measuring burnout in dental students. Since the study also found that the dental students suffered from high levels of this syndrome, these results suggest the need for preventive burnout control programs

    Differential expression of long non-coding RNAs are related to proliferation and histological diversity in follicular lymphomas

    "This is the peer reviewed version of the following article: Roisman, Alejandro, et al. "Differential expression of long non-coding RNAs are related to proliferation and histological diversity in follicular lymphomas." British journal of haematology (2018), which has been published in final form at https://doi.org/10.1111/bjh.15656. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving." Long non-coding RNAs (lncRNAs) comprise a family of non-coding transcripts that are emerging as relevant regulators of gene expression in different processes, including tumour development. To determine the possible contribution of lncRNAs to the pathogenesis of follicular lymphoma (FL), we performed high-depth RNA sequencing in primary FL samples ranging from grade 1-3A to aggressive grade 3B variants, using unpurified (n = 16) and purified (n = 12) tumour cell suspensions from nodal samples. FL grade 3B had a significantly higher number of differentially expressed lncRNAs (dif-lncRNAs) with potential target coding genes related to cell cycle regulation. Nine of the 18 selected dif-lncRNAs were validated by quantitative real-time polymerase chain reaction in an independent series (n = 43) of FL. RP4-694A7.2 was identified as the top deregulated lncRNA potentially involved in cell proliferation. RP4-694A7.2 silencing in the WSU-FSCCL FL cell line reduced cell proliferation due to a block in the G1/S phase. The relationship between RP4-694A7.2 and proliferation was confirmed in primary samples, as its expression levels were positively related to the Ki-67 proliferation index. In summary, lncRNAs are differentially expressed across the clinico-biological spectrum of FL, and a subset of them, related to the cell cycle, may participate in the regulation of cell proliferation in these tumours. Peer reviewed. Postprint (author's final draft).

    NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference

    Background: In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool that performs a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Results: Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. Conclusions: The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that perform well across the board. Peer reviewed. Postprint (published version).
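To make the notion of evaluating an inference method against a known topology concrete, here is a small Python sketch. It does not reproduce the NetBenchmark API (the package is written in R) or its exact metrics. It assumes that a method outputs a score per candidate edge and that performance is summarised by precision and recall among the k best-scored edges. The function name and the toy scores are hypothetical.

```python
def precision_recall_at_k(scores, true_edges, k):
    """Precision and recall of the k highest-scored edges.

    `scores` maps a candidate edge (a tuple of gene names) to the confidence
    returned by an inference method; `true_edges` is the set of edges of the
    reference (simulated) network.
    """
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for edge in top_k if edge in true_edges)
    return hits / k, hits / len(true_edges)

# Reference topology and the output of a hypothetical inference method.
true_edges = {("a", "b"), ("b", "c")}
scores = {("a", "b"): 0.9, ("a", "c"): 0.7, ("b", "c"): 0.6, ("c", "d"): 0.1}

p2, r2 = precision_recall_at_k(scores, true_edges, k=2)
p3, r3 = precision_recall_at_k(scores, true_edges, k=3)
```

Running the same measurement while varying the simulator, the topology, the number of samples or the noise level is what turns a single score into a robustness assessment of the kind the abstract describes.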

    Identification of rumen microbial biomarkers linked to methane emission in Holstein dairy cows

    Mitigation of greenhouse gas emissions is relevant for reducing the environmental impact of ruminant production. In this study, the rumen microbiome from Holstein cows was characterized through a combination of 16S rRNA gene and shotgun metagenomic sequencing. Methane production (CH4) and dry matter intake (DMI) were individually measured over 4–6 weeks to calculate the CH4 yield (CH4y = CH4/DMI) per cow. We implemented a combination of clustering, multivariate and mixed model analyses to identify a set of operational taxonomic units (OTUs) jointly associated with CH4y and the structure of ruminal microbial communities. Three ruminotype clusters (R1, R2 and R3) were identified, and R2 was associated with higher CH4y. The taxonomic composition of R2 had a lower abundance of Succinivibrionaceae and Methanosphaera, and a higher abundance of Ruminococcaceae, Christensenellaceae and Lachnospiraceae. Metagenomic data confirmed the lower abundance of Succinivibrionaceae and Methanosphaera in R2 and identified genera (Fibrobacter and unclassified Bacteroidales) not highlighted by metataxonomic analysis. In addition, the functional metagenomic analysis revealed that samples classified in cluster R2 were enriched in genes coding for KEGG modules associated with methanogenesis, including a significant relative abundance of the methyl-coenzyme M reductase enzyme. Based on the cluster assignment, we applied a sparse partial least-squares discriminant analysis (sPLS-DA) at the taxonomic and functional levels. In addition, we implemented an sPLS regression model using the phenotypic variation of CH4y. By combining these two approaches, we identified 86 discriminant bacterial OTUs, notably including families linked to CH4 emission such as Succinivibrionaceae, Ruminococcaceae, Christensenellaceae, Lachnospiraceae and Rikenellaceae. These selected OTUs explained 24% of the CH4y phenotypic variance, whereas the host genome contribution was ~14%. In summary, we identified rumen microbial biomarkers associated with the methane production of dairy cows; these biomarkers could be used for targeted methane-reduction selection programmes in the dairy cattle industry, provided they are heritable.
