289 research outputs found

    LABRAD : Vol 46, Issue 3 - May 2021

    Laboratory Diagnostic Tools for HIV in Neonates and Infants
    Intussusception in Pediatric Patients: Role of Radiology in Diagnosis and Management
    Fecal Calprotectin as an Inflammatory Marker in Inflammatory Bowel Disease
    Spectrum of Tools for the Detection of Enteric Pathogens in a Clinical Laboratory
    Pleuropulmonary Blastoma Type III with Rhabdomyosarcomatous and Chondrosarcomatous Differentiation in a Five-Year-Old Child: A Rare Case
    Radiology Pathology Correlation: Orthopedic Pathology
    Updates in Reporting Wilms Tumor (Nephrectomy) Specimen
    Biotinidase Deficiency: A Rare but Easily Treatable Disorder
    Renal Tubular Disorders and Biochemical Diagnostics
    TSH Receptor Antibodies (TRAb) in Neonatal Hyperthyroidism: What We Need to Know
    The Best of the Past: Polaroid

    Systems Analytics and Integration of Big Omics Data

    A "genotype" is essentially an organism's full hereditary information, which is obtained from its parents. A "phenotype" is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, and metabolism. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This "Big Data" is so large and complex that traditional data processing applications are not up to the task. Challenges arise in the collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. This Special Issue focuses on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. The Issue also covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.
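    The genotype-to-phenotype prediction task described above can be sketched with a toy classifier. Everything here is an illustrative assumption, not a method from the Issue: genotypes are encoded as 0/1/2 minor-allele counts, the phenotype is a binary class, and a simple nearest-centroid rule stands in for the machine learning models actually discussed.

```python
# Toy sketch of genotype-to-phenotype prediction (assumed encoding:
# each genotype is a vector of 0/1/2 minor-allele counts; the phenotype
# is a class label). Illustrative only, not a method from the Issue.

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_fit(X, y):
    """Return one centroid per phenotype class."""
    classes = sorted(set(y))
    return {c: centroid([x for x, label in zip(X, y) if label == c])
            for c in classes}

def nearest_centroid_predict(model, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda c: dist2(model[c], x))

# Toy data: 4 samples x 3 SNPs, two phenotype classes.
X = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
y = ["A", "A", "B", "B"]
model = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(model, [0, 0, 1]))  # "A": closest centroid
```

    A real pipeline would replace the centroid rule with the regularized or ensemble learners the Issue discusses, but the shape of the problem (samples x genetic features in, phenotype label out) is the same.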

    Prediction and analysis of the structure and dynamics of biological networks

    Increasing knowledge about the biological processes that govern the dynamics of living organisms has fostered a better understanding of the origin of many diseases, as well as the identification of potential therapeutic targets. Biological systems can be modeled as biological networks, allowing methods from graph theory to be applied in their investigation and characterization. The main motivation of this work was the inference of patterns and rules that underlie the organization of biological networks. Through the integration of different types of data, such as gene expression, protein-protein interactions, and other biomedical concepts, computational methods were developed that can be used to predict and study diseases. The first contribution was the characterization of a subsystem of the human protein interactome through the topological properties of the networks that model it. As a second contribution, an unsupervised method using biological criteria and network topology was applied to co-expression networks to improve the understanding of the genetic mechanisms and risk factors of a disease. As a third contribution, a methodology was developed to remove noise (denoising) from protein networks, using network topology to obtain more accurate models. As a fourth contribution, a supervised methodology was proposed to model the dynamics of the protein interactome, using exclusively the topology of the protein interaction networks that form the dynamic model of the system. The proposed methodologies contribute to the creation of more precise static and dynamic biological models through the identification and use of topological patterns of protein interaction networks, which can be used to predict and study diseases. (Programa Doutoral em Engenharia Informática)
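    One of the topological properties used to characterize such networks can be sketched in a few lines: the local clustering coefficient of a node in an undirected protein-protein interaction graph. The edge list below is a toy assumption for illustration, not real interactome data.

```python
# Local clustering coefficient: the fraction of a node's neighbor pairs
# that are themselves connected. Toy graph, illustrative only.

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are directly linked."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adj[neighbors[i]])
    return 2.0 * links / (k * (k - 1))

# Toy network: a triangle (A, B, C) plus a pendant node D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(clustering_coefficient(adj, "A"))  # 1.0: its two neighbors are linked
print(clustering_coefficient(adj, "C"))  # only the A-B pair is linked
```

    Profiles of such per-node measures (degree, clustering, centrality) are the kind of topological signature the thesis uses to characterize, denoise, and model protein interaction networks.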

    Machine Learning Approaches for Breast Cancer Survivability Prediction

    Breast cancer is one of the leading causes of cancer death in women. If not diagnosed early, the 5-year survival rate of patients is only about 26%. Furthermore, patients with similar phenotypes can respond differently to the same therapies, which means a therapy may not work well for some of them. Identifying biomarkers that can help predict a cancer class with high accuracy is at the heart of breast cancer studies, because such biomarkers are targets of treatment and drug development. Genomics data have been shown to carry useful information for breast cancer diagnosis and prognosis, as well as for uncovering the disease's mechanism, and machine learning methods are powerful tools to extract that information. Feature selection methods are often utilized in supervised and unsupervised learning tasks to deal with data containing a large number of features of which only a small portion are useful to the classification task. On the other hand, analyzing only one type of data, without reference to existing knowledge about the disease and the therapies, might mislead the findings; effective data integration approaches are necessary to unravel this complex disease. In this thesis, we apply and develop machine learning methods to identify meaningful biomarkers for predicting breast cancer survivability after a given treatment. These include applying feature selection methods to gene-expression data to derive gene signatures, where the initial genes are collected with respect to the mechanisms of drugs used in breast cancer therapies. We also propose a new feature selection method, named PAFS, and apply it to discover accurate biomarkers. In addition, it has been increasingly reported that sub-network biomarkers are more robust and accurate than single-gene biomarkers. We propose two network-based approaches to identify sub-network biomarkers for breast cancer survivability prediction after a treatment. They integrate gene-expression data with protein-protein interactions during the optimal sub-network search and use cancer-related genes and pathways to prioritize the extracted sub-networks. The sub-network search space is usually huge, and many proteins interact with thousands of other proteins; we therefore apply heuristics to avoid generating and evaluating redundant sub-networks.
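    The sub-network search with heuristic pruning can be sketched as a greedy expansion: start from a seed gene and keep adding the best-scoring neighbor while the sub-network's average score improves. The gene names, scores, and the "mean node score" objective are all illustrative assumptions; the thesis's actual scoring integrates gene expression with the interaction network.

```python
# Greedy sub-network growth with a simple stopping heuristic.
# Toy scores stand in for a real discriminative score per gene.

def greedy_subnetwork(adj, score, seed, max_size=3):
    """Grow a sub-network greedily while the average node score improves."""
    sub = {seed}
    while len(sub) < max_size:
        frontier = {n for g in sub for n in adj[g]} - sub
        if not frontier:
            break
        best = max(frontier, key=lambda n: score[n])
        old_avg = sum(score[g] for g in sub) / len(sub)
        new_avg = (sum(score[g] for g in sub) + score[best]) / (len(sub) + 1)
        if new_avg <= old_avg:
            break  # heuristic pruning: stop when the score stops improving
        sub.add(best)
    return sub

# Toy interaction network and per-gene scores (assumed, for illustration).
adj = {"TP53": {"MDM2", "BRCA1"}, "MDM2": {"TP53"},
       "BRCA1": {"TP53", "RAD51"}, "RAD51": {"BRCA1"}}
score = {"TP53": 0.5, "MDM2": 0.2, "BRCA1": 0.8, "RAD51": 0.1}
print(sorted(greedy_subnetwork(adj, score, "TP53")))  # ['BRCA1', 'TP53']
```

    The early-stopping test is one example of the kind of heuristic that avoids enumerating and evaluating the combinatorially many redundant sub-networks mentioned above.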

    Novel feature selection methods for high dimensional data

    [Abstract] Feature selection is defined as the process of detecting the relevant features and discarding the irrelevant ones, with the goal of obtaining a smaller feature subset that adequately describes the given problem with minimal degradation of performance, or even an improvement. With the arrival of high-dimensional datasets, both in samples and in features, properly identifying the relevant features has become indispensable in real-world scenarios. In this context, the available methods face a new challenge in terms of applicability and scalability, and new methods must be developed that take these particularities of high dimensionality into account. This thesis is devoted to research in feature selection and its application to real high-dimensional data. The first part of this work analyzes existing feature selection methods, to check their suitability for different challenges and to provide new results to feature selection researchers. To this end, the most popular techniques were applied to real problems, aiming not only at performance improvements but also at enabling their application in real time. Beyond efficiency, scalability is also a critical aspect in large-scale applications: the effectiveness of feature selection methods can be significantly degraded, if they do not become entirely inapplicable, as the size of the data keeps growing, so the scalability of feature selection methods must also be analyzed. After this in-depth analysis of existing methods, the second part of the thesis focuses on the development of new techniques. Because most existing selection methods require discrete data, the first proposed approach combines a discretizer, a filter, and a classifier, obtaining promising results in different scenarios. In an attempt to introduce diversity, the second proposal uses an ensemble of filters instead of a single one, freeing the user from having to decide which technique is most appropriate for a given problem. The third technique proposed in this thesis considers not only the relevance of the features but also their associated cost, economic or in terms of runtime, leading to a general methodology for cost-based feature selection. Finally, several strategies are proposed to distribute and parallelize feature selection, since transforming a large-scale problem into several small-scale problems can improve processing time and, sometimes, classification accuracy.
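    The discretizer-plus-filter stage of the first proposed approach can be sketched as follows. The equal-width binning, the frequency-difference relevance score, and the toy data are all assumptions for illustration; the thesis evaluates real discretizers and filters, followed by a classifier.

```python
# Sketch of a discretizer + filter feature-selection stage.
# Equal-width binning, then a simple relevance score per feature.

def discretize(column, bins=2):
    """Equal-width binning of one numeric feature into integer bins."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0   # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in column]

def relevance(column, y):
    """Mean absolute difference between per-class bin frequencies."""
    bins = sorted(set(column))
    classes = sorted(set(y))
    def freq(c, b):
        idx = [i for i, label in enumerate(y) if label == c]
        return sum(1 for i in idx if column[i] == b) / len(idx)
    return sum(abs(freq(classes[0], b) - freq(classes[1], b))
               for b in bins) / len(bins)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.8], [0.8, 1.2]]
y = [0, 0, 1, 1]
columns = [discretize([row[j] for row in X]) for j in range(2)]
scores = [relevance(col, y) for col in columns]
print(scores)  # [1.0, 0.0]: feature 0 is kept, feature 1 discarded
```

    A cost-based variant, as in the third contribution, would subtract a per-feature cost term from each relevance score before ranking.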

    Elicitation of relevant information from medical databases: application to the encoding of secondary diagnoses

    In this thesis we focus on encoding inpatient episodes into standard codes, a highly sensitive medical task in French hospitals requiring minute detail and high accuracy, since the hospital's income directly depends on it. Encoding an inpatient episode includes encoding the primary diagnosis that motivates the hospital stay and the secondary diagnoses that occur during the stay. Unlike the primary diagnosis, encoding secondary diagnoses is prone to human error, due to the difficulty of collecting relevant data from different medical sources, or to the outright absence of data that would help encode the diagnosis. We propose a retrospective analysis of the encoding task for selected secondary diagnoses. The PMSI database, a large medical database that documents all information about hospital stays in France, is analysed in order to extract, from previously encoded inpatient episodes, the decisive features for encoding a difficult secondary diagnosis that occurred with a frequent primary diagnosis. Consequently, at the end of an encoding session, once all the features are available, we propose to help the coders by suggesting a list of relevant encodings together with the features used to predict them. Nonetheless, a set of challenges must be addressed to develop an efficient encoding help system: expert knowledge of the medical domain, and an efficient methodology for exploiting the medical database with machine learning methods. With respect to the medical domain knowledge challenge, we collaborate with expert coders in a local hospital to obtain expert insight on some secondary diagnoses that are difficult to encode, and to evaluate the results of the proposed methodology. With respect to the database exploitation challenge, we use machine learning methods such as feature selection (FS), focusing on several issues: the incompatible format of medical databases, their excessive number of features, and the unstable features extracted from them. Regarding the incompatible format, which stems from their relational structure, we propose a series of transformations to make the database and its features exploitable by any FS method. To limit the effect of the excessive number of features, usually driven by the number of diagnoses and medical procedures, we propose grouping the features into an appropriate representation level and studying which level works best. Regarding the unstable extracted features: since the datasets linked with diagnoses are highly imbalanced, with classification categories unequally represented, most existing FS methods tend not to perform well on them even when sampling strategies are used. We therefore propose a methodology that extracts a stable set of features by sampling the dataset multiple times and extracting the relevant features from each sampled dataset, regardless of the sampling method and the FS method used. Lastly, we evaluate the methodology by building a classification model that predicts the studied diagnoses from the extracted features. The performance of the classification model indicates the quality of the extracted features, since good-quality features produce a good classification model. Two scales of the PMSI database are used: local and regional. The classification model is built using the local scale and tested using both the local and regional scales. The evaluations showed that the extracted features are good predictors for encoding secondary diagnoses. We therefore propose applying our methodology to increase the integrity of the encoded diagnoses and to prevent missing important encodings that affect the hospital's budget, by providing the coders with the potential encodings of the secondary diagnoses as well as the features that lead to each encoding.
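    The stable-feature-extraction idea, sample the dataset repeatedly and keep only features selected in most runs, can be sketched as below. The mean-difference filter, the bootstrap resampling, the 80% keep threshold, and the toy data are all illustrative assumptions; the thesis's point is that the scheme works regardless of the sampling and FS methods plugged in.

```python
# Stability-style feature extraction: resample, filter, keep the
# features that survive most rounds. Toy filter and data, for sketch.
import random

def relevant_features(X, y, top_k=2):
    """Toy filter: rank features by |mean(class 1) - mean(class 0)|."""
    n_feat = len(X[0])
    def gap(j):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return set(sorted(range(n_feat), key=gap, reverse=True)[:top_k])

def stable_features(X, y, rounds=20, keep_ratio=0.8, seed=0):
    """Features chosen in at least keep_ratio of the resampled runs."""
    rng = random.Random(seed)
    counts = {}
    done = 0
    while done < rounds:
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample: both classes needed to score features
        for j in relevant_features([X[i] for i in idx], ys):
            counts[j] = counts.get(j, 0) + 1
        done += 1
    return {j for j, c in counts.items() if c >= keep_ratio * rounds}

# Toy data: features 0 and 1 track the class, feature 2 is noise.
X = [[0, 0, 0.3], [0, 0, 0.1], [0, 0, 0.4],
     [1, 1, 0.2], [1, 1, 0.0], [1, 1, 0.35]]
y = [0, 0, 0, 1, 1, 1]
print(sorted(stable_features(X, y)))  # [0, 1]: the noise feature is dropped
```

    On imbalanced data like the diagnosis datasets described above, the resampling step would typically also rebalance the classes before each filtering round.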

    Molecular Targets of CNS Tumors

    Molecular Targets of CNS Tumors is a selected review of Central Nervous System (CNS) tumors, with particular emphasis on the signaling pathways of the most common CNS tumor types. Developing drugs that specifically attack cancer cells requires an understanding of the distinct characteristics of those cells. Additional detailed information is provided on selected signaling pathways in CNS tumors.

    Statistical tools for variable selection and the integration of "omics" data

    Recent advances in biotechnology allow the monitoring of large quantities of biological data of various types (genomics, proteomics, metabolomics, phenotypes, ...), often characterized by a small number of samples or observations. The aim of this thesis was to develop, or adapt, appropriate statistical methodologies to analyse such high-dimensional data, and to provide biologists with efficient tools for selecting the most biologically relevant variables. In the first part, we focus on microarray data in a supervised classification framework, and on the selection of discriminative genes. In the second part, in the context of data integration, we focus on the selection of different types of variables with two-block omics data. Firstly, we propose a wrapper approach that aggregates classifiers (CART, SVM) to select genes discriminating one or several biological conditions. Secondly, we develop a PLS variant with l1 penalization, called sparse PLS because it leads to a sparse set of parameters, allowing the selection of subsets of variables jointly measured on the same biological samples; either a regression or a canonical analysis framework is proposed to address the specific biological question. We assess each of the proposed approaches by comparing them, on numerous real data sets, to similar methods from the literature. The usual statistical criteria are often limited by the small number of samples; we therefore always try to combine statistical assessments with a thorough biological interpretation of the results. The approaches that we propose are easy to apply and give relevant results that answer the biologists' needs.
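    The ingredient that makes the PLS variant "sparse" can be sketched in isolation: an l1 soft-thresholding operator applied to a loading vector, which zeroes small weights so that only a subset of variables is selected. The vector and threshold below are toy values; a full sparse PLS would apply this operator inside the PLS deflation loop.

```python
# l1 soft-thresholding: shrink each loading toward zero by lam and zero
# out anything whose magnitude falls below lam. Toy values, sketch only.

def soft_threshold(w, lam):
    """Return w with each entry shrunk by lam; small entries become 0."""
    def s(x):
        mag = abs(x) - lam
        return 0.0 if mag <= 0 else (mag if x > 0 else -mag)
    return [s(x) for x in w]

loadings = [0.9, -0.05, 0.4, 0.02, -0.6]
sparse = soft_threshold(loadings, 0.1)
print(sparse)    # the two small loadings are zeroed out
selected = [j for j, x in enumerate(sparse) if x != 0.0]
print(selected)  # [0, 2, 4]: indices of the selected variables
```

    The surviving indices are the selected variables; tuning `lam` trades off sparsity against how much of each component's variance is retained.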