1,111 research outputs found

    RTCGAToolbox: A New Tool for Exporting TCGA Firehose Data

    Background & Objective: Managing data from large-scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time-consuming step for research projects. Several efforts, such as the Firehose project, make TCGA pre-processed data publicly available via web services and data portals, but this information must be managed, downloaded and prepared for subsequent steps. We have developed an open source and extensible R based data client for pre-processed data from the Firehose, and demonstrate its use with sample case studies. Results show that our RTCGAToolbox can facilitate data management for researchers interested in working with TCGA data. The RTCGAToolbox can also be integrated with other analysis pipelines for further data processing. Availability and implementation: The RTCGAToolbox is open-source and licensed under the GNU General Public License Version 2.0. All documentation and source code for RTCGAToolbox is freely available at http://mksamur.github.io/RTCGAToolbox/ for Linux and Mac OS X operating systems.

    Applications

    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.

    Enabling Web-scale data integration in biomedicine through Linked Open Data

    The biomedical data landscape is fragmented, with several isolated, heterogeneous data and knowledge sources on the Web that use varying formats, syntaxes, schemas, and entity notations. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the widespread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.

    AGUIA: autonomous graphical user interface assembly for clinical trials semantic data services

    Background: AGUIA is a front-end web application originally developed to manage clinical, demographic and biomolecular patient data collected during clinical trials at MD Anderson Cancer Center. The diversity of methods involved in patient screening and sample processing generates a variety of data types that require a resource-oriented architecture to capture the associations between the heterogeneous data elements. AGUIA uses a semantic web formalism, resource description framework (RDF), and a bottom-up design of knowledge bases that employ the S3DB tool as the starting point for the client's interface assembly. Methods: The data web service, S3DB, meets the necessary requirements of generating the RDF and of explicitly distinguishing the description of the domain from its instantiation, while allowing for continuous editing of both. Furthermore, it uses an HTTP-REST protocol, has a SPARQL endpoint, and has open source availability in the public domain, which facilitates the development and dissemination of this application. However, S3DB alone does not address the issue of representing content in a form that makes sense for domain experts. Results: We identified an autonomous set of descriptors, the GBox, that provides user and domain specifications for the graphical user interface. This was achieved by identifying a formalism that makes use of an RDF schema to enable the automatic assembly of graphical user interfaces in a meaningful manner while using only resources native to the client web browser (JavaScript interpreter, document object model). We defined a generalized RDF model such that changes in the graphic descriptors are automatically and immediately (locally) reflected into the configuration of the client's interface application. Conclusions: The design patterns identified for the GBox benefit from and reflect the specific requirements of interacting with data generated by clinical trials, and they contain clues for a general purpose solution to the challenge of having interfaces automatically assembled for multiple and volatile views of a domain. By coding AGUIA in JavaScript, for which all browsers include a native interpreter, a solution was found that assembles interfaces that are meaningful to the particular user, and which are also ubiquitous and lightweight, allowing the computational load to be carried by the client's machine.

    Implementazione ed ottimizzazione di algoritmi per l'analisi di Biomedical Big Data

    Big Data Analytics poses many challenges to the research community, which has to handle several computational problems related to the vast amount of data. Interest in biomedical data is increasing, with the aim of achieving so-called personalized medicine, in which therapy plans are designed around the specific genotype and phenotype of an individual patient; algorithm optimization plays a key role in this purpose. In this work we discuss several topics related to Biomedical Big Data Analytics, with special attention to the numerical issues and algorithmic solutions related to them. We introduce a novel feature selection algorithm tailored to omics datasets, proving its efficiency on synthetic and real high-throughput genomic datasets. We tested our algorithm against other state-of-the-art methods, obtaining better or comparable results. We also implemented and optimized different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three novel frameworks for developing deep learning neural network models are discussed and used to describe the numerical improvements proposed on various topics. In the first implementation we optimize two Super Resolution models, showing their results on NMR images and proving their efficiency in generalization tasks without retraining. The second optimization involves a state-of-the-art Object Detection neural network architecture, obtaining a significant speedup in computational performance. In the third application we discuss the femur head segmentation problem on CT images using deep learning algorithms. The last section of this work involves the implementation of a novel biomedical database, obtained by harmonizing multiple data sources, that provides network-like relationships between biomedical entities. Data on diseases and related biological entities were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources involved in this project.

    New control mechanisms of the tumor suppressor protein FBXW7 mediated by DYRK2 kinase

    The systems responsible for protein homeostasis play a fundamental role in the development of tumors. These systems recognize and degrade incomplete or misfolded proteins, or act as control mechanisms for cellular processes. Specifically, protein degradation by the ubiquitin-proteasome system (UPS) controls a wide range of these processes. This ubiquitination-mediated proteolysis is rapid, irreversible, highly regulated, and plays a key role in cell division, growth, and differentiation. The most widely studied UPS component to date is almost certainly the ubiquitin ligase FBXW7, whose substrates include key oncoproteins involved in cell cycle control, proliferation, migration and tumorigenesis. Furthermore, FBXW7 is among the most frequently mutated genes associated with the development of cancer. Given the high prevalence of FBXW7 mutations, the development of therapies targeting the FBXW7 pathway is of great interest. All these data make the scientific community consider FBXW7 a key candidate in the search for regulatory mechanisms that can open the way to new anticancer therapies. However, few mechanisms regulating the expression or activity of FBXW7 are known, and therefore few signaling pathways that could be altered using drugs, which makes it necessary to describe new pathways capable of regulating this tumor suppressor. In this work, a new regulatory mechanism for FBXW7 dependent on the serine/threonine protein kinase DYRK2 is described for the first time. We show that DYRK2 interacts with and phosphorylates FBXW7, resulting in its proteasome-mediated degradation. DYRK2-mediated destabilization of FBXW7 is independent of its ubiquitin ligase activity. Furthermore, functional analysis demonstrates the existence of DYRK2-dependent regulatory mechanisms for key substrates of FBXW7. Finally, we provide evidence indicating that DYRK2-dependent regulation of FBXW7 protein accumulation contributes to cytotoxic effects in response to chemotherapeutic agents such as doxorubicin or paclitaxel in colorectal cancer cell lines and to BET inhibitors in T-cell acute lymphoblastic leukemia cell lines. Taken together, this work reveals a new regulatory axis, DYRK2/FBXW7, which provides insight into the role of these two proteins in tumor progression and the response to DNA damage.

    Using bioinformatic analyses to understand prostate cancer cell biology

    Prostate cancer (PCa) affects 1 in 7 men in their lifetime and is the most commonly diagnosed cancer in men. It is the 4th most common cancer in Canada. PCa is a hormone-dependent disease, and androgens play a vital role in its progression. The standard of care to treat PCa, following surgery or radiation therapy, is androgen deprivation therapy (ADT). Despite an initial positive response to androgen inhibition, progression of the disease to castration-resistant prostate cancer (CRPC) is almost inevitable. Across the various stages of PCa, the androgen receptor (AR) plays a major role. This thesis describes the methods developed and used to better understand PCa biology and the role androgens play in the disease. The work presented in this thesis consists mainly of bioinformatic analyses performed on publicly available data sets and a pipeline built to analyse RNA-Seq data. The RNA-Seq pipeline was developed to understand the impact of androgens and the genes regulated upon androgen treatment in PCa cell models. This bioinformatic pipeline consists of various tools, which are described in chapter 1. The major goal of this project was to develop a pipeline for analysing RNA-Seq data that helps to understand and define the metabolic pathways and genes that are regulated by androgens and play an important role in PCa progression. The experimental workflow consisted of two androgen receptor positive cell lines, LNCaP and LAPC4. All the data used in this project have been made publicly available so that the research community can perform further comparative studies and analyses to understand the functions of androgens in greater depth and develop novel therapies to treat PCa. In another project, described in chapter 2, bioinformatic analyses were performed on publicly available data to understand the loss and genomic alteration frequency of the gene PTEN, located at 10q23. These analyses highlighted that the genomic alteration frequency of PTEN is much higher in CRPC than in localised PCa, and also helped to identify other genes that are lost along with PTEN. These genes have not been studied much in the literature, but a few studies suggest that they might possess tumor suppressor characteristics. These results might be a good starting point for deeper analyses of gene loss. Understanding the functions of AR and the deletion of PTEN will help in developing novel strategies and approaches to diagnose and treat PCa. Integrating bioinformatic analyses with clinical research opens up a new perspective in the PCa research domain.

    Health State Estimation

    Life's most valuable asset is health. Continuously understanding the state of our health and modeling how it evolves is essential if we wish to improve it. Given that people live with more data about their lives today than at any other time in history, the challenge rests in interweaving this data with the growing body of knowledge to compute and model the health state of an individual continually. This dissertation presents an approach to build a personal model and dynamically estimate the health state of an individual by fusing multi-modal data and domain knowledge. The system is stitched together from four essential abstraction elements: 1. the events in our life, 2. the layers of our biological systems (from molecular to an organism), 3. the functional utilities that arise from biological underpinnings, and 4. how we interact with these utilities in the reality of daily life. Connecting these four elements via graph network blocks forms the backbone by which we instantiate a digital twin of an individual. Edges and nodes in this graph structure are then regularly updated with learning techniques as data is continuously digested. Experiments demonstrate the use of dense and heterogeneous real-world data from a variety of personal and environmental sensors to monitor individual cardiovascular health state. State estimation and individual modeling are the fundamental basis for departing from disease-oriented approaches toward a total health continuum paradigm. Precision in predicting health requires understanding state trajectory. By encasing this estimation within a navigational approach, a systematic guidance framework can plan actions to transition a current state towards a desired one. This work concludes by presenting this framework of combining the health state and personal graph model to perpetually plan and assist us in living life towards our goals. Comment: Ph.D. Dissertation @ University of California, Irvine