228 research outputs found

    Querying Very Large Multi-dimensional Datasets in ADR - Extended Abstract

    This paper addresses optimizing the execution of range queries into multi-dimensional datasets on distributed memory parallel machines within the Active Data Repository framework. ADR is an infrastructure that integrates storage, retrieval and processing of large multi-dimensional datasets on distributed memory parallel architectures with multiple disks attached to each node. We describe three potential strategies for efficient execution of such queries that employ different tiling and workload partitioning approaches. We evaluate scalability of these strategies for different application scenarios, varying both the number of processors and the input dataset size on a 128 processor IBM SP multicomputer. Also cross-referenced as UMIACS-TR-99-2
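To make the idea of tiling and workload partitioning concrete, the sketch below splits a 2-D range query into fixed-size tiles and hands them to processors round-robin. It is an illustration only and is not one of the three strategies evaluated in the paper; the function names `tile_range` and `assign_round_robin`, the tile shape, and the round-robin policy are all assumptions made for this example.

```python
from itertools import product


def tile_range(lo, hi, tile):
    """Split the half-open box [lo, hi) into axis-aligned tiles.

    Each tile is a tuple of (start, stop) intervals, one per dimension,
    and spans at most `tile[d]` cells along dimension d.
    """
    per_axis = [
        [(s, min(s + t, h)) for s in range(l, h, t)]
        for l, h, t in zip(lo, hi, tile)
    ]
    return list(product(*per_axis))


def assign_round_robin(tiles, n_procs):
    """Hypothetical partitioning policy: tile i goes to processor i mod n_procs."""
    plan = {p: [] for p in range(n_procs)}
    for i, t in enumerate(tiles):
        plan[i % n_procs].append(t)
    return plan


# A 10 x 6 query region cut into tiles of at most 4 x 3 cells, on 3 processors.
tiles = tile_range((0, 0), (10, 6), (4, 3))
plan = assign_round_robin(tiles, 3)
for proc, assigned in plan.items():
    print(proc, assigned)
```

Round-robin balances the number of tiles per processor but ignores where the underlying data lives; a real system would also have to weigh disk locality and communication cost.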

    Interactive exploration of population scale pharmacoepidemiology datasets

    Population-scale drug prescription data linked with adverse drug reaction (ADR) data supports the fitting of models large enough to detect drug use and ADR patterns that are not detectable using traditional methods on smaller datasets. However, detecting ADR patterns in large datasets requires tools for scalable data processing, machine learning for data analysis, and interactive visualization. To our knowledge, no existing pharmacoepidemiology tool supports all three requirements. We have therefore created a tool for interactive exploration of patterns in prescription datasets with millions of samples. We use Spark to preprocess the data for machine learning and for analyses using SQL queries. We have implemented models in Keras and the scikit-learn framework. The model results are visualized and interpreted using live Python coding in Jupyter. We apply our tool to explore a data set of 384 million prescriptions from the Norwegian Prescription Database, combined with 62 million prescriptions for elderly patients who were hospitalized. We preprocess the data in two minutes, train models in seconds, and plot the results in milliseconds. Our results show the power of combining computational power, short computation times, and ease of use for analysis of population-scale pharmacoepidemiology datasets. The code is open source and available at: https://github.com/uit-hdl/norpd_prescription_analyse

    PPQ-Trajectory: spatio-temporal quantization for querying in large trajectory repositories

    We present PPQ-trajectory, a spatio-temporal quantization based solution for querying large dynamic trajectory data. PPQ-trajectory includes a partition-wise predictive quantizer (PPQ) that generates an error-bounded codebook with autocorrelation and spatial proximity-based partitions. The codebook is indexed to run approximate and exact spatio-temporal queries over compressed trajectories. PPQ-trajectory includes a coordinate quadtree coding for the codebook with support for exact queries. An incremental temporal partition-based index is utilised to avoid full reconstruction of trajectories during queries. An extensive set of experimental results for spatio-temporal queries on real trajectory datasets is presented. PPQ-trajectory shows significant improvements over the alternatives with respect to several performance measures, including the accuracy of results when the summary is used directly to provide approximate query results, the spatial deviation with which spatio-temporal path queries can be answered when the summary is used as an index, and the time taken to construct the summary. Superior results on the quality of the summary and the compression ratio are also demonstrated
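The notion of an error-bounded codebook can be illustrated with a much simpler scheme than PPQ. The sketch below snaps each trajectory point to a uniform grid whose cell width is twice the error bound, so every reconstructed coordinate lies within that bound of the original. This is a plain grid quantizer, not the partition-wise predictive quantizer described in the abstract; the names `quantize`, `reconstruct`, and `eps`, and the sample trajectory, are assumptions made for this example.

```python
def quantize(points, eps):
    """Map each (x, y) point to an integer grid code.

    The grid cell width is 2 * eps, so rounding to the nearest grid node
    moves each coordinate by at most eps.
    """
    step = 2.0 * eps
    return [(round(x / step), round(y / step)) for x, y in points]


def reconstruct(codes, eps):
    """Recover approximate coordinates from the integer grid codes."""
    step = 2.0 * eps
    return [(i * step, j * step) for i, j in codes]


# Hypothetical trajectory and error bound, chosen only for illustration.
trajectory = [(0.0, 0.0), (0.9, 1.1), (2.2, 1.8), (3.05, 2.95)]
eps = 0.25

codes = quantize(trajectory, eps)
approx = reconstruct(codes, eps)

# Largest per-coordinate deviation between original and reconstructed points.
max_err = max(
    max(abs(x - ax), abs(y - ay))
    for (x, y), (ax, ay) in zip(trajectory, approx)
)
print(codes, max_err)
```

Storing small integer codes instead of raw coordinates is what makes querying over the compressed form possible; a predictive quantizer would go further by encoding only the residual against a predicted position.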

    A transfer learning approach to drug resistance classification in mixed HIV dataset

    Funding: This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. As we advance towards individualized therapy, the ‘one-size-fits-all’ regimen is gradually giving way to adaptive techniques that address the complexities of failed treatments. Treatment failure is associated with factors such as poor drug adherence, adverse side effects or reactions, co-infection, lack of follow-up, drug-drug interaction and more. This paper implements a transfer learning approach that classifies patients' response to failed treatments due to adverse drug reactions. The research is motivated by the need for early detection of patients' response to treatments and the generation of domain-specific datasets to balance under-represented classification data, typical of low-income countries located in Sub-Saharan Africa. A soft computing model was pre-trained to cluster CD4+ counts and viral loads of treatment change episodes (TCEs) processed from two disparate sources: the Stanford HIV drug resistance database (https://hivdb.stanford.edu), or control dataset, and locally sourced patients' records from selected health centers in Akwa Ibom State, Nigeria, or mixed dataset. Both datasets were tested on a traditional 2-layer neural network (NN) and a 5-layer deep neural network (DNN), with odd dropout neurons distribution resulting in the following configurations: NN (Parienti et al., 2004) [32], NN (Deniz et al., 2018) [53] and DNN [9 7 5 3 1]. To discern knowledge of failed treatment, DNN1 [9 7 5 3 1] and DNN2 [9 7 5 3 1] were introduced to model both datasets and only TCEs of patients at risk of drug resistance, respectively. Classification results revealed fewer misclassifications, with the DNN architecture yielding the best performance measures. 
However, the transfer learning approach with DNN2 [9 7 3 1] configuration produced superior classification results when compared to other variants/configurations, with classification accuracy of 99.40%, and RMSE values of 0.0056, 0.0510, and 0.0362, for test, train, and overall datasets, respectively. The proposed system therefore indicates good generalization and is vital as decision-making support to clinicians/physicians for predicting patients at risk of adverse drug reactions. Although imbalanced feature classification is typical of disease problems and diminishes dependence on classification accuracy, the proposed system still compared favorably with the literature and can be hybridized to improve its precision and recall rates. Publisher PDF. Peer reviewed.

    Drug-drug interactions: A machine learning approach

    Automatic detection of drug-drug interaction (DDI) is a difficult problem in pharmaco-surveillance. Recent practice for in vitro and in vivo pharmacokinetic drug-drug interaction studies has been based on carefully selected drug characteristics such as their pharmacological effects, and on drug-target networks, in order to identify and comprehend anomalies in a drug's biochemical function upon co-administration. In this work, we present a novel DDI prediction framework that combines several drug-attribute similarity measures to construct a feature space from which we train three machine learning algorithms: Support Vector Machine (SVM), J48 Decision Tree and K-Nearest Neighbor (KNN), using a partially supervised classification algorithm called Positive Unlabeled Learning (PU-Learning) tailored specifically to suit our framework. In summary, we extracted 1,300 U.S. Food and Drug Administration-approved pharmaceutical drugs and paired them to create 1,688,700 feature vectors. Out of 397 drug-pairs known to interact prior to our experiments, our system was able to correctly identify 80% of them, and from the remaining 1,688,303 pairs for which no interaction had been determined, we were able to predict 181 potential DDIs with confidence levels greater than 97%. The latter is a set of DDIs unrecognized by our source of ground truth at the time of study. Evaluation of the effectiveness of our system involved querying the U.S. Food and Drug Administration's Adverse Event Reporting System (AERS) database for cases involving drug-pairs used in this study. The results returned from the query listed incidents reported for a number of patients, some of whom had experienced severe adverse reactions leading to outcomes such as prolonged hospitalization, diminished medicinal effect of one or more drugs, and in some cases, death
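The pair counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the 1,300 drugs are combined into ordered pairs, since that is the reading under which the stated total of 1,688,700 comes out exactly; the abstract does not say so explicitly, and the variable names are chosen for this example.

```python
import math

n_drugs = 1300          # drugs extracted, as stated in the abstract
known_interacting = 397  # pairs known to interact, as stated in the abstract

# Ordered pairs (A, B) with A != B: n * (n - 1).
ordered_pairs = n_drugs * (n_drugs - 1)

# Unordered pairs {A, B}, shown for comparison: n choose 2.
unordered_pairs = math.comb(n_drugs, 2)

# Pairs left without a known interaction label.
unlabeled_pairs = ordered_pairs - known_interacting

print(ordered_pairs, unordered_pairs, unlabeled_pairs)
```

Only the ordered-pair count reproduces the 1,688,700 feature vectors and the 1,688,303 remaining pairs reported above; the unordered count is exactly half the former.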

    Internship Report - Roaming Data Science (RoaDS) BI Solution in a Vodafone Environment

    A telecom company (Vodafone) needed to implement a Business Intelligence solution for Roaming data drawn from a wide set of different data sources. Using the data visualizations this solution provides, its key decision-making users can carry out business analysis and assess the need for infrastructure and software expansion. This document presents the scientific papers produced during the various stages of building the solution (state of the art, architecture design and implementation results). The Business Intelligence solution was designed and implemented with OLAP methodologies and technologies in a Data Warehouse composed of Data Marts arranged in a constellation; the visualization layer was custom made in JavaScript (VueJS). As a basis for the results, a questionnaire was created to be filled in by the key users of the solution. Based on this questionnaire, it was possible to ascertain that user acceptance was satisfactory. The proposed objectives for the implementation of the BI solution, with all its requirements, were achieved, with the infrastructure itself created from scratch in Kubernetes. This BI platform can be expanded using column storage databases created specifically with OLAP workloads in mind, removing the need for an OLAP cube layer. Based on Machine Learning algorithms, the platform will be able to perform the predictions needed to make decisions about Vodafone's Roaming infrastructure

    Service composition for biomedical applications

    Doctorate in Informatics Engineering. The demand for innovation in the biomedical software domain has driven the evolution of information technologies over the last decades. The challenges associated with the effective management, integration, analysis and interpretation of the wealth of life sciences information stemming from modern hardware and software technologies require concerted efforts. From gene sequencing hardware to pharmacology research up to patient electronic health records, the ability to accurately explore data from these environments is vital to further improve our understanding of human health. This thesis discusses and develops better informatics strategies to address these challenges, primarily in the context of service composition, including warehousing and federation strategies for resource integration, as well as web services or LinkedData for software interoperability. Service composition is introduced as a general principle, geared towards data integration and software interoperability. Concerning the latter, this research covers the service composition requirements within the pharmacovigilance field, namely in the European EU-ADR project. The contributions to this area, the definition of a new interoperability standard and the creation of a new workflow-wrapping engine, are behind the successful construction of the EU-ADR Web Platform, a workspace for delivering advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research tackles the challenges associated with the integration of heterogeneous and distributed data in the human variome field. For this purpose, a new lightweight solution was created: WAVe, Web Analysis of the Variome, provides a rich collection of genetic variation data through an innovative portal and an advanced API. The development of the strategies underlying these products highlighted clear opportunities in the biomedical software field: enhancing the software implementation process with rapid application development approaches and improving the quality and availability of data with the adoption of the Semantic Web paradigm. COEUS crosses the boundaries of integration and interoperability as it provides a framework for the flexible acquisition and translation of data into a semantic knowledge base, as well as a comprehensive set of interoperability services, from REST to LinkedData, to fully exploit gathered data semantically. By combining the lightness of rapid application development strategies with the richness of its "Semantic Web in a box" approach, COEUS is a pioneering framework to enhance the development of the next generation of biomedical applications