
    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite its tremendous success, pitfalls have been observed in every step of a clinical metabolomics workflow, impeding the internal validity of such studies. Furthermore, the demands on logistics, instrumentation, and computational resources for metabolic phenotyping studies have far exceeded expectations. In this conceptual review, we cover the barriers that arise across a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule comprising five phases, including two additional "pre-pre-" and "post-post-" analytical steps. We also elucidate the potential involvement of machine learning and demonstrate the undeniable need for automated data mining algorithms to improve the quality of future research. Consequently, we propose a comprehensive metabolomics framework, along with an appropriate checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar, is urgently needed; combined with social or nutritional factors, this yields complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions to the progressing trend of utilizing metabolomics in clinical research to create the next-generation healthcare system.
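    The review's emphasis on quality control invites a concrete illustration. Below is a minimal, self-contained sketch of one widely used QC step in metabolomics, filtering features by their relative standard deviation (RSD) across pooled QC injections; the threshold, data and function names are illustrative assumptions, not the procedure proposed in the paper.

```python
import numpy as np

def qc_rsd_filter(feature_matrix, qc_mask, rsd_threshold=0.30):
    """Keep metabolite features whose relative standard deviation (RSD)
    across pooled QC injections stays below the threshold.

    feature_matrix : (n_samples, n_features) array of peak intensities
    qc_mask        : boolean array marking rows that are pooled QC injections
    """
    qc = feature_matrix[qc_mask]
    mean = qc.mean(axis=0)
    sd = qc.std(axis=0, ddof=1)
    # Avoid division by zero for features absent from the QC pool
    rsd = np.divide(sd, mean, out=np.full_like(sd, np.inf), where=mean > 0)
    keep = rsd < rsd_threshold
    return feature_matrix[:, keep], keep

# Toy example: 6 injections (2 of them pooled QCs) x 4 features
rng = np.random.default_rng(0)
X = rng.lognormal(mean=8, sigma=0.2, size=(6, 4))
qc_mask = np.array([True, False, False, True, False, False])
filtered, kept = qc_rsd_filter(X, qc_mask)
print(f"retained {kept.sum()} of {kept.size} features")
```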

    WATER PURIFICATION: A BRIEF REVIEW ON TOOLS AND TECHNIQUES USED IN ANALYSIS, MONITORING AND ASSESSMENT OF WATER QUALITY

    Drinking water sources are regularly polluted by various human activities that cause severe health problems all over the world. In recent years, water quality research has drawn great attention from scientific communities. A large number of tools and techniques are used for proper water quality analysis, monitoring and assessment. This paper includes brief information about some of them, namely physico-chemical water analysis (PCWA), adsorption, the metal pollution index (MPI), the water quality index (WQI), water quality modelling tools (WQMT) and multivariable statistical models, which include five multivariate data mining approaches: cluster analysis (CA), principal component analysis (PCA), factor analysis (FA), multiple linear regression analysis (MLRA) and discriminant analysis (DA). The present paper also explores the interaction between science and technology and provides basic knowledge of emerging tools and techniques used in water purification.
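    To make one of the listed techniques concrete, here is a minimal sketch of the weighted arithmetic water quality index (WQI), assuming unit weights inversely proportional to permissible limits and an ideal value of zero; the parameters, concentrations and limits are hypothetical placeholders, not data from the paper.

```python
def weighted_arithmetic_wqi(measured, standards):
    """WQI = sum(w_i * q_i) / sum(w_i), with unit weights w_i = k / S_i
    and quality ratings q_i = 100 * C_i / S_i (ideal value taken as 0)."""
    k = 1.0 / sum(1.0 / s for s in standards.values())  # proportionality constant
    num = den = 0.0
    for param, c in measured.items():
        s = standards[param]
        w = k / s          # unit weight, inversely proportional to the standard
        q = 100.0 * c / s  # quality rating relative to the permissible limit
        num += w * q
        den += w
    return num / den

# Hypothetical sample: concentrations and permissible limits in mg/L
measured  = {"TDS": 450.0, "nitrate": 30.0, "fluoride": 0.8}
standards = {"TDS": 500.0, "nitrate": 45.0, "fluoride": 1.0}
print(f"WQI = {weighted_arithmetic_wqi(measured, standards):.1f}")
```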

    Overview of BioCreAtIvE: critical assessment of information extraction for biology

    Background: The goal of the first BioCreAtIvE challenge (Critical Assessment of Information Extraction in Biology) was to provide a set of common evaluation tasks to assess the state of the art for text mining applied to biological problems. The results were presented in a workshop held in Granada, Spain, March 28–31, 2004. The articles collected in this BMC Bioinformatics supplement, entitled "A critical assessment of text mining methods in molecular biology", describe the BioCreAtIvE tasks, systems, results and their independent evaluation.
    Results: BioCreAtIvE focused on two tasks. The first dealt with extraction of gene or protein names from text, and their mapping into standardized gene identifiers for three model organism databases (fly, mouse, yeast). The second task addressed issues of functional annotation, requiring systems to identify specific text passages that supported Gene Ontology annotations for specific proteins, given full text articles.
    Conclusion: The first BioCreAtIvE assessment achieved a high level of international participation (27 groups from 10 countries). The assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision/recall or better, which potentially makes them suitable for real applications in biology. The results for the advanced task (functional annotation from free text) were significantly lower, demonstrating the current limitations of text-mining approaches where knowledge extrapolation and interpretation are required. In addition, an important contribution of BioCreAtIvE has been the creation and release of training and test data sets for both tasks. There are 22 articles in this special issue, including six that provide analyses of results or data quality for the data sets, including a novel inter-annotator consistency assessment for the test set used in task 2.
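    As an illustration of how the gene-mention task is scored, the following sketch computes the precision, recall and balanced F-measure used to compare systems; the normalized gene identifiers are invented examples, not BioCreAtIvE data.

```python
def precision_recall_f1(predicted, gold):
    """Score a set of predicted identifiers against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Normalized gene identifiers extracted from one abstract vs. the gold standard
predicted = {"FBgn0000490", "FBgn0003731", "FBgn0026379"}
gold      = {"FBgn0000490", "FBgn0003731", "FBgn0024245"}
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```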

    Rerunning OCR: A Machine Learning Approach to Quality Assessment and Enhancement Prediction

    Iterating with new and improved OCR solutions requires deciding which candidates to target for reprocessing. This especially applies when the underlying data collection is of considerable size and rather diverse in terms of fonts, languages, periods of publication and, consequently, OCR quality. This article captures the efforts of the National Library of Luxembourg to support those targeting decisions. They are crucial in order to guarantee low computational overhead and reduced quality degradation risks, combined with a more quantifiable OCR improvement. In particular, this work explains the methodology of the library with respect to text block level quality assessment. Through extension of this technique, a regression model that takes into account the enhancement potential of a new OCR engine is also presented. Both mark promising approaches, especially for cultural institutions dealing with historical data of lower quality.
    Comment: Journal of Data Mining and Digital Humanities; Major revision
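    The prediction step can be illustrated with a small sketch: regress the expected quality gain of a text block on simple block-level features, then reprocess only the blocks whose predicted gain clears a threshold. The features, training values and threshold below are assumptions for illustration, not the library's actual model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Per-block features: [current quality score, dictionary-word ratio, mean word length]
X_train = np.array([
    [0.62, 0.55, 4.1],
    [0.71, 0.63, 4.5],
    [0.85, 0.81, 4.8],
    [0.93, 0.90, 4.6],
])
# Observed quality improvement after rerunning the new OCR engine
y_train = np.array([0.21, 0.15, 0.06, 0.01])

model = LinearRegression().fit(X_train, y_train)

# Predict the enhancement potential of unseen blocks and decide per block
new_blocks = np.array([[0.58, 0.50, 3.9], [0.95, 0.92, 4.7]])
for i, gain in enumerate(model.predict(new_blocks)):
    decision = "reprocess" if gain > 0.05 else "skip"
    print(f"block {i}: predicted gain {gain:+.2f} -> {decision}")
```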

    An evaluation of airborne laser scan data for coalmine subsidence mapping

    The accurate mapping of coalmine subsidence is necessary for the continued management of potential subsidence impacts. The use of airborne laser scan (ALS) data for subsidence mapping provides an alternative to traditional ground-based approaches that affords increased accessibility and complete spatial coverage. This paper evaluates the suitability and potential of ALS data for subsidence mapping, primarily through the examination of two pre-mining surveys in a rugged, densely vegetated study site. Data quality, in terms of mean point spacing and coverage, is evaluated, along with the impact of interpolation methods, resolution, and terrain. It was assumed that minimal surface height changes occurred between the two pre-mining surveys; therefore, any height changes between digital elevation models of the two ALS surveys were interpreted as errors associated with the use of ALS data for subsidence mapping. A mean absolute error of 0.23 m was observed, though this error may be exaggerated by the presence of a systematic 0.15 m offset between the two surveys. Very large (several metres) errors occur in areas of steep or dynamic terrain, such as along cliff lines and watercourses. Despite these errors, preliminary subsidence mapping, performed using a third, post-mining dataset, clearly demonstrates the potential benefits of ALS data for subsidence mapping, as well as some potential limitations and the need for further careful assessment and validation concerning data errors.
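    The error assessment described here amounts to differencing two co-registered digital elevation models, estimating the systematic vertical offset and reporting the mean absolute error. The sketch below reproduces that arithmetic on synthetic rasters; the grid size, noise levels and the use of the median as the offset estimator are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic co-registered DEMs, elevations in metres; survey 2 carries a
# 0.15 m systematic shift plus random per-cell error
dem_survey1 = rng.normal(250.0, 20.0, size=(100, 100))
dem_survey2 = dem_survey1 + 0.15 + rng.normal(0.0, 0.2, size=(100, 100))

diff = dem_survey2 - dem_survey1
offset = np.median(diff)  # robust estimate of the systematic vertical shift
print(f"systematic offset: {offset:.2f} m")
print(f"MAE before offset removal: {np.abs(diff).mean():.2f} m")
print(f"MAE after offset removal:  {np.abs(diff - offset).mean():.2f} m")
```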

    Transcriptome Analysis for Non-Model Organism: Current Status and Best-Practices

    Since transcriptome analysis provides genome-wide sequence and gene expression information, transcript reconstruction using RNA-Seq sequence reads has become popular in recent years. For non-model organisms, as distinct from reference genome-based mapping, sequence reads are processed via de novo transcriptome assembly approaches to produce large numbers of contigs corresponding to the coding or non-coding, but expressed, parts of the genome. In spite of the immense potential of RNA-Seq-based methods, particularly in recovering full-length transcripts and spliced isoforms from short reads, accurate results can only be obtained by following the procedure step by step. In this chapter, we aim to provide an overview of the state-of-the-art methods, including (i) quality checking and pre-processing of raw reads, (ii) the pros and cons of de novo transcriptome assemblers, (iii) generating non-redundant transcript data, (iv) current quality assessment tools for de novo transcriptome assemblies, (v) approaches for transcript abundance and differential expression estimation and, finally, (vi) further mining of transcriptomic data for particular biological questions. Our intention is to provide an overview and practical guidance for choosing the appropriate approaches to best meet the needs of researchers in this area, and also to outline strategies to improve ongoing projects.
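    As a small worked example for step (iv), the sketch below computes the N50 statistic, a common length-based quality measure for de novo assemblies; the contig lengths are made up.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half
    of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length

# Hypothetical contig lengths (bp) from a de novo assembly
contigs = [3200, 2100, 1800, 1500, 900, 700, 400, 250, 150]
print(f"N50 = {n50(contigs)} bp over {len(contigs)} contigs")
```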

    Freshwater ecosystem services in mining regions: modelling options for policy development support

    The ecosystem services (ES) approach offers an integrated perspective on social-ecological systems, suitable for holistic assessments of mining impacts. Yet for ES models to be policy-relevant, methodological consensus in mining contexts is needed. We review articles assessing ES in mining areas, focusing on freshwater components and policy support potential. Twenty-six articles were analysed concerning (i) methodological complexity (data types, number of parameters, processes and ecosystem-human integration level) and (ii) potential applicability for policy development (communication of uncertainties, scenario simulation, stakeholder participation and management recommendations). The articles illustrate mining impacts on ES mostly through valuation exercises. However, the lack of ground- and surface-water measurements, as well as insufficient representation of the connectivity among soil, water and humans, leaves room for improvement. Inclusion of mining-specific environmental stressor models, increased resolution of topographies, determination of baseline ES patterns and inclusion of multi-stakeholder perspectives are advantageous for policy support. We argue that achieving more holistic assessments requires practitioners to aim for high social-ecological connectivity, using mechanistic models where possible and inductive methods only where necessary. Given data constraints, cause-effect networks may be the most feasible solution. Thus, a policy-oriented framework is proposed, in which data science is directed to environmental modelling for analysis of mining impacts on water ES.
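    The cause-effect network idea can be sketched with a toy discrete model linking a mining stressor to a freshwater ES indicator via water quality; the states and conditional probabilities below are illustrative assumptions, not calibrated values from any reviewed study.

```python
# P(water_quality | mining_intensity)
p_quality = {
    "low":  {"good": 0.8, "poor": 0.2},
    "high": {"good": 0.3, "poor": 0.7},
}
# P(fish_provisioning | water_quality)
p_fish = {
    "good": {"sustained": 0.90, "degraded": 0.10},
    "poor": {"sustained": 0.25, "degraded": 0.75},
}

def fish_service(mining_state):
    """Marginalize over water quality to get P(fish provisioning | mining)."""
    out = {"sustained": 0.0, "degraded": 0.0}
    for wq, p_wq in p_quality[mining_state].items():
        for state, p in p_fish[wq].items():
            out[state] += p_wq * p
    return out

for scenario in ("low", "high"):
    print(f"{scenario} mining intensity:", fish_service(scenario))
```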

    TLAD 2010 Proceedings: 8th international workshop on teaching, learning and assessment of databases (TLAD)

    This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010, the 27th International Information Systems Conference. TLAD 2010 is held on 28th June at the beautiful Dudhope Castle at Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.
    The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and to further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.
    This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow), who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high-quality submissions this year, the workshop will also present seven peer-reviewed papers and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group-centred and problem-based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students' abilities to develop database queries and E-R diagrams; and e-learning platforms for supporting teaching and learning.

    Predictive Modelling Approach to Data-Driven Computational Preventive Medicine

    This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. The early part of this research proposes a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine learning methods, data preprocessing techniques, model training estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus. Midway through this research, the thesis advances preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing: new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, and a novel imbalanced resampling approach, minority pattern reconstruction (MPR), led by information theory. The thesis also extends model performance evaluation with a novel classification performance ranking metric called XDistance. The experimental results show that building predictive models with the methods guided by the new framework (Octopus) yields reliable models whose performance earned domain experts' approval. Performing the data quality checks and applying the MMI process led healthcare practitioners to favour predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies produced better performances, in line with experts' success criteria, than traditional imbalanced data resampling techniques. Finally, the XDistance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics.
    The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework and produce new reliable classifiers; this also offers a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, the newly developed methods within the framework (the data quality checks, MMI, MPR and XDistance) extend data preparation, preprocessing and model evaluation. Finally, the newly accepted predictive models help detect adverse health events, namely visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.
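    The abstract does not detail the MMI procedure, but the general multimethod idea can be sketched: mask some known entries, score several candidate imputers on their reconstruction error and keep the best. The candidate set, data and error metric below are assumptions for illustration, not the thesis's actual method.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(2)
X_full = rng.normal(0, 1, size=(200, 5))
X = X_full.copy()
mask = rng.random(X.shape) < 0.1  # hide 10% of entries as "missing"
X[mask] = np.nan

# Candidate imputers to compare on the masked entries
candidates = {
    "mean":   SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "knn":    KNNImputer(n_neighbors=5),
}
scores = {}
for name, imputer in candidates.items():
    X_hat = imputer.fit_transform(X)
    # RMSE between imputed and true values on the hidden entries
    scores[name] = np.sqrt(np.mean((X_hat[mask] - X_full[mask]) ** 2))

best = min(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()}, "-> best:", best)
```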