
    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors in nucleotide sequences produced by next-generation targeted amplicon sequencing. Next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel, and it effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq
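The abstract's "general setting" refers to the Discrete Universal Denoiser (DUDE) rule: count the empirical distribution of center symbols per two-sided context, then pick, per position, the reconstruction minimizing the estimated loss under the channel matrix. The sketch below is an illustrative two-pass implementation for a symmetric substitution channel only; the error rate `delta`, context size `k`, and alphabet handling are assumptions, and DUDE-Seq itself additionally handles homopolymer indels and platform-specific channel models.

```python
import numpy as np
from collections import defaultdict

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def dude_denoise(seq, delta=0.05, k=2):
    """Two-pass DUDE sketch for a symmetric substitution channel
    with per-base error probability delta (illustrative only)."""
    n = len(ALPHABET)
    # Discrete memoryless channel: correct w.p. 1-delta, each error delta/3
    Pi = np.full((n, n), delta / (n - 1))
    np.fill_diagonal(Pi, 1 - delta)
    Pi_inv = np.linalg.inv(Pi)
    # Hamming loss: Lam[a, x] = cost of outputting x when the clean base is a
    Lam = 1.0 - np.eye(n)

    z = np.array([IDX[c] for c in seq])
    # Pass 1: empirical counts of observed center symbols per two-sided context
    counts = defaultdict(lambda: np.zeros(n))
    for i in range(k, len(z) - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        counts[ctx][z[i]] += 1
    # Pass 2: DUDE rule — argmin_x  m(ctx)^T Pi^{-1} (Lam[:, x] * Pi[:, z_i])
    out = list(seq)
    for i in range(k, len(z) - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        v = counts[ctx] @ Pi_inv
        scores = v @ (Lam * Pi[:, z[i]][:, None])
        out[i] = ALPHABET[int(np.argmin(scores))]
    return "".join(out)
```

For a noisier channel (larger `delta`), an isolated substitution inside a long homogeneous run is flipped back to the run's symbol, because the context counts overwhelmingly favor it.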

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
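Two of the five challenges the review names, missing data and class imbalance, have simple baseline remedies that can be sketched directly; the functions below are deliberately minimal illustrations (column-mean imputation and inverse-frequency sample weights), not the richer model-based methods the review surveys.

```python
import numpy as np

def mean_impute(X):
    """Column-wise mean imputation of missing entries (NaN).
    A naive baseline for the 'missing data' challenge."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def inverse_frequency_weights(y):
    """Per-sample weights inversely proportional to class frequency,
    a common first remedy for the 'class imbalance' challenge."""
    classes, counts = np.unique(y, return_counts=True)
    freq = dict(zip(classes, counts / len(y)))
    return np.array([1.0 / freq[label] for label in y])
```

In a multi-omics pipeline these would typically be applied per modality before integration, with the weights passed to whatever classifier is trained downstream.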

    Online Deception Detection Refueled by Real World Data Collection

    The lack of large realistic datasets presents a bottleneck in online deception detection studies. In this paper, we apply a data collection method based on social network analysis to quickly identify high-quality deceptive and truthful online reviews from Amazon. The dataset contains more than 10,000 deceptive reviews and is diverse in product domains and reviewers. Using this dataset, we explore effective general features for online deception detection that perform well across domains. We demonstrate that, with generalized features - advertising speak and writing complexity scores - deception detection performance can be further improved by adding deceptive reviews from assorted domains to the training data. Finally, reviewer-level evaluation gives an interesting insight into different deceptive reviewers' writing styles. Comment: 10 pages; accepted to Recent Advances in Natural Language Processing (RANLP) 201
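A cross-domain feature set like the one the abstract describes can be sketched as a tiny extractor. The specific features below (average sentence length, type-token ratio, a crude superlative count as an "advertising speak" proxy) are illustrative assumptions; the paper's actual feature definitions may differ.

```python
import re

def review_features(text):
    """Toy writing-complexity / ad-speak feature extractor for a
    review text (hypothetical features, not the paper's exact set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        # crude proxy for superlative-heavy advertising language
        "superlatives": sum(w.endswith("est") for w in words),
    }
```

Because such features describe style rather than topic, they transfer across product domains better than bag-of-words features, which is the property the paper exploits.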

    Maintenance Knowledge Management with Fusion of CMMS and CM

    Abstract: Maintenance can be considered an information, knowledge processing and management system. The management of knowledge resources in maintenance is a relatively new issue compared to Computerized Maintenance Management Systems (CMMS) and Condition Monitoring (CM) approaches and systems. Information and Communication Technology (ICT) systems, including CMMS, CM and enterprise administrative systems among others, are effective in supplying data and, in some cases, information. However, high-quality knowledge, skills and expertise must be available for effective analysis and decision-making based on the supplied information and data. Information and data are not by themselves enough; knowledge, experience and skills are the key factors in maximizing the usability of the collected data and information. Thus, effective knowledge management (KM) is growing in importance, especially in advanced processes and in the management of advanced and expensive assets. Therefore, efforts to successfully integrate maintenance knowledge management processes with accurate information from CMMS and CM systems will be vital, due to the increasing complexity of the overall systems. Low maintenance effectiveness costs money and resources, since normal and stable production cannot be upheld and maintained over time; lowered maintenance effectiveness can have a substantial impact on the organization's ability to obtain stable flows of income and to control costs in the overall process. Ineffective maintenance often stems from faulty decisions, mistakes due to lack of experience, and the lack of functional systems for effective information exchange [10]. Thus, access to knowledge, experience and skills, in combination with functional collaboration structures, can be regarded as a vital component of a highly effective maintenance solution.
Maintenance effectiveness depends in part on the quality, timeliness, accuracy and completeness of the information about the machine degradation state on which decisions are based. It also depends, to a large extent, on the quality of the knowledge of the managers and maintenance operators and on the effectiveness of the internal and external collaborative environments. With the emergence of intelligent sensors to measure and monitor the health state of components, and the gradual implementation of ICT in organizations, the conceptualization and implementation of E-Maintenance is turning into a reality. Unfortunately, even though knowledge management aspects are important in maintenance, the integration of KM has yet to find its place in E-Maintenance and in the overall information flows of larger-scale maintenance solutions. Nowadays, two main systems are implemented in most maintenance departments: first, Computerized Maintenance Management Systems (CMMS), the core of traditional maintenance record-keeping practices, which often capture textual descriptions of faults and of the actions performed on an asset; second, condition monitoring systems (CMS). Recently developed CMS are capable of directly monitoring asset component parameters; however, attempts to link observed CMMS events to CM sensor measurements have been limited in their approach and scalability. In this article we present one approach for addressing this challenge. We argue that understanding the requirements and constraints in conjunction - from the maintenance, knowledge management and ICT perspectives - is necessary. We identify the issues that need to be addressed to achieve successful integration of such disparate data types and processes (also integrating knowledge management into the “data types” and processes).
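The core fusion step the abstract describes, linking CMMS work-order events to CM sensor measurements, is at its simplest a time-window join. The record layout (`time`, `asset`, `vib_mm_s` fields) and the 24-hour window below are assumptions for illustration, not the article's actual data model.

```python
from datetime import datetime, timedelta

def link_events_to_readings(events, readings, window_hours=24):
    """Attach to each CMMS work-order event the CM sensor readings
    recorded in the preceding time window (hypothetical schema)."""
    linked = []
    for ev in events:
        start = ev["time"] - timedelta(hours=window_hours)
        # keep only readings that fall inside [start, event time]
        ctx = [r for r in readings if start <= r["time"] <= ev["time"]]
        linked.append({**ev, "cm_context": ctx})
    return linked
```

In practice the join would also be keyed on asset identity, and a scalable implementation would index readings by time rather than scanning them per event; this sketch only shows the shape of the fusion.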

    The value of remote sensing techniques in supporting effective extrapolation across multiple marine spatial scales

    The reporting of ecological phenomena and environmental status routinely requires point observations, collected with traditional sampling approaches, to be extrapolated to larger reporting scales. This process encompasses difficulties that can quickly introduce significant errors. Remote sensing techniques offer insights and exceptional spatial coverage for observing the marine environment. This review provides guidance on (i) the structures and discontinuities inherent in the extrapolative process, (ii) how to extrapolate effectively across multiple spatial scales, and (iii) the remote sensing techniques and data sets that can facilitate this process. This evaluation illustrates that remote sensing techniques are a critical component of extrapolation and are likely to underpin the production of high-quality assessments of ecological phenomena and the regional reporting of environmental status. Ultimately, it is hoped that this guidance will aid the production of robust and consistent extrapolations that also make full use of the techniques and data sets that expedite this process.
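One of the simplest ways to carry point observations onto the continuous grids that remote sensing provides is inverse-distance weighting. The sketch below is a generic baseline, not a method from the review; coordinates, the distance power, and the target grid are all assumptions.

```python
import numpy as np

def idw_interpolate(sample_xy, sample_vals, grid_xy, power=2.0):
    """Inverse-distance-weighted interpolation of point observations
    onto target locations (e.g., a remote-sensing pixel grid)."""
    # pairwise distances: (n_grid, n_samples)
    d = np.linalg.norm(grid_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # avoid division by zero at sample points
    w = 1.0 / d ** power
    return (w @ sample_vals) / w.sum(axis=1)
```

Real extrapolation workflows would instead regress the field observations against co-located remote-sensing covariates and predict across the scene, which is closer to the scale-bridging role the review assigns to remote sensing.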