18 research outputs found

    A consensus‑based ensemble approach to improve transcriptome assembly

    Background: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms, where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative-splicing events exacerbate such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes. Results: In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. Conclusions: Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy, both for de novo and genome-guided assemblers, can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/
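
The idea of keeping predictions that overlap between assemblies can be illustrated with a minimal voting sketch. This is not the ConSemble script: the abstract does not state how sequences are compared or how many assemblers must agree, so the exact-match comparison, the function name `consensus_contigs`, and the `min_support` threshold below are all assumptions made for illustration.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return sequences reported by at least `min_support` assemblies.

    `assemblies` is a list of collections of sequence strings, one per
    assembler. Exact string identity is an illustrative assumption; a real
    pipeline may compare sequences differently.
    """
    counts = Counter()
    for assembly in assemblies:
        # Deduplicate so each assembler contributes one vote per sequence.
        counts.update(set(assembly))
    return {seq for seq, votes in counts.items() if votes >= min_support}


# Hypothetical toy input: four assemblers, short placeholder sequences.
toy_assemblies = [
    {"ATG", "CCC"},
    {"ATG", "GGG"},
    {"ATG", "CCC"},
    {"ATG", "CCC", "TTT"},
]
kept = consensus_contigs(toy_assemblies, min_support=3)
```

Raising `min_support` trades recall for precision: sequences produced by a single assembler are dropped, which is the intuition behind removing incorrectly assembled contigs by consensus.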

    M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

    BACKGROUND: Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons. RESULTS: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to globally and locally examine and inspect the conserved regions and gene annotations. CONCLUSION: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at:

    Self-protection against business logic vulnerabilities

    No full text

    Cluster-Based Adaptive Information Retrieval

    No full text
    This paper discusses the issues involved in the design of a complete information retrieval system based on user-oriented clustering schemes. Clusters are constructed taking into account the users' perception of similarity between documents. The system accumulates feedback from the users and employs it to construct user-oriented clusters. An optimization function to improve the effectiveness of the clustering process is developed. A retrieval process based on the clustering scheme is described. The system developed is experimentally validated and compared with existing systems.
    1 Introduction
    An information retrieval (IR) system is characterized by a collection of documents and a set of users who perform queries on the collection to fulfill their information needs. To improve the efficiency of retrieval, it has been proposed that the documents which are generally retrieved together in response to some query should be kept close together within the system in the form of clusters [28, 30]..

    How do Companies Strategize Today?

    No full text

    Feature selection with adjustable criteria

    No full text
    Abstract. We present a study on a rough set based approach for feature selection. Instead of using significance or support, the Parameterized Average Support Heuristic (PASH) considers the overall quality of the potential set of rules. It produces a set of rules with a balanced support distribution over all decision classes. Adjustable parameters of PASH can help users with different levels of approximation needs to extract predictive rules that may be ignored by other methods. This paper fine-tunes the PASH heuristic and provides experimental results for PASH.

    On-line Algorithms for a Single Machine Scheduling Problem

    No full text
    An increasingly significant branch of computer science is the study of on-line algorithms. In this paper, we apply the theory of on-line algorithms to job scheduling. In particular, we study the nonpreemptive single machine scheduling of independent jobs with arbitrary release dates to minimize the total completion time. We design and analyze two on-line algorithms which make scheduling decisions without knowing about jobs that will arrive in the future.
    Keywords: job scheduling, on-line algorithm, c-competitiveness
    1 Introduction
    Given a sequence of requests, an on-line algorithm is one that responds to each request in the order it appears in the sequence without the knowledge of any request following it in the sequence. For instance, in the bin packing problem, a list $L = (a_1, a_2, \ldots, a_n)$ of reals in $(0, 1]$ needs to be packed into the minimum number of unit-capacity bins. An on-line bin packing algorithm packs $a_i$, where $i$ starts from 1, without knowing about $a_{i+1}, \ldots$ ..
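
The on-line bin packing setting mentioned in the introduction can be made concrete with a short sketch. It uses First Fit, a standard textbook heuristic chosen here only for illustration; the excerpt does not name a specific packing rule, and this is not one of the paper's two scheduling algorithms. The function name `first_fit_online` and the floating-point tolerance `EPS` are assumptions.

```python
EPS = 1e-12  # tolerance for floating-point sums (an illustrative choice)


def first_fit_online(items):
    """Pack items one at a time into unit-capacity bins using First Fit.

    Each item is placed when it arrives, using only the current bin loads;
    later items are never inspected, which is what makes the rule on-line.
    Returns (assignment, loads): the bin index chosen for each item and the
    final load of each bin.
    """
    loads = []       # loads[j] is the total size currently in bin j
    assignment = []  # assignment[i] is the bin index given to item i
    for size in items:
        if not 0 < size <= 1:
            raise ValueError("item sizes must lie in (0, 1]")
        for j, load in enumerate(loads):
            if load + size <= 1 + EPS:
                loads[j] += size
                assignment.append(j)
                break
        else:
            # No open bin has room, so open a new one.
            loads.append(size)
            assignment.append(len(loads) - 1)
    return assignment, loads


assignment, loads = first_fit_online([0.5, 0.75, 0.5, 0.25])
```

In this toy run the second item does not fit beside the first, so a second bin is opened; the remaining two items then fill the gaps in the existing bins without opening a third.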