38 research outputs found

    DockerBIO: web application for efficient use of bioinformatics Docker images

    Background and Objective: Docker is a lightweight containerization platform whose performance is nearly identical to that of a local environment. Many bioinformatics tools are now distributed as Docker images that bundle the tool itself with its complex setup: libraries, configuration, and, where needed, data. Users can download and run these images without compiling or configuring anything, and can obtain reproducible results. Despite these advantages, several problems remain. First, there are no clear standards for distributing Docker images, and Docker Hub often offers multiple images that serve the same purpose but are used differently, so users can find it difficult to choose among them and learn how to use them. Second, Docker images are often unsuitable as pipeline components because many of them contain large data sets. Moreover, groups of users can have difficulty sharing a pipeline composed of Docker images: individual members may modify scripts or use different versions of the data, which leads to inconsistent results. Methods and Results: To address these problems, we developed DockerBIO, a Java web application that provides reliable, verified, lightweight Docker images for various bioinformatics tools and various kinds of reference data. With DockerBIO, users can easily build a pipeline from the tools and data registered in it and, if necessary, register new tools or data. Pipelines built this way are registered in DockerBIO, which provides an efficient environment for running them. This enables user groups to run their pipelines without the effort of copying and modifying them.
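The abstract does not show how DockerBIO launches a pipeline. As a rough illustration only, the sketch below turns a list of steps, each bound to a pinned tool image, into `docker run` command lines in which shared reference data is mounted read-only, so every group member runs the same images against the same data. The `Step` class, the image name `example/bwa:0.7.17`, the host paths and the read-only bind-mount layout are hypothetical assumptions, not DockerBIO's design (which is implemented in Java); the commands are only built, never executed.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hypothetical pipeline step: a pinned tool image and the command to run in it."""
    name: str
    image: str              # pinned tag, e.g. "example/bwa:0.7.17" (hypothetical)
    args: tuple[str, ...]   # command executed inside the container


def docker_run_command(step: Step, workdir: str, refdir: str) -> list[str]:
    """Build (but do not run) a `docker run` argument list for one step."""
    return [
        "docker", "run", "--rm",      # remove the container when it exits
        "-v", f"{workdir}:/work",     # per-run working directory, read-write
        "-v", f"{refdir}:/ref:ro",    # shared reference data, read-only
        "-w", "/work",                # run the tool inside the working directory
        step.image,
        *step.args,
    ]


def pipeline_commands(steps: list[Step], workdir: str, refdir: str) -> list[list[str]]:
    """Assumption: steps run one after another and exchange files through /work."""
    return [docker_run_command(s, workdir, refdir) for s in steps]


if __name__ == "__main__":
    steps = [Step("align", "example/bwa:0.7.17",
                  ("bwa", "mem", "/ref/genome.fa", "reads.fq"))]
    for cmd in pipeline_commands(steps, "/data/run1", "/data/ref"):
        print(shlex.join(cmd))
```

Pinning image tags and mounting the reference data read-only is one simple way to keep every run of a shared pipeline on identical tools and data.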

    Abundance and Diversity of Plankton in the Lagoon Waters of Tolongano Village, Banawa Selatan Subdistrict

    This study aimed to determine the abundance and diversity of plankton in the lagoon waters of Tolongano Village, Banawa Selatan Subdistrict. The study was conducted from June to July 2009. Plankton samples were collected in the lagoon waters of Tolongano Village, Banawa Selatan Subdistrict, Donggala Regency, and identified at the Aquaculture Laboratory, Faculty of Agriculture, Universitas Tadulako. Sampling points were placed by purposive sampling (deliberate selection of sampling sites). Samples were taken at 5 stations, three times a day: at 07:00, 12:00, and 17:00 WITA (Central Indonesia Time). Phytoplankton abundance of the class Bacillariophyceae ranged from 8,925 to 16,135 ind/L, and zooplankton abundance of the class Crustacea ranged from 35 to 70 ind/L. The diversity index of Bacillariophyceae phytoplankton ranged from 2.010 to 2.504, and that of Crustacea zooplankton from 0 to 0.6931. The dominance index of Bacillariophyceae ranged from 1.1995 to 1.2326, indicating that one plankton taxon dominated, namely Nitzschia sp.
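The abstract reports diversity indices without naming the formula. Assuming the Shannon-Wiener index H' = -Σ pᵢ ln pᵢ (an assumption; the study may have used a different index or logarithm base), the minimal sketch below computes it from counts per taxon. Under that assumption, the reported zooplankton range of 0 to 0.6931 is consistent with samples containing a single taxon (H' = 0) up to two equally abundant taxa (H' = ln 2 ≈ 0.6931). The counts in the usage lines are illustrative, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon-Wiener diversity H' = -sum(p_i * ln(p_i)), using the natural log.

    `counts` holds the number of individuals (or ind/L) of each taxon;
    taxa with zero individuals are skipped because p * ln(p) -> 0 as p -> 0.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    print(round(shannon_index([12]), 4))     # one taxon only -> 0.0
    print(round(shannon_index([6, 6]), 4))   # two equally abundant taxa -> 0.6931
```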

    A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

    BACKGROUND: Identifying copy number variations (CNVs) in normal human genomic data is difficult because of noise and the non-linear relationship between genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) platform containing 42 million probes, far more than previous arrays, was recently published. Most existing CNV detection algorithms perform poorly on such data, both because of the noise that accompanies the large amount of input and because most current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples, yet most existing methods can identify CNVs only from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that combines segmentation, to identify breakpoints common to multiple samples, with a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies both CNVs and CNV zones (CNVZs); a CNVZ locates a genomic variant more precisely than a CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs in extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate on a simulated data set, and outperformed most current methods on real, high-resolution HapMap data sets. It also had the fastest runtime of the algorithms evaluated on actual high-resolution aCGH data. The CNVZs identified by MGVD can be used in association studies to reveal relationships between phenotypes and genomic aberrations. MGVD is written in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php
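The abstract names a k-means-based clustering step but not its inputs. As a hedged illustration of that general idea only, the sketch below clusters per-sample segment means (log2 ratios) into three copy-number states with a one-dimensional k-means. The initial centres (-1, 0 and 0.58, roughly the log2 ratios of one, two and three copies relative to two), the three state labels and the example values are assumptions, not MGVD's published procedure.

```python
def kmeans_1d(values, init_centers, max_iter=100):
    """Plain 1-D k-means (Lloyd's algorithm); returns (centers, labels)."""

    def nearest(v, centers):
        return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

    centers = list(init_centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for v in values:
            clusters[nearest(v, centers)].append(v)
        # Keep an empty cluster's old center rather than dropping it.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(v, centers) for v in values]


STATES = ("loss", "neutral", "gain")  # hypothetical labels for the 3 clusters

if __name__ == "__main__":
    # Hypothetical mean log2 ratios of one genomic segment across six samples.
    segment_means = [-0.9, -1.1, 0.05, -0.02, 0.6, 0.55]
    _, labels = kmeans_1d(segment_means, init_centers=(-1.0, 0.0, 0.58))
    print([STATES[k] for k in labels])
    # ['loss', 'loss', 'neutral', 'neutral', 'gain', 'gain']
```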

    RASER: reads aligner for SNPs and editing sites of RNA

    Motivation: Accurate identification of genetic variants such as single-nucleotide polymorphisms (SNPs) or RNA editing sites from RNA-Seq reads is important yet challenging, because it requires a very low false-positive rate in read mapping. Although many read aligners are available, none has been specifically developed or tested as an effective tool for SNP and RNA editing prediction. Results: We present RASER, an accurate read aligner with novel mapping schemes and an index tree structure that aim to reduce false-positive mappings caused by highly similar regions. We demonstrate that RASER achieves the best mapping accuracy among popular aligners and the highest sensitivity in identifying multiply mapped reads. As a result, RASER is highly effective both at unbiased mapping of the alternative alleles of SNPs and at identifying RNA editing sites. Availability and implementation: RASER is written in C++ and is freely available for download at https://github.com/jaegyoonahn/RASER. Supplementary information: Supplementary data are available at Bioinformatics online.
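RASER's index tree and mapping schemes are not described in the abstract, but the problem it targets can be illustrated. The brute-force sketch below (not RASER's algorithm) reports every ungapped placement of a read that ties for the fewest mismatches and flags reads with more than one best placement as multiply mapped. A read from one copy of a highly similar region that is reported as uniquely mapped to the other copy is exactly the kind of false positive that can masquerade as a SNP or editing site. The mismatch threshold and the toy reference are illustrative assumptions.

```python
def best_hits(read, reference, max_mismatches=2):
    """Return (fewest mismatches, start positions) over all ungapped placements.

    Returns (None, []) when no placement has <= max_mismatches mismatches.
    """
    n = len(read)
    best, positions = None, []
    for pos in range(len(reference) - n + 1):
        mismatches = sum(a != b for a, b in zip(read, reference[pos:pos + n]))
        if mismatches > max_mismatches:
            continue
        if best is None or mismatches < best:
            best, positions = mismatches, [pos]
        elif mismatches == best:
            positions.append(pos)
    return best, positions


def classify(read, reference, max_mismatches=2):
    """Label a read as 'unmapped', 'unique' or 'multi' (several equally good hits)."""
    _, positions = best_hits(read, reference, max_mismatches)
    if not positions:
        return "unmapped"
    return "unique" if len(positions) == 1 else "multi"


if __name__ == "__main__":
    reference = "AAACGTCCCACGTGGG"  # toy sequence with two copies of ACGT
    for read in ("ACGT", "CCCA", "TTTT"):
        print(read, classify(read, reference, max_mismatches=0))
    # ACGT multi / CCCA unique / TTTT unmapped
```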

    An Improved Method for Prediction of Cancer Prognosis by Network Learning

    Accurate identification of prognostic biomarkers is an important yet challenging goal in bioinformatics. Many bioinformatics approaches have been proposed for this purpose, but there is still room for improvement. In this paper, we propose a novel machine learning-based method that identifies prognostic biomarker genes more accurately and uses them to predict cancer prognosis. The proposed method identifies candidate prognostic gene modules by graph learning with a generative adversarial network (GAN) model and scores genes using a PageRank algorithm. We applied the method to multi-omics data comprising copy number, gene expression, DNA methylation, and somatic mutation data for five cancer types, where it achieved better prediction accuracy than existing methods. We identified many prognostic genes and characterized their roles in biological pathways. We also showed that the genes identified from different omics data were complementary, which improved prediction accuracy when multi-omics data were used.
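The abstract scores genes with a PageRank algorithm without giving details. For orientation, the sketch below is the textbook power-iteration PageRank on an undirected gene network stored as an adjacency dictionary. The damping factor of 0.85, the convergence tolerance and the toy gene names are assumptions, and the GAN-based module learning step is not reproduced.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank.

    `adjacency` maps every node to the set of its neighbours; each neighbour
    must itself be a key. Scores sum to 1.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}  # teleportation term
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical gene module: HUB interacts with G1, G2 and G3.
    module = {"HUB": {"G1", "G2", "G3"}, "G1": {"HUB"}, "G2": {"HUB"}, "G3": {"HUB"}}
    scores = pagerank(module)
    print(max(scores, key=scores.get))  # HUB
```

In this toy module the hub gene scores highest because every other gene passes its rank to it, which is the intuition behind using PageRank to prioritize central genes.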

    Accurate Prediction of Cancer Prognosis by Exploiting Patient-Specific Cancer Driver Genes

    Accurate prediction of cancer patients' prognoses and identification of prognostic biomarkers are both important for improving cancer treatment, alongside better anticancer drugs. Many previous bioinformatics studies have pursued this goal; however, there remains room for improvement in accuracy. In this study, we demonstrated that patient-specific cancer driver genes can be used to predict cancer prognoses more accurately. To identify patient-specific cancer driver genes, we first generated patient-specific gene networks and then used a modified PageRank to generate feature vectors representing the impact of each gene on the patient-specific gene network. The feature vectors of the good- and poor-prognosis groups were then used to train a deep feedforward network. Across 11 cancer types in The Cancer Genome Atlas (TCGA) data, the proposed method performed significantly better than existing state-of-the-art methods for three cancer types (BRCA, CESC and PAAD), better for five (COAD, ESCA, HNSC, KIRC and STAD), and similarly or slightly worse for the remaining three (BLCA, LIHC and LUAD). Furthermore, a case study of the prognostic genes identified for breast cancer and cervical squamous cell carcinoma showed that these genes and their subnetworks included several pathways associated with the progression of these two cancers. These results suggest that heterogeneous cancer driver information may be associated with cancer prognosis.
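The abstract does not specify how PageRank was modified. One common modification, used here purely as an assumption, is personalized PageRank (random walk with restart), in which the walk restarts at a patient's own altered genes, so the same network yields a different score vector for each patient. The sketch below returns those scores in a fixed (sorted) gene order so they can serve as a per-patient feature vector; the toy network, gene names and seed sets are hypothetical, and the downstream deep feedforward network is not shown.

```python
def personalized_pagerank(adjacency, seeds, damping=0.85, tol=1e-10, max_iter=1000):
    """Random walk with restart to `seeds`; returns scores in sorted gene order.

    `adjacency` maps each gene to its set of neighbours (all must be keys);
    `seeds` is a non-empty subset of those genes, e.g. a patient's altered genes.
    """
    nodes = sorted(adjacency)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        new = {v: (1.0 - damping) * restart[v] for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                # Dangling gene: send its mass back to the seed genes.
                for u in nodes:
                    new[u] += damping * rank[v] * restart[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return [rank[v] for v in nodes]


if __name__ == "__main__":
    # Hypothetical path-shaped network G1 - G2 - G3 - G4.
    network = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2", "G4"}, "G4": {"G3"}}
    patient_a = personalized_pagerank(network, seeds={"G1"})
    patient_b = personalized_pagerank(network, seeds={"G4"})
    print([round(x, 3) for x in patient_a])
    print([round(x, 3) for x in patient_b])
```

The two hypothetical patients share one network but receive mirrored feature vectors, because each walk keeps returning to that patient's own altered gene.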

    System overview.

    (a) “Adjacency-Based Inference” measures drug-drug (disease-disease) adjacency among known drug-disease associations and infers new drug-disease associations. “Module-Distance-Based Inference” derives drug-drug (disease-disease) gene modules from known drug-disease associations, measures the distance between each gene module and a disease (drug), and infers new drug-disease associations. (b) The scores of drug-disease relationships become features. Various machine learning-based classifiers are built on these features to predict unknown drug-disease relationships.
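The caption does not define the adjacency measure. As a hedged sketch of the “Adjacency-Based Inference” idea only, the code below scores an unobserved drug-disease pair by how similar the drug is to other drugs already known to treat that disease, using the Jaccard similarity of the drugs' known-disease sets. The Jaccard choice, the max aggregation and all drug and disease names are assumptions. Scores computed this way could then serve as classifier features, as in panel (b).

```python
def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_score(known, drug, disease):
    """Best similarity between `drug` and any other drug known to treat `disease`.

    `known` maps each drug to the set of diseases it is known to be associated with.
    """
    scores = [
        jaccard(known[drug], diseases)
        for other, diseases in known.items()
        if other != drug and disease in diseases
    ]
    return max(scores, default=0.0)


if __name__ == "__main__":
    # Hypothetical known drug-disease associations.
    known = {
        "drugA": {"dis1", "dis2"},
        "drugB": {"dis1", "dis2", "dis3"},
        "drugC": {"dis3"},
    }
    print(round(adjacency_score(known, "drugA", "dis3"), 3))  # 0.667
```

Here drugA shares two of three diseases with drugB, which is known to treat dis3, so the unobserved pair (drugA, dis3) receives a comparatively high score.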