Multiplierz: An Extensible API Based Desktop Environment for Proteomics Data Analysis

Author: Askenazi Manor
Blank Nathaniel C.
Cashorali Tanya
Ficarro Scott B.
Marto Jarrod A.
Parikh Jignesh R.
Webber James T.
Zhang Yi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

BACKGROUND. Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequences to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized, so a high degree of flexibility is necessary to efficiently support both high- and low-throughput data analytic tasks. Developing a desktop environment that is robust enough for deployment in data analytic pipelines, while also supporting customization by programmers and non-programmers alike, has proven to be a significant challenge.

RESULTS. We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI), designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports generation of information-rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a "zero infrastructure" philosophy, meaning that end users can deploy it with little or no system administration support. Finally, multiplierz functionality is accessible via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines.

CONCLUSION. Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research.

Funding: Dana-Farber Cancer Institute; National Human Genome Research Institute (P50HG004233); National Science Foundation Integrative Graduate Education and Research Traineeship grant (DGE-0654108).

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

PubMed Central

Contextual Information Retrieval based on Algorithmic Information Theory and Statistical Outlier Detection

Author: Camacho David
Cebrian Manuel
Martinez Rafael
Rodriguez Francisco de Borja
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/11/2007

The main contribution of this paper is the design of an Information Retrieval (IR) technique based on Algorithmic Information Theory (specifically, the Normalized Compression Distance, NCD), statistical outlier detection, and a novel organization of the database structure. The paper shows how these can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed. First, how to detect "false positives" when the distance between documents is very low and there is actual similarity. Second, we propose a way to structure a document database whose similarity-distance estimation depends on the length of the selected text. Finally, the experimental evaluations carried out to study these problems are presented. Comment: Submitted to 2008 IEEE Information Theory Workshop (6 pages, 6 figures).
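The NCD named above is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length of its input under a real compressor. As a rough illustration of how NCD and outlier detection can be combined for retrieval, here is a minimal Python sketch; it is not the authors' implementation. It assumes zlib as the compressor and a simple z-score rule (with an illustrative threshold of 1.5) as the statistical outlier test; the function names are hypothetical.

```python
import statistics
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_with_outliers(query: bytes, docs: dict[str, bytes], z_threshold: float = 1.5):
    """Rank documents by NCD to the query and flag unusually close ones.

    Hypothetical outlier rule: a document is flagged when its distance lies more
    than z_threshold population standard deviations below the mean distance.
    """
    distances = {name: ncd(query, doc) for name, doc in docs.items()}
    values = list(distances.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if stdev == 0:
        return ranked, []  # all distances are equal, so nothing stands out
    flagged = [name for name, d in ranked if (d - mean) / stdev < -z_threshold]
    return ranked, flagged
```

Documents flagged this way are those whose distance to the query is anomalously low relative to the rest of the collection; in practice the compressor and threshold would need tuning to the corpus and query length.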

arXiv.org e-Print Archive

Crossref

Error Level Analysis Technique for Identifying JPEG Block Unique Signature for Digital Forensic Analysis

Author: Azhan Nor Amira Nor
Ikuesan Richard Adeyemi
Kebande Victor R.
Razak Shukor Abd
Publication venue: ZU Scholars
Publication date: 01/05/2022

The popularity of the unique compression features of image files opens an interesting avenue for research, given that many digital forensics cases involve diverse file types. Of particular interest has been fragmented file carving and recovery, which forms a major part of digital forensics research on JPEG files. While several challenges exist, this paper focuses on determining the co-existence of JPEG fragments among various file fragment types. Existing works exhibit a high false-positive rate, which necessitates manual validation. This study develops a technique, implemented in MATLAB, that identifies the unique signature of JPEG 8 × 8 blocks using Error Level Analysis. An experiment conducted with 21 JFIF images comprising 1008 blocks shows the efficacy of the proposed technique. Specifically, the initial results show that JPEG 8 × 8 blocks have unique characteristics that can be leveraged for digital forensics. An investigator could therefore search for these characteristics to identify a JPEG fragment during a digital investigation.

ZU Scholars (Zayed University)

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

Author: Chang Yu
Chi-Man Liu
David W Cheung
Edward Wu
Haoxiang Lin
Hing-Fung Ting
Jianqiao Zhu
Lap-Kei Lee
Ruibang Luo
Ruiqiang Li
Shaoliang Peng
Siu-Ming Yiu
Tak-Wah Lam
Thomas Wong
Wenjuan Zhu
Xiaoqian Zhu
Yingrui Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

To cope with the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners, including BWA, Bowtie2, SeqAlto and GEM, and GPU-based aligners, including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Going beyond its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real-data evaluation using the human genome demonstrates SOAP3-dp's ability to discover more authentic variants and longer indels. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE; additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

FigShare

A standards-based solution for the accurate transfer of digital assets

Author: Bekaert Jeroen
Van De Sompel Herbert
Publication date: 01/01/2005

Ghent University Academic Bibliography