Comparison of Several Methods of Chromatographic Baseline Removal with a New Approach Based on Quantile Regression
The article introduces and discusses a new quantile regression method for baseline detrending of chromatographic signals. It is compared with current methods based on polynomial fitting, spline fitting, LOESS, and the Whittaker smoother, each with a thresholding and a reweighting approach. To select curve flexibility in the existing algorithms, a new method based on the skewness of the residuals is successfully applied. The computational efficiency of all approaches is also discussed. The newly introduced methods may be preferred for their visibly better performance and short computational time. The other algorithms behave comparably, and among them polynomial regression can be preferred for its short computational time.
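As a rough illustration of baseline estimation by quantile regression, the sketch below fits a low-quantile polynomial to a signal using iteratively reweighted least squares. It is not the article's algorithm: the function name `quantile_poly_baseline`, the polynomial basis, the IRLS solver, and the defaults `degree=3` and `tau=0.05` are all assumptions chosen for brevity.

```python
import numpy as np

def quantile_poly_baseline(y, degree=3, tau=0.05, n_iter=200, eps=1e-6):
    """Estimate a baseline as the tau-quantile polynomial fit of y.

    Hypothetical helper: a minimal IRLS sketch of polynomial quantile
    regression, not the method of the article summarized above.
    """
    y = np.asarray(y, dtype=float)
    x = np.linspace(-1.0, 1.0, y.size)           # scaled abscissa for conditioning
    X = np.vander(x, degree + 1)                 # polynomial design matrix
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Pinball-loss weights: tau for points above the curve,
        # (1 - tau) for points below, divided by |r| so that the
        # weighted squared loss approximates the absolute (pinball) loss.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.allclose(beta_new, beta, rtol=1e-8, atol=1e-10)
        beta = beta_new
        if converged:
            break
    return X @ beta
```

With a small `tau`, peaks (large positive residuals) receive little weight, so the fitted curve settles near the lower envelope of the signal. On noisy data the estimate sits near the lower edge of the noise band rather than at its center, so `tau` would need tuning for a given signal.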
HiTRACE: High-throughput robust analysis for capillary electrophoresis
Motivation: Capillary electrophoresis (CE) of nucleic acids is a workhorse
technology underlying high-throughput genome analysis and large-scale chemical
mapping for nucleic acid structural inference. Despite the wide availability of
CE-based instruments, there remain challenges in leveraging their full power
for quantitative analysis of RNA and DNA structure, thermodynamics, and
kinetics. In particular, the slow rate and poor automation of available
analysis tools have bottlenecked a new generation of studies involving hundreds
of CE profiles per experiment.
Results: We propose a computational method called high-throughput robust
analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in
large-scale nucleic acid CE analysis, including the profile alignment that has
heretofore been a rate-limiting step in the highest throughput experiments. We
illustrate the application of HiTRACE on thirteen data sets representing four
different RNAs, three chemical modification strategies, and up to 480 single
mutant variants; the largest data sets each include 87,360 bands. By applying a
series of robust dynamic programming algorithms, HiTRACE outperforms prior
tools in terms of alignment and fitting quality, as assessed by measures
including the correlation of quantified band intensities between replicate
data sets. Furthermore, while the smallest of these data sets required 7 to 10
hours of manual intervention using prior approaches, HiTRACE quantitation of
even the largest data sets herein was achieved in 3 to 12 minutes. The HiTRACE
method therefore resolves a critical barrier to the efficient and accurate
analysis of nucleic acid structure in experiments involving tens of thousands
of electrophoretic bands. Comment: Revised to include Supplement. Availability: HiTRACE is freely
available for download at http://hitrace.stanford.ed
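The abstract does not specify which dynamic programming algorithms HiTRACE uses. Purely as a generic illustration of aligning two one-dimensional profiles by dynamic programming, the sketch below implements classic dynamic time warping (DTW). DTW is a stand-in technique here, and the function name `dtw_path` is hypothetical.

```python
import numpy as np

def dtw_path(a, b):
    """Align two 1-D profiles with classic dynamic time warping.

    Hypothetical helper illustrating dynamic-programming alignment in
    general; it is not the HiTRACE algorithm.
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover the optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(cost[n, m]), path
```

For example, `dtw_path([0, 0, 1, 0, 0], [0, 1, 0, 0, 0])` pairs the single peak at index 2 of the first profile with the peak at index 1 of the second. The double loop costs O(n·m) time and memory, so long profiles would likely need a banded or otherwise constrained variant.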
Automated mass spectrometry-based metabolomics data processing by blind source separation methods
One of the major bottlenecks in metabolomics is converting raw data into biologically interpretable information.
Moreover, mass spectrometry-based metabolomics generates large and complex datasets characterized by co-eluting
compounds and experimental artifacts. The main objective of this thesis is to develop automated strategies based on
blind source separation to improve the capabilities of current methods that address the limitations of the different
steps of the metabolomics data processing workflow. A further objective is to develop tools capable of performing the
entire metabolomics workflow for GC-MS, including pre-processing, spectral deconvolution, alignment and
identification. As a result, three new automated methods for spectral deconvolution based on blind source separation
were developed. These methods were embedded into two computational tools that automatically convert raw data into
biologically interpretable information and thus allow biological hypotheses to be resolved and new biological
insights to be discovered.
Peaks detection and alignment for mass spectrometry data
The goal of this paper is to review existing methods for protein mass spectrometry data analysis and to present a new methodology for automatic extraction of significant peaks (biomarkers). For the pre-processing step required for data from MALDI-TOF or SELDI-TOF spectra, we use a purely nonparametric approach that combines stationary invariant wavelet transform for noise removal and penalized spline quantile regression for baseline correction. We further present a multi-scale spectra alignment technique that is based on identification of statistically significant peaks from a set of spectra. This method allows one to find common peaks in a set of spectra that can subsequently be mapped to individual proteins. These peaks may serve as useful biomarkers in medical applications, or as individual features for further multidimensional statistical analysis. MALDI-TOF spectra obtained from serum samples are used throughout the paper to illustrate the methodology.