33 research outputs found
mGrid: A load-balanced distributed computing environment for the remote execution of the user-defined Matlab code
BACKGROUND: Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources than are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. RESULTS: mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code, and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. CONCLUSION: Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple.
Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet
Normalization and missing value imputation for label-free LC-MS analysis
Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data
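As a rough illustration of the two steps this abstract mentions, the sketch below applies per-sample median centering followed by per-peptide mean imputation to a toy matrix of log2 intensities. Both techniques are simple stand-ins chosen here for illustration; the abstract does not name the specific methods the paper discusses, and the function names and toy data are hypothetical.

```python
from statistics import median

def median_normalize(log_matrix):
    """Center each sample (column) on its own median, ignoring missing values (None)."""
    n_cols = len(log_matrix[0])
    col_medians = []
    for j in range(n_cols):
        observed = [row[j] for row in log_matrix if row[j] is not None]
        col_medians.append(median(observed))
    return [
        [None if v is None else v - col_medians[j] for j, v in enumerate(row)]
        for row in log_matrix
    ]

def impute_row_mean(matrix):
    """Fill each missing value with the mean of the observed values in its row.

    Assumes every row (peptide) has at least one observed value.
    """
    completed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed)
        completed.append([fill if v is None else v for v in row])
    return completed

# Toy log2 intensities: rows = peptides, columns = samples, None = missing.
toy = [
    [20.0, 21.0, None],
    [18.0, 19.0, 20.0],
    [22.0, None, 24.0],
]
normalized = median_normalize(toy)
complete = impute_row_mean(normalized)
```

Normalizing before imputing keeps the filled-in values on the same scale as the corrected intensities. Mean imputation takes no account of why a value is missing, so it should be read only as a placeholder for the more careful approaches the paper surveys.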
Liquid Chromatography Mass Spectrometry-Based Proteomics: Biological and Technological Aspects
Mass spectrometry-based proteomics has become the tool of choice for
identifying and quantifying the proteome of an organism. Though recent years
have seen a tremendous improvement in instrument performance and the
computational tools used, significant challenges remain, and there are many
opportunities for statisticians to make important contributions. In the most
widely used "bottom-up" approach to proteomics, complex mixtures of proteins
are first subjected to enzymatic cleavage; the resulting peptide products are then
separated based on chemical or physical properties and analyzed using a mass
spectrometer. The two fundamental challenges in the analysis of bottom-up
MS-based proteomics are as follows: (1) Identifying the proteins that are
present in a sample, and (2) Quantifying the abundance levels of the identified
proteins. Both of these challenges require knowledge of the biological and
technological context that gives rise to observed data, as well as the
application of sound statistical principles for estimation and inference. We
present an overview of bottom-up proteomics and outline the key statistical
issues that arise in protein identification and quantification. Comment: Published at http://dx.doi.org/10.1214/10-AOAS341 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Review of Machine Learning Algorithms in Differential Expression Analysis
In biological research, machine learning algorithms are part of nearly every analytical process. They are used to identify new insights into biological phenomena, interpret data, provide molecular diagnosis for diseases and develop personalized medicine that will enable future treatments of diseases. In this paper we (1) illustrate the importance of machine learning in the analysis of large scale sequencing data, (2) present an illustrative standardized workflow of the analysis process, (3) perform a Differential Expression (DE) analysis of a publicly available RNA sequencing (RNA-Seq) data set to demonstrate the capabilities of various algorithms at each step of the workflow, and (4) show a machine learning solution that improves computing time, reduces storage requirements, and minimizes memory utilization in analyses of RNA-Seq datasets. The source code of the analysis pipeline and associated scripts are presented in the paper appendix to allow replication of experiments
Lipopolysaccharide-induced interferon response networks at birth are predictive of severe viral lower respiratory infections in the first year of life
Appropriate innate immune function is essential to limit pathogenesis and severity of severe lower respiratory infections (sLRI) during infancy, a leading cause of hospitalization and risk factor for subsequent asthma in this age group. Employing a systems biology approach to analysis of multi-omic profiles generated from a high-risk cohort (n = 50), we found that the intensity of activation of an LPS-induced interferon gene network at birth was predictive of sLRI risk in infancy (AUC = 0.724). Connectivity patterns within this network were stronger among susceptible individuals, and a systems biology approach identified IRF1 as a putative master regulator of this response. These findings were specific to the LPS-induced interferon response and were not observed following activation of viral nucleic acid sensing pathways. Comparison of responses at birth versus age 5 demonstrated that LPS-induced interferon responses, but not responses triggered by viral nucleic acid sensing pathways, may be subject to strong developmental regulation. These data suggest that the risk of sLRI in early life is in part already determined at birth, and additionally that the developmental status of LPS-induced interferon responses may be a key determinant of susceptibility. Our findings provide a rationale for the identification of at-risk infants for early intervention aimed at sLRI prevention and identify targets which may be relevant for drug development
Airway epithelium respiratory illnesses and allergy (AERIAL) birth cohort: Study protocol
Introduction: Recurrent wheezing disorders including asthma are complex and heterogeneous diseases that affect up to 30% of all children, contributing to a major burden on children, their families, and global healthcare systems. It is now recognized that a dysfunctional airway epithelium plays a central role in the pathogenesis of recurrent wheeze, although the underlying mechanisms are still not fully understood. This prospective birth cohort aims to bridge this knowledge gap by investigating the influence of intrinsic epithelial dysfunction on the risk for developing respiratory disorders and the modulation of this risk by maternal morbidities, in utero exposures, and respiratory exposures in the first year of life. Methods: The Airway Epithelium Respiratory Illnesses and Allergy (AERIAL) study is nested within the ORIGINS Project and will monitor 400 infants from birth to 5 years. The primary outcome of the AERIAL study will be the identification of epithelial endotypes and exposure variables that influence the development of recurrent wheezing, asthma, and allergic sensitisation. Nasal respiratory epithelium at birth to 6 weeks, 1, 3, and 5 years will be analysed by bulk RNA-seq and DNA methylation sequencing. Maternal morbidities and in utero exposures will be identified on maternal history and their effects measured through transcriptomic and epigenetic analyses of the amnion and newborn epithelium. Exposures within the first year of life will be identified based on infant medical history as well as on background and symptomatic nasal sampling for viral PCR and microbiome analysis. Daily temperatures and symptoms recorded in a study-specific Smartphone App will be used to identify symptomatic respiratory illnesses. 
Discussion: The AERIAL study will provide a comprehensive longitudinal assessment of factors influencing the association between epithelial dysfunction and respiratory morbidity in early life, and hopefully identify novel targets for diagnosis and early intervention
An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++
Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux as well as a user manual (Supplementary File S2) are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license
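The idea of subject-level bootstrapping described in this abstract can be sketched as follows. This is a minimal illustration and not the RF++ implementation: the function names, the toy subject identifiers, and the use of a majority vote for subject-level classification are all assumptions made here.

```python
import random
from collections import Counter, defaultdict

def subject_level_bootstrap(subject_ids, rng):
    """Draw subjects with replacement; return in-bag and out-of-bag row indices.

    All replicates of a drawn subject enter the bag together, so no subject
    contributes rows to both the in-bag and the out-of-bag set.
    """
    rows_by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        rows_by_subject[sid].append(idx)
    subjects = sorted(rows_by_subject)
    drawn = [rng.choice(subjects) for _ in subjects]
    in_bag = [idx for sid in drawn for idx in rows_by_subject[sid]]
    drawn_set = set(drawn)
    out_of_bag = [idx for sid in subjects if sid not in drawn_set
                  for idx in rows_by_subject[sid]]
    return in_bag, out_of_bag

def subject_level_vote(subject_ids, replicate_predictions):
    """Collapse per-replicate predictions to one label per subject (majority vote).

    Ties go to the label seen first for that subject.
    """
    votes = defaultdict(Counter)
    for sid, pred in zip(subject_ids, replicate_predictions):
        votes[sid][pred] += 1
    return {sid: counts.most_common(1)[0][0] for sid, counts in votes.items()}

# Hypothetical replicate layout: three subjects with 3, 2 and 3 replicates.
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3"]
in_bag, out_of_bag = subject_level_bootstrap(subject_ids, random.Random(0))
```

Resampling whole subjects rather than individual rows keeps a subject's replicates out of the out-of-bag set whenever that subject is in the bag, which is one plausible reason an ordinary row-level bootstrap would give the overoptimistic run-time error estimates reported for URF.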
A phase I clinical trial assessing the safety, tolerability, and pharmacokinetics of inhaled ethanol in humans as a potential treatment for respiratory tract infections
Background: Current treatments for respiratory infections are severely limited. Ethanol’s unique properties including antimicrobial, immunomodulatory, and surfactant-like activity make it a promising candidate treatment for respiratory infections if it can be delivered safely to the airway by inhalation. Here, we explore the safety, tolerability, and pharmacokinetics of inhaled ethanol in a phase I clinical trial. Methods: The study was conducted as a single-centre, open-label clinical trial in 18 healthy adult volunteers, six with no significant medical comorbidities, four with stable asthma, four with stable cystic fibrosis, and four active smokers. A dose-escalating design was used, with participants receiving three dosing cycles of 40%, 60%, and then 80% ethanol v/v in water, 2 h apart, in a single visit. Ethanol was nebulised using a standard jet nebuliser, delivered through a novel closed-circuit reservoir system, and inhaled nasally for 10 min, then orally for 30 min. Safety assessments included adverse events and vital sign monitoring, blood alcohol concentrations, clinical examination, spirometry, electrocardiogram, and blood tests. Results: No serious adverse events were recorded. The maximum blood alcohol concentration observed was 0.011% immediately following 80% ethanol dosing. Breath alcohol concentrations were high (median 0.26%) following dosing, suggesting high tissue levels were achieved. Small transient increases in heart rate, blood pressure, and blood neutrophil levels were observed, with these normalising after dosing, with no other significant safety concerns. Of 18 participants, 15 completed all dosing cycles, with three not completing all cycles due to tolerability.
The closed-circuit reservoir system significantly reduced fugitive aerosol loss during dosing. Conclusion: These data support the safety of inhaled ethanol at concentrations up to 80%, supporting its further investigation as a treatment for respiratory infections. Clinical trial registration: identifier ACTRN12621000067875