69 research outputs found
Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences
With the advent of metatranscriptomics, it has become possible to study the dynamics of microbial communities. The analysis of environmental RNA-Seq data poses several challenges for the development of efficient bioinformatics tools. One of the first steps in the computational analysis of metatranscriptomic sequencing reads is the separation of rRNA and mRNA fragments, which ensures that only protein-coding sequences are used in a subsequent functional analysis. For the rRNA filtering task it is desirable to have a broad spectrum of methods available, so that a suitable trade-off between speed and accuracy can be found for a particular dataset. We introduce a machine learning approach for the detection of rRNA in metatranscriptomic sequencing reads that is based on support vector machines in combination with dinucleotide distance histograms for feature representation. The results show that our SVM-based approach is at least one order of magnitude faster than any of the existing tools, with only a slight degradation of the detection performance when compared to state-of-the-art alignment-based methods.
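The abstract above does not spell out how the dinucleotide distance histograms are built, so the following is only a minimal sketch of one plausible reading: for each of the 16 dinucleotides, count how often two successive occurrences lie a given number of positions apart. The function name, the `max_dist` parameter and the normalisation are assumptions made for illustration, not the authors' definition.

```python
from itertools import product

# All 16 dinucleotides over the DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_distance_histogram(read, max_dist=8):
    """Return a normalised feature vector of length 16 * max_dist.

    Assumed definition: for each dinucleotide, histogram the distance
    between successive occurrences, keeping distances 1..max_dist.
    """
    read = read.upper()
    index = {d: i for i, d in enumerate(DINUCLEOTIDES)}
    counts = [[0] * max_dist for _ in DINUCLEOTIDES]
    last_seen = {}  # dinucleotide -> position of its previous occurrence
    for pos in range(len(read) - 1):
        di = read[pos:pos + 2]
        if di not in index:  # skip ambiguous bases such as N
            continue
        if di in last_seen:
            dist = pos - last_seen[di]
            if dist <= max_dist:
                counts[index[di]][dist - 1] += 1
        last_seen[di] = pos
    flat = [c for row in counts for c in row]
    total = sum(flat)
    # Normalise so that reads of different length are comparable.
    return [c / total for c in flat] if total else [float(c) for c in flat]
```

A fixed-length vector of this kind could then be passed to any SVM implementation; the kernel and training setup used in the paper are not given in the abstract.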
Experimental Investigation on Heat Transfer Enhancement with Passive Inserts in Flat Tubes in due Consideration of an Efficiency Assessment
This paper presents results of an experimental investigation on pressure drop and heat transfer in flat tubes with and without passive inserts, covering a wide range of Prandtl and Reynolds numbers (8 < Pr < 60 and 40 < Re < 3500). For three different passive insert designs, the impact on heat and momentum transfer due to the coaction of the total set of passive inserts with different shape and number was investigated. Experimental results were analyzed regarding two main aspects: heat transfer mechanisms, and pressure drop induced by friction and form drag forces due to the presence of different shapes. After heat and momentum transfer mechanisms for each passive insert design were analyzed, heat transfer and pressure drop enhancement were compared to each other, leading to an efficiency discussion. Different concepts for efficiency evaluation cited in the literature were applied to the presented experimental data, and the pros and cons of these concepts are discussed. Finally, we propose an equation for evaluating total performance that fully respects the energetic and exergetic aspects of heat transfer and pressure drop enhancement.
MarVis: a tool for clustering and visualization of metabolic biomarkers
BACKGROUND: A central goal of experimental studies in systems biology is to identify meaningful markers that are hidden within a diffuse background of data originating from large-scale analytical intensity measurements as obtained from metabolomic experiments. Intensity-based clustering is an unsupervised approach to the identification of metabolic markers based on the grouping of similar intensity profiles. A major problem of this basic approach is that in general there is no prior information about an adequate number of biologically relevant clusters. RESULTS: We present the tool MarVis (Marker Visualization) for data mining on intensity-based profiles using one-dimensional self-organizing maps (1D-SOMs). MarVis can import and export customizable CSV (Comma Separated Values) files and provides aggregation and normalization routines for preprocessing of intensity profiles that contain repeated measurements for a number of different experimental conditions. Robust clustering is then achieved by training a 1D-SOM model, which introduces a similarity-based ordering of the intensity profiles. The ordering allows a convenient visualization of the intensity variations within the data and facilitates an interactive aggregation of clusters into larger blocks. The intensity-based visualization is combined with the presentation of additional data attributes, which can further support the analysis of experimental data. CONCLUSION: MarVis is a user-friendly and interactive tool for exploration of complex pattern variation in a large set of experimental intensity profiles. The application of 1D-SOMs gives a convenient overview of relevant profiles and groups of profiles. The specialized visualization effectively supports researchers in analyzing a large number of putative clusters, even though the true number of biologically meaningful groups is unknown. Although MarVis has been developed for the analysis of metabolomic data, the tool may be applied to gene expression data as well.
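To illustrate how a one-dimensional self-organizing map can impose a similarity-based ordering on intensity profiles, here is a minimal generic sketch. It is not the MarVis implementation: the initialisation, the learning-rate and neighbourhood schedules, and the names `train_1d_som` and `best_unit` are all assumptions chosen for brevity.

```python
import math


def best_unit(protos, x):
    """Index of the prototype closest to profile x (squared Euclidean)."""
    return min(range(len(protos)),
               key=lambda u: sum((a - b) ** 2 for a, b in zip(protos[u], x)))


def train_1d_som(profiles, n_units=4, epochs=50, lr0=0.5):
    """Train prototypes arranged on a line; requires n_units >= 2."""
    dim = len(profiles[0])
    lo = min(profiles, key=sum)
    hi = max(profiles, key=sum)
    # Assumed initialisation: evenly spaced between two extreme profiles.
    protos = [[lo[d] + (hi[d] - lo[d]) * u / (n_units - 1) for d in range(dim)]
              for u in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                          # decaying step size
        sigma = max(n_units / 2.0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for x in profiles:
            bmu = best_unit(protos, x)
            for u, w in enumerate(protos):
                # Units near the winner on the line are pulled along with it.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return protos
```

Assigning each profile to `best_unit(protos, profile)` and sorting by that index gives an ordering in which similar profiles sit next to each other, which is the property the abstract relies on for visualization.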
Predicting phenotypic traits of prokaryotes from protein domain frequencies
BACKGROUND: Establishing the relationship between an organism's genome sequence and its phenotype is a fundamental challenge that remains largely unsolved. Accurately predicting microbial phenotypes solely based on genomic features will allow us to infer relevant phenotypic characteristics when the availability of a genome sequence precedes experimental characterization, a scenario that is favored by the advent of novel high-throughput and single-cell sequencing techniques. RESULTS: We present a novel approach to predict the phenotype of prokaryotes directly from their protein domain frequencies. Our discriminative machine learning approach provides high prediction accuracy of relevant phenotypes such as motility, oxygen requirement or spore formation. Moreover, the set of discriminative domains provides biological insight into the underlying phenotype-genotype relationship and enables deriving hypotheses on the possible functions of uncharacterized domains. CONCLUSIONS: Fast and accurate prediction of microbial phenotypes based on genomic protein domain content is feasible and has the potential to provide novel biological insights. First results of a systematic check for annotation errors indicate that our approach may also be applied to semi-automatic correction and completion of the existing phenotype annotation.
Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models
BACKGROUND: Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplications or mutations is its speed, which enables rapid adaptation to changing environmental demands. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands. RESULTS: We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU) of each individual gene of a genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file in the EMBL format and generates output in the common GFF format, which is readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were consistent both with annotated GIs and with predictions generated by different methods. CONCLUSION: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows users to interactively analyze genomes in detail and to generate or test hypotheses about the origin of acquired genes.
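The comparison of a gene's codon usage against reference tables can be illustrated with a small sketch. This is not the SIGI-HMM test procedure, which the abstract does not detail: the log-likelihood comparison, the pseudocount, the use of overall rather than per-amino-acid codon frequencies, and all function names are assumptions chosen to keep the example short.

```python
import math
from collections import Counter

# All 64 codons over the DNA alphabet.
CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codon_counts(gene):
    """Count in-frame codons of a coding sequence (trailing bases ignored)."""
    gene = gene.upper()
    usable = len(gene) - len(gene) % 3
    return Counter(gene[i:i + 3] for i in range(0, usable, 3))


def codon_usage(genes, pseudocount=1.0):
    """Relative codon frequencies over a set of genes, smoothed so that
    no codon has probability zero."""
    total = Counter()
    for gene in genes:
        total.update(codon_counts(gene))
    denom = sum(total[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (total[c] + pseudocount) / denom for c in CODONS}


def log_likelihood(gene, table):
    """Log-probability of the gene's codons under a usage table;
    codons containing ambiguous bases are skipped."""
    return sum(n * math.log(table[c])
               for c, n in codon_counts(gene).items() if c in table)


def closest_table(gene, tables):
    """Name of the reference table under which the gene is most likely."""
    return max(tables, key=lambda name: log_likelihood(gene, tables[name]))
```

With one table built from the genome under study and others from candidate donors, a gene that scores markedly better under a donor table than under the host table would be a candidate alien gene in this simplified picture.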
CoMet—a web server for comparative functional profiling of metagenomes
Analyzing the functional potential of newly sequenced genomes and metagenomes has become a common task in biomedical and biological research. With the advent of high-throughput sequencing technologies, comparative metagenomics opens the way to elucidate the genetically determined similarities and differences of complex microbial communities. We developed the web server ‘CoMet’ (http://comet.gobics.de), which provides an easy-to-use comparative metagenomics platform that is well suited to the analysis of large collections of metagenomic short read data. CoMet combines ORF finding and the subsequent assignment of protein sequences to Pfam domain families with a comparative statistical analysis. Besides comprehensive tabular data files, the CoMet server also provides visually interpretable output in the form of hierarchical clustering and multi-dimensional scaling plots and thus allows a quick overview of a given set of metagenomic samples.
Linagliptin and its effects on hyperglycaemia and albuminuria in patients with type 2 diabetes and renal dysfunction : the randomized MARLINA-T2D trial
Aims: The MARLINA-T2D study (ClinicalTrials.gov, NCT01792518) was designed to investigate the glycaemic and renal effects of linagliptin added to standard-of-care in individuals with type 2 diabetes and albuminuria. Methods: A total of 360 individuals with type 2 diabetes, HbA1c 6.5% to 10.0% (48-86 mmol/mol), estimated glomerular filtration rate (eGFR) ≥ 30 mL/min/1.73 m² and urinary albumin-to-creatinine ratio (UACR) 30-3000 mg/g despite single agent renin-angiotensin-system blockade were randomized to double-blind linagliptin (n = 182) or placebo (n = 178) for 24 weeks. The primary and key secondary endpoints were change from baseline in HbA1c at week 24 and time-weighted average of percentage change from baseline in UACR over 24 weeks, respectively. Results: Baseline mean HbA1c and geometric mean (gMean) UACR were 7.8% ± 0.9% (62.2 ± 9.6 mmol/mol) and 126 mg/g, respectively; 73.7% and 20.3% of participants had microalbuminuria or macroalbuminuria, respectively. After 24 weeks, the placebo-adjusted mean change in HbA1c from baseline was -0.60% (-6.6 mmol/mol) (95% confidence interval [CI], -0.78 to -0.43 [-8.5 to -4.7 mmol/mol]; P Conclusions: In individuals at early stages of diabetic kidney disease, linagliptin significantly improved glycaemic control but did not significantly lower albuminuria. There was no significant change in placebo-adjusted eGFR. Detection of clinically relevant renal effects of linagliptin may require longer treatment, as its main experimental effects in animal studies have been to reduce interstitial fibrosis rather than alter glomerular haemodynamics.
Word correlation matrices for protein sequence analysis and remote homology detection
BACKGROUND: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences usually are computationally expensive. RESULTS: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection. CONCLUSION: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.