
    Species-level functional profiling of metagenomes and metatranscriptomes.

    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
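
    A minimal sketch of the tiered idea, assuming made-up stand-in functions and toy indexes in place of HUMAnN2's real nucleotide and translated-search aligners: reads that miss the pangenome tier fall through to a translated-search tier, and all hits are aggregated into gene-family counts. This is an illustration, not HUMAnN2's actual code.

from collections import defaultdict

def pangenome_hits(read, pangenome_index):
    """Fast nucleotide lookup against species pangenomes (hypothetical stub)."""
    return pangenome_index.get(read)          # (species, gene_family) or None

def translated_hits(read, protein_index):
    """Slower translated (protein-level) search for leftover reads (hypothetical stub)."""
    return protein_index.get(read)            # gene_family or None

def tiered_profile(reads, pangenome_index, protein_index):
    counts = defaultdict(float)
    unclassified = []
    for read in reads:                        # tier 1: species pangenome alignment
        hit = pangenome_hits(read, pangenome_index)
        if hit:
            counts[hit] += 1.0
        else:
            unclassified.append(read)
    for read in unclassified:                 # tier 2: translated-search fallback
        fam = translated_hits(read, protein_index)
        if fam:
            counts[("unclassified", fam)] += 1.0
    return dict(counts)

# toy data: one read resolved at species level, one only by translated search
reads = ["r1", "r2", "r3"]
pangenome_index = {"r1": ("Bacteroides_dorei", "K00845")}
protein_index = {"r2": "K00845"}
print(tiered_profile(reads, pangenome_index, protein_index))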

    Metabolic Network Alignments and their Applications

    The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of increasingly large and complex metabolic networks. In order to analyze the accumulated data and the reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks, but even finding similar networks is computationally challenging. This dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the properties of gene duplication and function sharing in biological networks, we formulate the network alignment problem, which asks for an optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertion. We propose the first polynomial-time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network, as well as a polynomial-time algorithm for patterns with small treewidth, which we implement for series-parallel patterns, which are commonly found among metabolic networks. We have developed a metabolic network alignment tool that is freely available for public use. We performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We have also applied network alignment to identifying inconsistencies, inferring missing enzymes, and finding potential candidates.
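
    As a much-simplified illustration of the flavor of this problem (not the dissertation's algorithm), the sketch below aligns a linear pattern pathway to a toy network, allowing a pattern edge to be stretched over a short network path with a per-vertex gap penalty; the network, similarity scores, and penalty are all invented.

from functools import lru_cache

network = {                      # adjacency list of a toy metabolic network
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}
similarity = {                   # invented enzyme similarity between pattern and network vertices
    ("p1", "A"): 1.0, ("p1", "B"): 0.2,
    ("p2", "B"): 0.9, ("p2", "C"): 0.8,
    ("p3", "D"): 1.0,
}
pattern = ["p1", "p2", "p3"]     # an acyclic (here: linear) pattern pathway
GAP = -0.3                       # penalty per skipped network vertex

def sim(p, v):
    return similarity.get((p, v), -1.0)

def successors_within(v, hops=2):
    """Network vertices reachable from v in 1..hops steps, with number of skipped vertices."""
    out, frontier = {}, {v}
    for h in range(1, hops + 1):
        frontier = {w for u in frontier for w in network[u]}
        for w in frontier:
            out.setdefault(w, h - 1)      # h-1 intermediate vertices were skipped
    return out

@lru_cache(maxsize=None)
def best(i, v):
    """Best score for mapping pattern[i:] with pattern[i] placed on network vertex v."""
    here = sim(pattern[i], v)
    if i == len(pattern) - 1:
        return here
    cont = max(
        (best(i + 1, w) + skipped * GAP
         for w, skipped in successors_within(v).items()),
        default=float("-inf"),
    )
    return here + cont

print(max(best(0, v) for v in network))   # best alignment score of the pattern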

    Information Theory in Computational Biology: Where We Stand Today

    "A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address the problems in the field of data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and has been playing a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information theory based concepts and describe their key applications in multiple major areas of research in computational biology-gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis

    METHODS FOR HIGH-THROUGHPUT COMPARATIVE GENOMICS AND DISTRIBUTED SEQUENCE ANALYSIS

    High-throughput sequencing has accelerated applications of genomics throughout the world. The increased production and decentralization of sequencing have also created bottlenecks in computational analysis. In this dissertation, I provide novel computational methods to improve analysis throughput in three areas: whole genome multiple alignment, pan-genome annotation, and bioinformatics workflows. To aid in the study of populations, tools are needed that can quickly compare multiple genome sequences, millions of nucleotides in length. I present a new multiple alignment tool for whole genomes, named Mugsy, that implements a novel method for identifying syntenic regions. Mugsy is computationally efficient, does not require a reference genome, and is robust in identifying a rich complement of genetic variation, including duplications, rearrangements, and large-scale gain and loss of sequence in mixtures of draft and completed genome data. Mugsy was evaluated on the alignment of several dozen bacterial chromosomes on a single computer and was the fastest program evaluated for the alignment of assembled human chromosome sequences from four individuals. A distributed version of the algorithm is also described and provides increased processing throughput using multiple CPUs. Numerous individual genomes are sequenced to study diversity and evolution and to classify pan-genomes. Pan-genome annotations contain inconsistencies and errors that hinder comparative analysis, even within a single species. I introduce a new tool, Mugsy-Annotator, that identifies orthologs and anomalous gene structure across a pan-genome using whole genome multiple alignments. Identified anomalies include inconsistently located translation initiation sites and genes disrupted by draft genome sequencing or pseudogenes. An evaluation of pan-genomes indicates that such anomalies are common and that the alternative annotations suggested by the tool can improve annotation consistency and quality. Finally, I describe the Cloud Virtual Resource, CloVR, a desktop application for automated sequence analysis that improves the usability and accessibility of bioinformatics software and cloud computing resources. CloVR runs on a personal computer as a virtual machine, requires minimal installation, and seamlessly accesses remote cloud computing resources for improved processing throughput, addressing challenges in deploying bioinformatics workflows. In a case study, I demonstrate the portability and scalability of CloVR and evaluate the costs and resources required for microbial sequence analysis.
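
    The sketch below shows one common building block for detecting syntenic regions: chaining alignment anchors so that they occur in the same order in both genomes (a longest-increasing-subsequence computation). It is a generic illustration with invented coordinates, not a description of Mugsy's actual method.

from bisect import bisect_left

# anchors as (position_in_genome_A, position_in_genome_B); values are invented
anchors = [(10, 40), (25, 55), (30, 20), (50, 70), (60, 90), (70, 35)]

def longest_collinear_chain(anchors):
    """Length of the longest chain of anchors increasing in both genomes."""
    anchors = sorted(anchors)                 # sort by position in genome A
    tails = []                                # smallest B-end of a chain of each length
    for _, b in anchors:
        i = bisect_left(tails, b)
        if i == len(tails):
            tails.append(b)
        else:
            tails[i] = b
    return len(tails)

print(longest_collinear_chain(anchors))       # size of the best collinear (syntenic) chain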

    Computational functional annotation of crop genomics using hierarchical orthologous groups

    Improving agronomically important traits, such as yield, is essential to meet the ever-growing demand for increased crop production. Knowledge of the genes that affect a given trait can be used to enhance genomic selection by predicting biologically interesting loci. Candidate genes that are strongly linked to a desired trait can then be targeted by transformation or genome editing. Such prioritisation of genetic material can accelerate crop improvement. However, this application is currently limited by the lack of accurate annotations and of methods to integrate experimental data with evolutionary relationships. Hierarchical orthologous groups (HOGs) provide nested groups of genes that enable highly diverged as well as closely related species to be compared in a consistent manner. Over 2,250 species are included in the OMA project, resulting in over 600,000 HOGs. This thesis provides the required methodology and a tool, the HOGPROP algorithm, to exploit this rich source of information. Its potential is then demonstrated by mining crop genome data from metabolic QTL studies, using Gene Ontology (GO) annotations as well as ChEBI (Chemical Entities of Biological Interest) terms to prioritise candidate causal genes. Gauging the performance of the tool is also important. For GO annotations, the CAFA series of community experiments has provided the most extensive benchmarking to date. However, it has not fully taken into account the incompleteness of our knowledge of protein function – the open-world assumption (OWA). Accounting for this requires extra negative annotations, and one such source has been identified based on expertly curated gene phylogenies. These negative annotations are then used in the proposed, OWA-compliant, improved benchmarking framework. The results show that current benchmarks tend to focus on general terms, which means that conclusions are not merely uninformative, but misleading.
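
    A hedged sketch of the general propagation idea (not a reproduction of HOGPROP): annotation evidence attached to genes is pushed up a toy HOG hierarchy with a decay factor, so that related genes inherit a weakened version of each other's annotations. The tree, decay value, and scores are illustrative assumptions.

DECAY = 0.8                                   # assumed per-level attenuation of evidence

# each node: {"children": [...], "genes": [...]}; structure is invented
hog_tree = {
    "HOG:root": {"children": ["HOG:A", "HOG:B"], "genes": []},
    "HOG:A":    {"children": [], "genes": ["gene1", "gene2"]},
    "HOG:B":    {"children": [], "genes": ["gene3"]},
}
evidence = {"gene1": {"GO:0008150": 1.0}}     # experimental GO annotation on gene1

def propagate_up(node):
    """Return combined (decayed) annotation scores for a HOG subtree."""
    scores = {}
    data = hog_tree[node]
    for g in data["genes"]:
        for term, s in evidence.get(g, {}).items():
            scores[term] = max(scores.get(term, 0.0), s)
    for child in data["children"]:
        for term, s in propagate_up(child).items():
            scores[term] = max(scores.get(term, 0.0), s * DECAY)
    return scores

print(propagate_up("HOG:root"))               # {'GO:0008150': 0.8}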

    Application of knowledge discovery and data mining methods in livestock genomics for hypothesis generation and identification of biomarker candidates influencing meat quality traits in pigs

    Recent advancements in genomics and genome profiling technologies have led to an increase in the amount of data available in livestock genomics. Yet most studies in livestock genomics have followed a reductionist approach, and very few have either applied data mining or knowledge discovery concepts or made use of the wealth of information available in the public domain to gain new knowledge. The goals of this thesis were: (i) the adoption of existing analysis strategies, or the development of novel approaches, for integrative data analysis in livestock genomics following the principles of data mining and knowledge discovery, and (ii) demonstrating the application of such approaches in livestock genomics for hypothesis generation and biomarker discovery. A pig meat quality trait, androstenone level measured in backfat, was selected as the target phenotype, and two experiments were performed as part of this thesis. The first followed a knowledge-driven approach, merging high-throughput expression data with a metabolic interaction network. Based on its results, several novel biomarker candidates were proposed, together with a hypothesis regarding the different mechanisms regulating androstenone synthesis in porcine testis samples with divergent androstenone measurements in backfat. The model proposed that the elevated androstenone synthesis in the sample population could be due to the combined effect of cAMP/PKA signaling, elevated fatty acid metabolism, and the anti-lipid-peroxidation activity of members of the glutathione metabolic pathway. The second experiment followed a data-driven approach and integrated gene expression data from multiple porcine populations to identify similarities in gene expression patterns related to hepatic androstenone metabolism. The results indicated that one of the co-expression clusters specific to the low-androstenone phenotype was functionally enriched in pathways related to androgen and androstenone metabolism, and that members of this cluster exhibited weak co-expression in the high-androstenone phenotype. Based on these results, this co-expression cluster was proposed as a signature cluster for hepatic androstenone metabolism in boars with low androstenone content in backfat. Together, the experiments indicate that integrative analysis approaches following data mining and knowledge discovery concepts can be used to generate new knowledge from existing data in livestock genomics, although limited data availability currently hinders the extensive use of such methods in the field. In conclusion, this study demonstrates the capability of data mining, knowledge discovery, and integrative analysis approaches to generate new knowledge in livestock genomics from existing datasets, and the results hint at the possibilities of exploring such methods further. While their application is presently limited by data availability, the growing volume of data produced by evolving high-throughput technologies and decreasing data generation costs should support their widespread use in livestock genomics in the future.
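
    A minimal sketch of the kind of co-expression analysis described above (not the thesis pipeline): pairwise Pearson correlations are computed across samples, and genes whose correlation exceeds a threshold are grouped together. The expression matrix, gene names, and cutoff are invented, and a real analysis would use a dedicated module-detection method rather than this toy grouping.

import numpy as np

genes = ["CYP2E1", "SULT2A1", "HSD3B1", "ACTB"]   # illustrative gene names
expr = np.array([                       # rows = genes, columns = samples (invented values)
    [5.1, 6.0, 7.2, 4.8],
    [5.0, 6.2, 7.0, 4.9],
    [4.9, 6.1, 7.3, 5.0],
    [2.0, 2.1, 1.9, 2.2],
])

corr = np.corrcoef(expr)                # gene-by-gene Pearson correlation matrix

# naive single-linkage grouping at |r| >= 0.9
threshold = 0.9
clusters = []
for i, g in enumerate(genes):
    placed = False
    for cluster in clusters:
        if any(abs(corr[i, j]) >= threshold for j in cluster):
            cluster.append(i)
            placed = True
            break
    if not placed:
        clusters.append([i])

print([[genes[i] for i in c] for c in clusters])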

    JOINT CODING OF MULTIMODAL BIOMEDICAL IMAGES USING CONVOLUTIONAL NEURAL NETWORKS

    The massive volume of data generated daily by the acquisition of medical images in different modalities can be difficult to store in medical facilities and to share over communication networks. To alleviate this issue, efficient compression methods must be implemented to reduce the storage and transmission resources required by such applications. However, since the preservation of all image details is highly important in the medical context, the use of lossless image compression algorithms is of utmost importance. This thesis presents research results on a lossless compression scheme designed to jointly encode computerized tomography (CT) and positron emission tomography (PET) images. Different techniques, such as image-to-image translation, intra prediction, and inter prediction, are used, and redundancies between the two image modalities are investigated. In the image-to-image translation approach, the original CT data are losslessly compressed and a cross-modality image translation generative adversarial network is applied to obtain an estimate of the corresponding PET. Two approaches were implemented and evaluated to determine a PET residue that is compressed along with the original CT: in the first, the residue is the difference between the original PET and its estimate; in the second, the residue is obtained using an encoder's inter-prediction coding tools. Thus, instead of compressing two independent image modalities (both images of the original PET-CT pair), the proposed method independently encodes only the CT, alongside the PET residue. In addition to the proposed pipeline, a post-processing optimization algorithm that modifies the estimated PET image by altering its contrast and rescaling it is implemented to maximize compression efficiency. Four versions (subsets) of a publicly available PET-CT pair dataset were tested. The first subset was used to demonstrate that the concept developed in this work can surpass traditional compression schemes, with gains of up to 8.9% using HEVC. JPEG 2000, by contrast, proved less suitable, reaching a compression gain of only -9.1%. For the remaining (more challenging) subsets, the results show that the proposed refined post-processing scheme attains compression gains of up to 6.33% using HEVC and 7.78% using VVC when compared with conventional compression methods.
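
    A small numeric sketch of the residue idea, assuming a trivial deterministic placeholder in place of the cross-modality translation network: the encoder stores the CT plus the difference between the real PET and a PET estimate predicted from the CT, and because the decoder can reproduce the same estimate, adding the residue back recovers the PET losslessly.

import numpy as np

def estimate_pet_from_ct(ct):
    """Placeholder for the image-to-image translation model (CT -> PET estimate)."""
    return (ct // 4).astype(np.int16)          # any mapping both sides can reproduce works

ct  = np.random.randint(0, 4096, size=(8, 8), dtype=np.int16)   # toy CT slice
pet = np.random.randint(0, 1024, size=(8, 8), dtype=np.int16)   # toy PET slice

# encoder side: residue = original PET minus its estimate
residue = pet - estimate_pet_from_ct(ct)       # this is what gets compressed with the CT

# decoder side: rebuild the estimate from the decoded CT, then add the residue back
pet_reconstructed = estimate_pet_from_ct(ct) + residue

assert np.array_equal(pet, pet_reconstructed)  # lossless round trip
print("max |residue| =", int(np.abs(residue).max()))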