45 research outputs found

Whole Genome Phylogenetic Tree Reconstruction Using Colored de Bruijn Graphs

Author: Bodily Paul M.
Bybee Seth M.
Clement Mark J.
Crandall Keith A.
Fujimoto M. Stanley
Lyman Cole A.
Snell Quinn
Suvorov Anton
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2017

We present kleuren, a novel assembly-free method to reconstruct phylogenetic trees using the Colored de Bruijn Graph. kleuren works by constructing the Colored de Bruijn Graph and then traversing it, finding bubble structures in the graph that provide phylogenetic signal. The bubbles are then aligned and concatenated to form a supermatrix, from which a phylogenetic tree is inferred. We introduce the algorithms that kleuren uses to accomplish this task, and show its performance on reconstructing the phylogenetic tree of 12 Drosophila species. kleuren reconstructed the established phylogenetic tree accurately, and is a viable tool for phylogenetic tree reconstruction using whole genome sequences. Software package available at: https://github.com/Colelyman/kleuren. Comment: 6 pages, 3 figures, accepted at BIBE 2017. Minor modifications to the text due to reviewer feedback and fixed typo
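
As a rough illustration of the data structure named in this abstract, the sketch below builds a toy colored de Bruijn graph and lists shared k-mers where paths diverge, which is where a bubble could begin. It is not kleuren's implementation: the function names, the two short sequences, and the choice of k = 4 are all assumptions made for the example, and reverse complements are ignored.

```python
from collections import defaultdict

def build_colored_dbg(genomes, k):
    """Map each k-mer to the set of genome names (colors) that contain it."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return colors

def successors(colors, kmer):
    """K-mers present in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in colors]

# Hypothetical toy input: two "species" differing by one base.
genomes = {
    "sp1": "ACGTACCGA",
    "sp2": "ACGTTCCGA",
}
graph = build_colored_dbg(genomes, k=4)

# K-mers carrying every color are candidates for bubble start and end nodes.
shared = {km for km, c in graph.items() if len(c) == len(genomes)}

# A shared k-mer with more than one successor marks a point where paths diverge.
branch_points = [km for km in shared if len(successors(graph, km)) > 1]
```

Here the two sequences diverge after the shared k-mer `ACGT` and rejoin at `CCGA`, so the region between them is the kind of structure the abstract calls a bubble.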

arXiv.org e-Print Archive

Crossref

George Washington University: Health Sciences Research Commons (HSRC)

Learning the Language of Genes: Representing Global Codon Bias with Deep Language Models

Author: Bodily Paul M.
Clement Mark J.
Fujimoto M. Stanley
Jacobsen J. Andrew
Lyman Cole A.
Publication venue: DigitalCommons@USU
Publication date: 08/05/2017

Codon bias, the usage patterns of synonymous codons for encoding a protein sequence as nucleotides, is a biological phenomenon that is not well understood. Current methods that measure and model the codon bias of an organism exist for usage in codon optimization. In synthetic biology, codon optimization is a task that involves selecting the appropriate codons to reverse translate a protein sequence into a nucleotide sequence to maximize expression in a vector. These features include codon adaptation index (CAI) [1], individual codon usage (ICU), hidden stop codons (HSC) [2] and codon context (CC) [3]. While explicitly modeling these features has helped us to engineer high synthesis yield proteins, it is unclear what other biological features should be taken into account during codon selection for protein synthesis maximization. In this article, we present a method for modeling global codon bias through deep language models that is more robust than current methods by providing more contextual information and long-range dependencies to be considered during codon selection

DigitalCommons@USU

Dimensions of distance: international flight connections, historical determinism, and economic relations in Africa

Author: Anton Suvorov (129941)
Camilla Sharkey (3243678)
Haley Wightman (3243666)
M. Fujimoto (3243669)
Mark Clement (3243663)
Nicholas Jensen (3243675)
Paul Bodily (3243672)
Seth Bybee (3243681)
T. Ogden (3243684)
Publication venue: Emerald
Publication date: 17/10/2016

Purpose: The paper examines how distance manifests in terms of air passenger transport links between countries and focuses on the 48 countries of sub-Saharan Africa (SSA). It asks to what extent existing flight connections reflect economic relations between countries and, if so, whether they represent past, current or future relations. It also asks whether the impact of distance is similar for all countries and at different stages of development. Design/methodology/approach: Passenger flight connection data was extracted to generate map images and flight frequencies in order to observe inter-relationships between different locations and to observe emerging patterns. The paper uses ESRI's ArcGIS software to visualise these data as maps. Findings: SSA is poorly connected both intra- and inter-continentally. Cultural and historical ties dominate and elements of historical determinism appear within flight connections in SSA, reflecting the biases associated with colonialism. Larger economies in SSA are less dependent on these past ties and their flight connections reveal a greater level of diversity and interests. SSA has generally been slow to develop flight routings to the new emerging markets. Originality/value: The paper's contribution lies not only in examining these flight patterns for an under-researched region but also in aiding future work on SSA and its integration into the global economy and international business networks. It argues that whilst distance matters, how it matters varies

Dryad Digital Repository (Duke University)

Sussex Research Online

FigShare

Toward more accurate variant calling for “personal genomes”

Author: Bodily Paul
Hakonarson Hakon
Hu Jingchu
Jiang Tao
Johnson W. Evan
Lyon Gholson J.
O'Rawe Jason
Robison Reid J.
Sun Guangqing
Tian Lifeng
Wang Kai
Wang Wei
Wei Zhi
Wu Yiyang
Publication date: 01/03/2013

To date, researchers and clinicians use widely different methods for detecting and reporting human genetic variation. As the size of academic and private databases grows and as the use of the existing genomic techniques expands, researchers and clinicians stand to greatly benefit from the standardization of data generating approaches and analysis methodologies. To successfully implement genomic analyses in the clinic, it will be critically important to optimize the existing pipelines for attaining a higher sensitivity and specificity for more accurate and consistent variant calling

Cold Spring Harbor Laboratory Institutional Repository

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Author: Bodily Paul
Hakonarson Hakon
Hu Jingchu
Jiang Tao
Johnson W. Evan
Lyon Gholson J.
O'Rawe Jason
Robison Reid J.
Sun Guangqing
Tian Lifeng
Wang Kai
Wang Wei
Wei Zhi
Wu Yiyang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

BACKGROUND: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. METHODS: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage. RESULTS: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family. CONCLUSIONS: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes
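
To make the notion of pipeline concordance concrete, here is a minimal sketch under one plausible definition: the fraction of all distinct calls that every pipeline reports. The abstract does not spell out its exact formula, so this definition, the function names, and the toy variant tuples are assumptions for illustration only.

```python
def concordance(call_sets):
    """Fraction of the union of calls that is reported by every pipeline."""
    sets = [set(calls) for calls in call_sets.values()]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0

def unique_calls(call_sets, pipeline):
    """Calls reported by `pipeline` and by no other pipeline."""
    others = set()
    for name, calls in call_sets.items():
        if name != pipeline:
            others |= set(calls)
    return set(call_sets[pipeline]) - others

# Hypothetical variants as (chromosome, position, ref, alt) tuples.
v1, v2 = ("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")
v3, v4 = ("chr2", 40, "G", "A"), ("chr3", 75, "T", "C")

calls = {
    "pipeline_a": [v1, v2, v3],
    "pipeline_b": [v1, v2, v4],
    "pipeline_c": [v1, v2],
}
overall = concordance(calls)                # 2 shared calls out of 4 distinct
only_a = unique_calls(calls, "pipeline_a")  # calls seen by pipeline_a alone
```

Representing indels this way would need coordinate normalization first, since the same indel can be reported at different positions; the abstract mentions left-normalizing for that reason.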

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Computed tomographic enterography adds information to clinical management in small bowel Crohn's disease

Author: Beth Manoogian
Bodily
Colombel
Elaine Caoili
Ellen M. Zimmermann
Guidi
Hatoum
Joel F. Platt
Macdonald
Michael Zimmermann
Paul L. Sonda
Peter D. R. Higgins
Solem
Taft P. Bhuket
Thornton
Wold
Yekeler
Publication venue: 'Wiley'
Publication date: 01/03/2007

Background: CT enterography yields striking findings in the bowel wall in Crohn's disease. These images may help to evaluate whether small bowel narrowing results from active disease requiring anti-inflammatory therapy. However, the clinical relevance of these images is unknown. It is also not known if these radiologic findings correlate with objective biomarkers of inflammation. Methods: In a blinded and independent evaluation, IBD subspecialty gastroenterologists reviewed clinical data, and CT radiologists reviewed CT enterography scans of 67 consecutive patients with Crohn's disease and suspicion of either small bowel inflammation or stricture. Comparisons were made between (1) clinical and radiologic assessments of inflammation and stricture, (2) clinical assessments before and after computed tomographic enterography (CTE) reports were revealed, and (3) radiologic findings and objective biomarkers of inflammation. Results: (1) Individual CTE findings correlated poorly (Spearman's rho < 0.30) with clinical assessment; (2) clinicians did not suspect 16% of radiologic strictures, and more than half the cases of clinically suspected strictures did not have them on CTE; (3) CTE data changed clinicians' perceptions of the likelihood of steroid benefit in 41 of 67 cases; (4) specific CTE findings correlated with CRP, and a distinct set of CTE findings correlated with ESR in the subset of patients who had these biomarkers measured. Conclusions: CTE seems to add unique information to clinical assessment, both in detecting additional strictures and in changing clinicians' perceptions of the likelihood of steroids benefiting patients. The biomarker correlations suggest that CTE is measuring real biologic phenomena that correlate with inflammation, providing information distinct from that in a standard clinical assessment. (Inflamm Bowel Dis 2006). Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/55965/1/20013_ftp.pd

Crossref

Deep Blue Documents

Inferring Structural Constraints in Musical Sequences via Multiple Self-Alignment

Author: Bodily Paul Mark
Ventura Dan
Publication venue: eScholarship, University of California
Publication date: 01/01/2021

A critical aspect of the way humans recognize and understand meaning in sequential data is the ability to identify abstract structural repetitions. We present a novel approach to discovering structural repetitions within sequences that uses a multiple Smith-Waterman self-alignment. We illustrate our approach in the context of finding different forms of structural repetition in music composition. Feature-specific alignment scoring functions enable structure finding in primitive features such as rhythm, melody, and lyrics. These can be compounded to create scoring functions that find higher-level structure including verse-chorus structure. We demonstrate our approach by finding harmonic, pitch, rhythmic, and lyrical structure in symbolic music and compounding these viewpoints to identify the abstract structure of verse-chorus segmentation
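
The core idea of aligning a sequence against itself can be sketched with a plain Smith-Waterman recurrence restricted to cells above the main diagonal, so that the trivial alignment of the sequence with itself is excluded. This is a simplified, single-best-alignment sketch and not the authors' multiple self-alignment method; the scoring values, the function name `self_align`, and the toy note sequence are assumptions.

```python
def self_align(seq, match=2, mismatch=-1, gap=-1):
    """Best local alignment of `seq` against itself, ignoring the main diagonal.

    Returns (score, i, j): the best score and the 1-based end positions of the
    two aligned segments, with the first segment ending at i and the second at j.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        # Only fill cells with j > i, so a position is never aligned to itself.
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # extend diagonally (match or mismatch)
                H[i - 1][j] + gap,    # gap in the second segment
                H[i][j - 1] + gap,    # gap in the first segment
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best

# Hypothetical melody: a four-note motif, a rest, then the motif repeated.
melody = ["C", "E", "G", "A", "rest", "C", "E", "G", "A"]
score, end_first, end_second = self_align(melody)
```

For this melody the best self-alignment pairs notes 1 to 4 with notes 6 to 9. A feature-specific scoring function, as the abstract describes, would replace the simple equality test used for `s`.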

eScholarship - University of California

HeyLo: Visualizing User Interests from Twitter Using Emoji in Mixed Reality

Author: Bodily Paul, (Mentor)
Griffith Isaac, (Mentor)
Harris Hunter M.
Thompson Makayla
Publication venue: 'IUScholarWorks'
Publication date: 21/07/2020

We tackle the problem of analyzing a user's interests from social media content and subsequently visualizing these interests in an extended reality environment. We compare five models for extracting interests from Twitter users and how we can measure the effectiveness of these models. We also look at how these interest extraction models fit in the context of HeyLo, an extended reality computational creativity (XRCC) framework for visualizing potential conversational topics. The chosen interests for a particular person are visualized using emoji. We accomplish this by using an emoji2vector model to find the closest related emoji to a given interest. We perform a comparative analysis between the five interest extraction models on real-world users and their tweets, evaluating specificity, variance, and relevance

Boise State University - ScholarWorks

Open challenges in musical metacreation

Author: Bodily Paul M
Colton Simon
Pease Alison
Turing Alan M.
Ventura Dan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Crossref

Archivio istituzionale della ricerca - Università di Padova