4,001 research outputs found

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
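    The tiered idea in this abstract (try a cheap nucleotide-level match first, fall back to translated search only for reads that remain unclassified) can be illustrated with a toy sketch. This is not HUMAnN2's implementation or API: the function `tiered_assign`, the two dictionary "indexes", the tiny codon table, and the family name `famA` are all hypothetical, exact dictionary lookups stand in for real alignment, and the species-identification step is omitted.

```python
from collections import Counter

# Deliberately tiny codon table (assumption for illustration only).
CODONS = {"ATG": "M", "AAA": "K", "AAG": "K", "GGT": "G", "GGC": "G"}

def translate(read):
    """Translate a read in frame 0; codons missing from the toy table become 'X'."""
    return "".join(CODONS.get(read[i:i + 3], "X") for i in range(0, len(read) - 2, 3))

def tiered_assign(reads, pangenome_index, protein_index, translate_fn):
    """Count reads per gene family, trying the nucleotide tier before the translated tier."""
    counts = Counter()
    for read in reads:
        # Tier 1: nucleotide-level lookup (stand-in for pangenome alignment).
        family = pangenome_index.get(read)
        if family is None:
            # Tier 2: translated lookup, used only for reads tier 1 could not place.
            family = protein_index.get(translate_fn(read))
        counts[family if family is not None else "UNMAPPED"] += 1
    return counts

# Hypothetical toy data: the second read differs from the first only by synonymous changes.
pangenome_index = {"ATGAAAGGT": "famA"}
protein_index = {"MKG": "famA"}
reads = ["ATGAAAGGT", "ATGAAGGGC", "CCCCCCCCC"]

print(tiered_assign(reads, pangenome_index, protein_index, translate))
```

    In this toy data the second read misses the nucleotide tier but is recovered by the translated tier, which is the kind of case the fallback tier is meant to handle; only reads that miss both tiers are counted as unmapped.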

    Composite structural motifs of binding sites for delineating biological functions of proteins

    Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity than from the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs, each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures. Comment: 34 pages, 7 figures

    Fine-Scale Haplotype Structure Reveals Strong Signatures of Positive Selection in a Recombining Bacterial Pathogen

    Identifying genetic variation in bacteria that has been shaped by ecological differences remains an important challenge. For recombining bacteria, the sign and strength of linkage provide a unique lens into ongoing selection. We show that derived alleles … Peer reviewed

    Insights into the evolution of mammalian infectious viruses through comparative genomic analysis

    Thesis (Master's), Seoul National University Graduate School, College of Agriculture and Life Sciences, Department of Agricultural Biotechnology, August 2019. Heebal Kim.
    Infectious viruses infect many animal species, including humans, and can cause irreversible consequences. They kill many people and cause massive economic losses to the livestock industry through large-scale infections every year, so more research on infectious viruses is needed. Viruses mutate faster and more randomly than other microorganisms and organisms. Whether a virus can infect usually depends on the host species, but a single nucleotide or amino acid change can allow infection of a new host species or alter virulence, so identifying and analyzing genome-level features of viruses has major commercial and scientific value. Among these features, single nucleotide and amino acid variants are widely studied and are used in practice to identify virus species and to develop vaccines.
    Chapter 2. Zika virus (ZIKV) causes Zika fever, headache, and joint pain in typical adults, but infection during pregnancy is known to be associated with fetal microcephaly, a serious brain disorder. ZIKV has spread explosively throughout the world over the last decade, and virologists in many countries have investigated its molecular mechanisms to prevent worldwide proliferation. However, only a few genetic variants in several regions have been proposed as targets for vaccines and medicines. Here, I analyzed all available complete ZIKV genomes from the Virus Pathogen Resource (ViPR) database to identify novel genetic markers from geographical and temporal perspectives. In principal component and phylogenetic analyses, together with unsupervised k-means clustering, ZIKV strains formed four clusters according to the continent of collection. Focusing on the major groups from Africa, Asia, Central America, and the Caribbean, I found statistically significant single nucleotide variants (SNVs). Using dN/dS analysis, I identified the protein-coding regions whose evolution was accelerated in each group. Among the intercontinental SNVs, non-synonymous and synonymous variants in functional protein domains and in predicted B-cell and T-cell epitopes are suggested as regional markers. I believe these local genetic markers can improve medical strategies for ZIKV prevention, diagnosis, and treatment.
    Chapter 3. Influenza D virus (IDV), a newly classified type of influenza, is a respiratory virus that infects ruminants, including cattle. Although the symptoms of IDV infection are mild, it can lead to fatal infections by other respiratory viruses and has the potential to infect humans, so I studied it at the genomic level. Using phylogenetic analysis and principal coordinate analysis (PCoA), I compared a dataset of all coding sequences concatenated with datasets of each gene's coding sequence. The concatenated dataset clustered more appropriately, into four groups by region of isolation, and I selected the three main groups. Focusing on these three groups, I found statistically significant genetic markers and compared them with the results of dN/dS analysis, protein-coding region searches, and B-cell epitope prediction. Through this study, I suggest group-specific genetic markers of infectious viruses; these markers may aid the identification of new virus species, provide insight into the evolution of virulence, and support vaccine development.
    Contents: Abstract; Contents; List of Tables; List of Figures; Chapter 1. Literature Review; Chapter 2. Identification of Local-Specific Genetic Markers of Zika Virus Across the Entire Globe (2.1 Abstract, 2.2 Introduction, 2.3 Materials and Methods, 2.4 Results, 2.5 Discussion); Chapter 3. Local Genetic Markers Clustered by Coding Sequences of Influenza D Virus (3.1 Abstract, 3.2 Introduction, 3.3 Materials and Methods, 3.4 Results, 3.5 Discussion); References; Abstract in Korean.

    A bioinformatics toolkit: in silico tools and online resources for investigating genetic variation

    With the advent of large-scale next-generation sequencing initiatives, it is increasingly important to interpret and understand the potential phenotypic influence of identified genetic variation and its significance in the human genome. Bioinformatics analyses can provide useful information to assist with variant interpretation. This review provides an overview of the tools and resources currently available, and how they can help predict the impact of genetic variation at the deoxyribonucleic acid, ribonucleic acid, and protein level.

    Understanding The Intra And Inter-Cellular Interaction Complexities And Flexibilities Using Systems And Sequence Analysis Approach

    The present thesis work has been undertaken to gain an understanding of intra-cellular or inter-cellular interactions between bio-molecular entities, using either a systems analysis perspective or different sequence analysis approaches. During this study, different principles likely to be prevalent among intra-cellular and inter-cellular interactions have been studied with the help of computational approaches. Broadly, the complexities in intra-cellular interactions have been studied by determining the effect of perturbations, such as over-expression or down-regulation of a key regulator, on the intra-cellular interaction network architecture or its components. In particular, network analysis of regulatory network proteins in association with the intra-cellular protein-protein interaction network led to a key observation that topologically important effector proteins in the regulatory network could be important signaling proteins. Such effector proteins, essential for the regulatory network integrity of a key regulator, may be identified by network analysis. Alterations in these effector proteins are likely to disrupt cellular physiology, so probable disease-associated entities can be determined in this manner. Alternatively, the flexibility among protein-protein interactions has been studied by analyzing homologous sequence families of interacting proteins with information-theoretic measures such as mutual information and the Bhattacharyya coefficient. Since interacting proteins may co-evolve, co-variation may allow the preservation of a functional interaction between co-evolving proteins, and interdependent residue pair alterations may occur as a result of evolutionary pressure. Analysis of molecular co-evolution in inter-cellular protein interaction complexes determined that co-evolutionary pairings may be present among interface and non-interface residue pairs, and that such positions are likely to be crucial for a functional interaction between these sets of proteins. Therefore, using information contained in biological sequences, co-evolutionary pairings involving structurally or functionally crucial residue positions in disease-associated inter-cellular protein-protein interaction complexes were predicted. Thus, different computational approaches have been used to study a particular hypothesis in a disease scenario, in order to delineate certain themes prevalent in intra-cellular or inter-cellular interactions among bio-molecular entities while predicting disease-associated entities or studying interaction patterns among them.
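    The two information-theoretic measures named in this abstract can be made concrete with a minimal sketch. It assumes each alignment column is given as a string with one residue per species, paired in the same species order across the two proteins; the function names and the toy columns are illustrative and are not taken from the thesis, and no correction for small samples or phylogenetic bias is applied.

```python
from collections import Counter
from math import log2, sqrt

def column_distribution(column):
    """Relative frequency of each symbol in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return {symbol: n / total for symbol, n in counts.items()}

def mutual_information(col_a, col_b):
    """Mutual information in bits between two paired alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same paired sequences")
    p_a = column_distribution(col_a)
    p_b = column_distribution(col_b)
    # Joint distribution over residue pairs observed in the same species.
    p_ab = column_distribution(list(zip(col_a, col_b)))
    return sum(p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items())

def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions given as dicts; 1.0 means identical."""
    return sum(sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in set(p) | set(q))

# Toy columns across four species: the residues change together, so they co-vary perfectly.
col_a = "AAGG"
col_b = "LLVV"
print(mutual_information(col_a, col_b))  # 1.0 bit for perfectly co-varying two-state columns
print(bhattacharyya_coefficient(column_distribution("AAGG"), column_distribution("AAGC")))
```

    High mutual information between a residue position in one protein and a position in its partner is the kind of signal that would flag a candidate co-evolving pair, while the Bhattacharyya coefficient compares how similar two residue distributions are.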