22 research outputs found

    Computational Analysis of Microbial Sequence Data Using Statistics and Machine Learning

    Since the discovery of the double helix of DNA in 1953, modern molecular biology has opened the door to a better understanding of how genes control chemical processes within cells, including protein synthesis. Although we are still far from claiming a complete understanding, recent advances in sequencing technologies, increased computational capacity, and more sophisticated computational methods have allowed the development of various new applications that provide further insight into DNA sequence data and how the information they encode impacts living organisms and their environment. Sequencing data can now be used to start identifying the relationships between microorganisms, where they live, and in some cases how they affect their host organisms. We introduce and compare methods used for this bioinformatics application, and develop a machine learning model that can be used to effectively predict environmental factors associated with these microorganisms. Codon Usage Bias (CUB), which refers to the highly non-uniform usage of codons that code for the same amino acid, has been known to reflect the expression level of a protein-coding gene under the evolutionary theory that selection favors certain synonymous codons. Traditional methods used to estimate CUB and its relation to protein translation have proven effective on single-celled organisms such as yeast and E. coli, but their applications are limited when it comes to more complex multi-cellular organisms such as plants and animals. To extend our ability to understand the relations between codon usage patterns and the protein translation processes in these organisms, we develop a novel deep learning model that can discover patterns in codon usage bias between different species using only their DNA sequences.
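
    As a rough illustration of what "non-uniform usage of synonymous codons" means, the sketch below computes relative synonymous codon usage (RSCU), a classical CUB statistic. It is not the deep learning model described in the abstract. The function name `rscu` and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumption: nuclear code, table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU per codon for an in-frame coding sequence (hypothetical helper).

    RSCU = observed count of a codon divided by the mean count of all
    codons that encode the same amino acid. A value of 1.0 means no bias.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families keyed by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:  # skip stop codons and unused amino acids
            continue
        for codon in family:
            result[codon] = counts[codon] * len(family) / total
    return result


# Toy sequence: four leucine codons, three of them CTG.
values = rscu("CTGCTGCTGCTA")
print(values["CTG"], values["CTA"], values["TTA"])
```

    Values well above 1.0 mark a codon that is used more often than its synonyms would be under uniform usage.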

    Comparison of Three Assembly Strategies for a Heterozygous Seedless Grapevine Genome Assembly

    Background: De novo heterozygous assembly is an ongoing challenge requiring improved assembly approaches. In this study, three strategies were used to develop de novo Vitis vinifera ‘Sultanina’ genome assemblies for comparison with the inbred V. vinifera (PN40024 12X.v2) reference genome and a published Sultanina ALLPATHS-LG assembly (AP). The strategies were: 1) a default PLATANUS assembly (PLAT_d) for direct comparison with AP assembly, 2) an iterative merging strategy using METASSEMBLER to combine PLAT_d and AP assemblies (MERGE) and 3) PLATANUS parameter modifications plus GapCloser (PLAT*_GC). Results: The three new assemblies were greater in size than the AP assembly. PLAT*_GC had the greatest number of scaffolds aligning with a minimum of 95% identity and ≥1000 bp alignment length to V. vinifera (PN40024 12X.v2) reference genome. SNP analysis also identified additional high quality SNPs. A greater number of sequence reads mapped back with zero-mismatch to the PLAT_d, MERGE, and PLAT*_GC (>94%) than was found in the AP assembly (87%) indicating a greater fidelity to the original sequence data in the new assemblies than in AP assembly. A de novo gene prediction conducted using seedless RNA-seq data predicted >30,000 coding sequences for the three new de novo assemblies, with the greatest number (30,544) in PLAT*_GC and only 26,515 for the AP assembly. Transcription factor analysis indicated good family coverage, but some genes found in the VCOST.v3 annotation were not identified in any of the de novo assemblies, particularly some from the MYB and ERF families. Conclusions: The PLAT_d and PLAT*_GC had a greater number of synteny blocks with the V. vinifera (PN40024 12X.v2) reference genome than AP or MERGE. PLAT*_GC provided the most contiguous assembly with only 1.2% scaffold N, in contrast to AP (10.7% N), PLAT_d (6.6% N) and Merge (6.4% N). A PLAT*_GC pseudo-chromosome assembly with chromosome alignment to the reference genome V. vinifera (PN40024 12X.v2) provides new information for use in seedless grape genetic mapping studies. An annotated de novo gene prediction for the PLAT*_GC assembly, aligned with VitisNet pathways provides new seedless grapevine specific transcriptomic resource that has excellent fidelity with the seedless short read sequence data
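
    The "% scaffold N" figures quoted above measure how much of an assembly consists of unresolved gap bases. A minimal sketch of that calculation follows, assuming plain FASTA input; the function name `percent_n` and the toy records are hypothetical, and the authors' actual tooling is not stated in the abstract.

```python
def percent_n(fasta_lines):
    """Return the percentage of N bases across all sequences (hypothetical helper).

    `fasta_lines` is any iterable of FASTA text lines, for example an open
    file handle. Header lines (starting with '>') are skipped.
    """
    total = 0
    n_count = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        n_count += line.upper().count("N")
    return 100.0 * n_count / total if total else 0.0


# Toy input: two 10 bp scaffolds containing 6 gap bases in total.
toy = [">scaffold_1", "ACGTNNACGT", ">scaffold_2", "NNNNACGTAC"]
print(percent_n(toy))
```

    A lower percentage indicates fewer gaps inside scaffolds, which is the sense in which the abstract calls PLAT*_GC the most contiguous assembly.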

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
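
    For readers unfamiliar with the Okapi Best Matching 25 baseline named above, the sketch below scores a small document collection against a query. It is a simplified textbook form, not the consortium's implementation: the function name `bm25_scores`, whitespace tokenisation, the smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in a non-empty list against the query (hypothetical helper)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive (assumed variant).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "codon usage bias in yeast",
    "grapevine genome assembly",
    "postprandial lipid metabolism",
]
print(bm25_scores("genome assembly", docs))
```

    Only the second document shares terms with the query, so it is the only one that receives a non-zero score.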

    Research on Wang Xiaobo’s Humanitarian Enlightenment

    In 20th-century China, the spirit of enlightenment was notably active twice: once around May 4th, 1919, when the New Culture Movement took place, and once in the 1980s. The core of the spirit of enlightenment is a kind of humanitarianism that emphasizes rationality and freedom. The core of Wang Xiaobo’s thought is consistent with this humanitarian enlightenment, so his ideological value can be summarized from that perspective: advocating science and rationality, advocating freedom and human rights, and pursuing the true interest in life.

    Comparison of Three Assembly Strategies for a Heterozygous Grapevine Genome Assembly

    Each folder includes the following files. Assembly.fa: Genome assembly in scaffolds. Blast2GO_results.b2g: The full annotation of coding sequences. Coding_sequences.fa: The predicted coding sequences from masked assembly. Gene_models.gff: Gene predictions from masked assembly. Masked_assembly.fa: The masked genome assembly. Protein.fa: The predicted protein sequences. PLAT*_GC also contains the chromosomal assembly and annotation in the pseudo_chromosomal_assembly folder. The SnpEff folder contains results from SnpEff as SnpEff_summary; the genes affected are described in vini_snp.txt.genes, and vini_snpeff is the output from SnpEff.

    Comparison of three assembly strategies for a heterozygous seedless grapevine genome assembly

    Background: De novo heterozygous assembly is an ongoing challenge requiring improved assembly approaches. In this study, three strategies were used to develop de novo Vitis vinifera ‘Sultanina’ genome assemblies for comparison with the inbred V. vinifera (PN40024 12X.v2) reference genome and a published Sultanina ALLPATHS-LG assembly (AP). The strategies were: 1) a default PLATANUS assembly (PLAT_d) for direct comparison with AP assembly, 2) an iterative merging strategy using METASSEMBLER to combine PLAT_d and AP assemblies (MERGE) and 3) PLATANUS parameter modifications plus GapCloser (PLAT*_GC). Results: The three new assemblies were greater in size than the AP assembly. PLAT*_GC had the greatest number of scaffolds aligning with a minimum of 95% identity and ≥1000 bp alignment length to V. vinifera (PN40024 12X.v2) reference genome. SNP analysis also identified additional high quality SNPs. A greater number of sequence reads mapped back with zero-mismatch to the PLAT_d, MERGE, and PLAT*_GC (>94%) than was found in the AP assembly (87%) indicating a greater fidelity to the original sequence data in the new assemblies than in AP assembly. A de novo gene prediction conducted using seedless RNA-seq data predicted >30,000 coding sequences for the three new de novo assemblies, with the greatest number (30,544) in PLAT*_GC and only 26,515 for the AP assembly. Transcription factor analysis indicated good family coverage, but some genes found in the VCOST.v3 annotation were not identified in any of the de novo assemblies, particularly some from the MYB and ERF families. Conclusions: The PLAT_d and PLAT*_GC had a greater number of synteny blocks with the V. vinifera (PN40024 12X.v2) reference genome than AP or MERGE. PLAT*_GC provided the most contiguous assembly with only 1.2% scaffold N, in contrast to AP (10.7% N), PLAT_d (6.6% N) and Merge (6.4% N). A PLAT*_GC pseudo-chromosome assembly with chromosome alignment to the reference genome V. vinifera (PN40024 12X.v2) provides new information for use in seedless grape genetic mapping studies. An annotated de novo gene prediction for the PLAT*_GC assembly, aligned with VitisNet pathways provides new seedless grapevine specific transcriptomic resource that has excellent fidelity with the seedless short read sequence data

    Effects of a liquid high-fat meal on postprandial lipid metabolism in type 2 diabetic patients with abdominal obesity

    Abstract Background Postprandial lipemia and lipoprotein lipase (LPL) activity play crucial roles in the pathogenesis of accelerated atherosclerosis. This study aimed to evaluate the postprandial lipid metabolism after the ingestion of a liquid high-fat meal in type 2 diabetic patients with abdominal obesity, and determine if the PvuII polymorphisms of LPL influence their postprandial lipid responses. Methods Serum glucose, insulin, triglycerides (TG), total cholesterol (TC) and high density lipoprotein cholesterol (HDL-C) were measured in fasting and postprandial state at 0.5, 1, 2, 4, 6 and 8 h after a liquid high-fat meal in 51 type 2 diabetic patients with abdominal obesity, 31 type 2 diabetic patients without abdominal obesity and 39 controls. Their PvuII polymorphisms of LPL were tested in fasting. Results Type 2 diabetic patients with abdominal obesity had significantly higher postprandial areas under the curve (AUC) of glucose [least square mean difference (LSMD) = 30.763, 95% confidence interval (CI) = 23.071–38.455, F = 37.346, P < 0.05] and TC (LSMD = 3.995, 95% CI = 1.043–6.947, F = 3.681, P < 0.05) than controls. Postprandial AUCs for insulin, homeostasis model assessment-insulin resistance (HOMA-IR) and TG were higher (LSMD = 86.987, 95% CI = 37.421–136.553, F = 16.739, P < 0.05; LSMD = 37.456, 95% CI = 16.312–58.600, F = 27.012, P < 0.05; LSMD = 4.684, 95% CI = 2.662–6.705, F = 26.158, P < 0.05), whereas HDL-C AUC was lower (LSMD = −1.652, 95% CI = −2.685 – -0.620, F = 8.190, P < 0.05) in type 2 diabetic subjects with abdominal obesity than those without abdominal obesity. In type 2 diabetic patients with abdominal obesity, postprandial TG AUC was lower in P−/− than in P+/− (LSMD = −4.393, 95% CI = −9.278 – -0.491, F = 4.476, P < 0.05) and P+/+ (LSMD = −7.180, 95% CI = −12.319 – -2.014, F = 4.476, P < 0.05) phenotypes. Postprandial AUCs for glucose, insulin, HOMA-IR, TC and HDL-C were not different according to PvuII phenotypes. 
Conclusions Abdominal obesity exacerbates the postprandial lipid responses in type 2 diabetic patients, which partly explains the excess atherogenic risk in these patients. In addition, the presence of the P+ allele could contribute to a greater postprandial TG increase in type 2 diabetic patients with abdominal obesity. Trial registration ChiCTR-IOR-16008435. Registered 8 May 2016.
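
    The postprandial areas under the curve reported above summarise a series of measurements taken at unequal intervals. The abstract does not say how the AUCs were computed; the sketch below assumes the common trapezoidal rule, and the function name `trapezoid_auc` and the sample values are hypothetical. Only the sampling times (fasting, then 0.5, 1, 2, 4, 6 and 8 h) come from the abstract.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule (hypothetical helper)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    # Sum the trapezoid between each pair of consecutive time points.
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2
    return area


# Sampling times in hours from the abstract; 0 stands for the fasting sample.
times = [0, 0.5, 1, 2, 4, 6, 8]
# Illustrative values only, not data from the study.
flat_curve = [2.0] * len(times)
print(trapezoid_auc(times, flat_curve))
```

    Because the later intervals are two hours wide, measurements taken at 4, 6 and 8 h contribute more to the total than the early half-hour samples.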

    Additional file 2: Figure S1. of Comparison of three assembly strategies for a heterozygous seedless grapevine genome assembly

    Protein alignment with V. vinifera (PN40024 12X.v2, VCOST.v3) proteins. a. Orthologous proteins for all seedless grape assemblies in relation to the V. vinifera VCOST.v3 (V. vinifera V3). b. Comparison of AP with the three de novo seedless assemblies. (JPEG 107 kb)