172 research outputs found

    MICRO$EC: Cost Effective, Whole-Genome Sequencing

    While the feasibility of whole human genome sequencing was proven by the success of the Human Genome Project several years ago, widespread personal genome sequencing in the medical industry remains elusive due to its unrealistic cost and time requirements. Micro$eq is a startup company with the goal of overcoming these limitations by sequencing a minimum of 12 complete human genomes per day at an error rate of less than ten parts per million and at a profitable market price of less than US$1000 per genome. To overcome the technology bottlenecks that keep current biotech companies from achieving these throughput, error rate, and market price targets, Micro$eq has developed an innovative sequencing technique that uses short-read fragments with high coverage on a microfluidics platform. Short, amplified DNA fragments are generated from an input of customer saliva. Sequence hybridization with 6 base pair (bp) probes is used to sequence each DNA fragment individually.
The resulting hybridization reads are assembled via de Bruijn graph theory, and the graph-based reconstructions of each fragment's sequence are then assembled into a complete genome via shotgun sequencing, with an expected error rate of less than 1 in 100,000 bp. Upon completion of the financial analysis, both a small-scale business model producing 72 genomes per day at US$999 per genome and a large-scale business model producing 52.2 genomes per year at a market price of US$299 per genome were found to be profitable, yielding Micro$eq investors return margins of ~90% and 300% for the small- and large-scale models, respectively. At this market price, Micro$eq offers personal genome sequencing at one-tenth of its nearest potential competitor's cost. Additionally, its capacity for bulk sequencing allows it to venture profitably into the previously untapped pharmaceutical industry market sector, enabling the creation of large-scale genome databases, which are the next step forward in the quest for truly personalized
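As a rough illustration of the de Bruijn step described in the abstract above, the sketch below rebuilds one fragment from its set of 6-mers. It is not Micro$eq's pipeline: the function names are hypothetical, and it assumes error-free reads in which every 6-mer of the fragment is observed exactly once and no 5-mer repeats, so that the graph has a single Eulerian path.

```python
from collections import defaultdict


def kmer_spectrum(seq, k=6):
    """Return every k-mer of seq, standing in for ideal hybridization reads."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def assemble(kmers):
    """Rebuild a sequence from its k-mers by walking an Eulerian path.

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    """
    if not kmers:
        return ""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        outdeg[prefix] += 1
        indeg[suffix] += 1

    # The path starts at the node with one more outgoing than incoming edge;
    # fall back to the first k-mer's prefix if the graph is a cycle.
    nodes = set(indeg) | set(outdeg)
    start = kmers[0][:-1]
    for node in nodes:
        if outdeg[node] - indeg[node] == 1:
            start = node
            break

    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the sequence: first node, then the last base of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


fragment = "ATGGCGTGCAATC"
reads = kmer_spectrum(fragment)
rebuilt = assemble(sorted(reads))  # input order does not matter
```

Real fragments contain repeated 5-mers and read errors, which make the path ambiguous; handling those cases is outside this sketch.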

    The MGX framework for microbial community analysis

    Jaenicke S. The MGX framework for microbial community analysis. Bielefeld: Universität Bielefeld; 2020

    Evaluation and Optimization of Bioinformatic Tools for the Detection of Human Foodborne Pathogens in Complex Metagenomic Datasets

    Foodborne human pathogens pose a significant risk to human health: each year one in six Americans becomes sick from one of over 31 known human foodborne pathogens. Because their growth requirements differ, current detection assays can detect only one to a few of these pathogens per assay. Metagenomics, an emerging field, allows an entire community of organisms to be analyzed from DNA or RNA sequence data generated from a single sample, and therefore has the potential to detect any and all foodborne pathogens present in a single complex matrix. However, currently available bioinformatic pipelines for metagenomic sequence analysis require extensive time and substantial computing power, often with unreliable results. The objectives of this study are 1) to evaluate community profiling bioinformatic pipelines, mapping pipelines and a novel pipeline created at Oklahoma State University, E-probe Diagnostic Nucleic-acid Analysis (EDNA), for the detection of S. enterica (as a model foodborne pathogen) in metagenomic data, 2) to optimize the EDNA pipeline for sensitive detection of S. enterica in metagenomic data, and 3) to simultaneously detect multiple foodborne pathogens from a single metagenomic sample. EDNA was able to detect S. enterica in metagenomic data in approximately five minutes, whereas the other pipelines took between 2 and 500 hours. The optimized parameters for the EDNA pipeline were limited to using cleaned Illumina data with a read depth of one. The minimum BLAST E-value was set to 10^-3 for curation. For detection, the minimum percent identity was set to 95% and the minimum query coverage to 90%, with an E-probe length of 80 nt. These new parameters improved the sensitivity of the assay 100-fold, from 10^3 S. enterica cells detected by the original EDNA pipeline to just 10 cells.
In the simultaneous detection of multiple foodborne pathogens, EDNA detected three additional pathogens, Listeria monocytogenes, Campylobacter jejuni and Shiga toxin-producing Escherichia coli, at ten contamination levels in less than ten minutes and provided new insights into how read abundance corresponds to pathogen cell numbers.
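The detection thresholds quoted in this abstract (95% identity, 90% query coverage, 80 nt E-probes) can be pictured as a simple filter over BLAST hits. The sketch below is an assumption-laden illustration, not the EDNA code: it assumes hits arrive in BLAST's default tabular layout (`-outfmt 6`) with the E-probes as queries, and the function name and sample identifiers are hypothetical.

```python
PROBE_LEN = 80        # E-probe length in nt, from the abstract
MIN_IDENTITY = 95.0   # minimum percent identity, from the abstract
MIN_QCOV = 90.0       # minimum query coverage in percent, from the abstract


def passing_hits(tabular_text, probe_len=PROBE_LEN):
    """Keep hits that meet both thresholds.

    Expects default BLAST tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    for line in tabular_text.strip().splitlines():
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Share of the probe covered by the alignment, in percent.
        qcov = 100.0 * (abs(qend - qstart) + 1) / probe_len
        if pident >= MIN_IDENTITY and qcov >= MIN_QCOV:
            kept.append((qseqid, sseqid, pident, qcov))
    return kept


# Hypothetical hits: only probe_1 and probe_4 satisfy both cut-offs.
sample = "\n".join([
    "probe_1\tread_9\t97.5\t80\t2\t0\t1\t80\t10\t89\t1e-30\t140",
    "probe_2\tread_3\t90.0\t80\t8\t0\t1\t80\t5\t84\t1e-20\t100",
    "probe_3\tread_7\t100.0\t60\t0\t0\t1\t60\t1\t60\t1e-25\t111",
    "probe_4\tread_2\t96.0\t72\t3\t0\t5\t76\t1\t72\t1e-28\t120",
])
hits = passing_hits(sample)
```

The E-value cut-off of 10^-3 is left out because the abstract applies it during probe curation rather than during detection.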

    Development of a novel platform for high-throughput gene design and artificial gene synthesis to produce large libraries of recombinant venom peptides for drug discovery

    Doctoral thesis in Veterinary Sciences, specialty in Biological and Biomedical Sciences. Animal venoms are complex mixtures of biologically active molecules that, while presenting low immunogenicity, target a variety of membrane receptors with high selectivity and efficacy. It is believed that animal venoms comprise a natural library of more than 40 million different natural compounds that have been continuously fine-tuned during the evolutionary process to disturb cellular function. Within animal venoms, reticulated peptides are the most attractive class of molecules for drug discovery. However, the use of animal venoms to develop novel pharmacological compounds is still hampered by difficulties in obtaining these low molecular mass cysteine-rich polypeptides in sufficient amounts. Here, a high-throughput gene synthesis platform was developed to produce synthetic genes encoding venom peptides. The final goal of this project is the production of large libraries of recombinant venom peptides that can be screened for drug discovery. A robust and efficient Polymerase Chain Reaction (PCR) methodology was refined to assemble overlapping oligonucleotides into small artificial genes (< 500 bp) with high fidelity. In addition, two bioinformatics tools were constructed to design multiple optimized genes (ATGenium) and overlapping oligonucleotides (NZYOligo designer), in order to allow automation of the high-throughput gene synthesis platform. The platform can assemble 96 synthetic genes encoding venom peptides simultaneously, with an error rate of 1.1 mutations per kb. To decrease the error rate associated with artificial gene synthesis, an error removal step using phage T7 endonuclease I was designed and integrated into the gene synthesis methodology.
T7 endonuclease I was shown to be highly effective at specifically recognizing and cleaving DNA mismatches, allowing a dramatic reduction of error frequency in large synthetic genes, from 3.45 to 0.43 errors per kb. Combining the knowledge acquired in the initial stages of the work, a comprehensive study was performed to investigate the influence of gene design, presence of fusion tags, cellular localization of expression, and usage of Tobacco Etch Virus (TEV) protease for tag removal, on the recombinant expression of disulfide-rich venom peptides in Escherichia coli. Codon usage dramatically affected the levels of recombinant expression in E. coli. In addition, a significant pressure on the usage of the two cysteine codons suggests that both need to be present at equivalent levels in genes designed de novo to ensure high levels of expression. This study also revealed that DsbC was the best fusion tag for recombinant expression of disulfide-rich peptides, in particular when expression of the fusion peptide was directed to the bacterial periplasm. TEV protease was highly effective for tag removal and its recognition site can tolerate all residues at its C-terminal position, with the exception of proline, confirming that no extra residues need to be incorporated at the N-terminus of recombinant venom peptides. This study revealed that E. coli is a convenient heterologous host for the expression of soluble and potentially functional venom peptides. Thus, this novel high-throughput gene synthesis platform was used to produce ~5,000 synthetic genes with a low error rate. This genetic library supported the production of the largest library of recombinant venom peptides constructed until now.
The library contains 2736 animal venom peptides and it is presently being screened for the discovery of novel drug leads related to different diseases.
SUMMARY - Development of a novel high-throughput platform to design and synthesize artificial genes for the production of recombinant venom peptides - Animal venoms are complex mixtures of biologically active molecules that bind a wide variety of membrane receptors with high selectivity and efficacy. Although they present low immunogenicity, venoms can affect cellular function by acting on these receptors. Animal venoms are currently thought to constitute a natural library of more than 40 million different molecules that have been continuously refined over the course of evolution. Given the composition of venoms, reticulated peptides are the most attractive class of molecules of pharmacological interest. However, the use of venoms for the development of new drugs is limited by difficulties in obtaining these molecules in quantities adequate for their study. In this work, a high-throughput platform was developed for the synthesis of synthetic genes encoding venom peptides, with the aim of producing libraries of recombinant venom peptides that can be screened for the discovery of new medicines. To synthesize small genes (< 500 base pairs) with high fidelity and in parallel, a robust and efficient PCR (polymerase chain reaction) methodology was developed, based on the extension of overlapping oligonucleotides. To enable automation of the gene synthesis platform, two bioinformatics tools were built to design simultaneously tens to thousands of genes optimized for expression in Escherichia coli (ATGenium) and the corresponding overlapping oligonucleotides (NZYOligo designer).
This platform was optimized to synthesize 96 synthetic genes simultaneously, with an error rate of 1.1 mutations per kb of synthesized DNA. To decrease the error rate associated with the production of synthetic genes, an error removal method using the enzyme T7 endonuclease I was developed. T7 endonuclease I proved very effective at recognizing and cleaving DNA molecules containing mismatches, drastically reducing the frequency of errors identified in large genes, from 3.45 to 0.43 errors per kb of synthesized DNA. The influence of gene design, the presence of fusion tags, the cellular localization of expression, and the activity of Tobacco Etch Virus (TEV) protease for efficient tag removal on the expression of cysteine-rich venom peptides in E. coli was also investigated. The use of carefully chosen codons drastically affected expression levels in E. coli. In addition, the results show a significant pressure on the usage of the two codons encoding cysteine, which suggests that both codons must be present at equivalent levels in designed and optimized genes to ensure high levels of expression. This work also indicated that the DsbC fusion tag was the most suitable for efficient expression of cysteine-rich venom peptides, particularly when the recombinant peptides were expressed in the bacterial periplasm. TEV protease was confirmed to be effective at removing fusion tags, and its recognition site can contain any amino acid at the C-terminal end except proline. It was therefore not necessary to incorporate any extra amino acid at the N-terminus of the recombinant venom peptides. Taken together, the results showed that E. coli is a suitable host for the soluble expression of potentially functional venom peptides. Finally, ~5000 synthetic genes encoding venom peptides were produced with a low error rate using the novel high-throughput gene synthesis platform developed here. The new library of synthetic genes was used to produce the largest library of recombinant venom peptides constructed to date, which comprises 2736 venom peptides. This recombinant library is currently being screened with the aim of discovering new drugs of interest for human health.
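The idea of building a small gene from overlapping oligonucleotides can be sketched as a tiling problem. The code below is only an illustration and does not reproduce NZYOligo designer: the oligo length of 60 nt and overlap of 20 nt are hypothetical values not given in the abstract, strands simply alternate, and no melting-temperature or codon optimization is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene, oligo_len=60, overlap=20):
    """Split a gene into overlapping oligos on alternating strands.

    Returns (start, strand, sequence) tuples; '-' oligos are stored as the
    reverse complement of the sense strand. The last oligo may be shorter.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and smaller than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        piece = gene[start:start + oligo_len]
        strand = "+" if index % 2 == 0 else "-"
        seq = piece if strand == "+" else reverse_complement(piece)
        oligos.append((start, strand, seq))
        if start + oligo_len >= len(gene):
            break
        start += step
        index += 1
    return oligos


def reassemble(oligos):
    """Rebuild the sense strand from the tiles, as a consistency check."""
    gene = ""
    for start, strand, seq in oligos:
        sense = seq if strand == "+" else reverse_complement(seq)
        gene = gene[:start] + sense
    return gene


gene = "ATGC" * 30                 # hypothetical 120 nt gene
oligos = tile_oligos(gene)
roundtrip = reassemble(oligos)
```

Alternating strands mirrors the general principle of polymerase-based assembly, where neighbouring oligos anneal through their overlaps and are extended; the actual design rules used in the thesis are not stated in the abstract.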

    Evidence-ranked motif identification

    A new computational method for the identification of regulatory motifs from large genomic datasets is presented here.

    CAD Tools for DNA Micro-Array Design, Manufacture and Application

    Motivation: As the Human Genome Project progresses and more microbial and eukaryotic genomes are sequenced, biotechnological processes have attracted an increasing number of biologists, bioengineers and computer scientists. These processes depend heavily on the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA and protein structures for a large number of micro-organisms, along with a variety of other databases related to biology and chemistry, are available. For example, microarray technology, a novel biotechnology, promises to monitor the whole genome at once, so that researchers can study the genome at a global level and obtain a better picture of expression across millions of genes simultaneously. Today, it is widely used in many fields: disease diagnosis, gene classification, gene regulatory networks, and drug discovery. Designing an organism-specific microarray and analyzing the experimental data require combining heterogeneous computational tools that usually differ in data format, such as GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST for sequence comparison, MEGA for phylogenetic analysis, and ClustalX for multiple alignments. Solution: Surprisingly, despite the huge research effort invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad-hoc heuristics incorporated in proprietary tools with unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as the ones currently being designed at Perlegen [109]. The goal of the research already accomplished was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays.
We illustrate the utility of our approach with a concrete example that combines the microarray design tools for Herpes B virus DNA data.

    Mass Spectrometry: An Ideal Method For Rna Modification Analysis

    Currently there is no good way to measure and find the exact location of multiple RNA modifications. Existing technology can effectively find single varieties of modifications, but cannot identify co-occurrence. As the field of proteomics has shown, mass spectrometry is a powerful and versatile technique for assessing broad ranges of chemical modifications in the context of the cellular environment. In this project I used our expertise in proteomics to build a mass spectrometry based platform for identifying RNA modifications. I have since set up both software and analytical platforms for querying RNA modifications, and used them to survey human tRNA samples and identify which modifications are present and where they occur.

    Discovering Biomarkers of Alzheimer's Disease by Statistical Learning Approaches

    In this work, statistical learning approaches are exploited to discover biomarkers for Alzheimer's disease (AD). Contributions have been made in both biomarker-driven and software-driven studies. Surprising discoveries were made in the search for blood-based biomarkers. With the inclusion of existing biological knowledge and a proposed novel feature selection method, several blood-based protein models were found to have a promising ability to separate AD patients from healthy individuals. A new statistical pattern was discovered that could serve as a new guideline for diagnosis methodology. In the field of brain-based biomarkers, the positive contribution of covariates such as age, gender and APOE genotype to an AD classifier was verified, and a panel of highly informative biomarkers comprising 26 RNA transcripts was discovered. The classifier trained on this panel of genes shows excellent capacity for discriminating patients from controls. Apart from biomarker-driven studies, the work also involved the development of statistical packages and applications. The R package metaUnion was designed and developed to provide an advanced meta-analytic approach applicable to microarray data. This package overcomes the defects of previous meta-analytic packages: 1) the neglect of missing data, 2) the inflexibility of feature dimension, 3) the lack of functions to support post-analysis summary. metaUnion has been applied in a published study as part of the integrated genomic approaches and resulted in significant findings. To give dementia researchers benchmark references on the significance of features, a web-based platform, AlzExpress, was built to provide granular differential expression test and meta-analysis results.
A combination of current big data technologies and robust data mining algorithms makes AlzExpress a flexible, scalable and comprehensive bioinformatics platform for dementia research. Plymouth University

    Extracellular vesicles and their nucleic acids for biomarker discovery

    Extracellular vesicles (EVs) are a heterogeneous population of vesicles that originate from cells. EVs are found in different biofluids and carry different macromolecules, including proteins, lipids, and nucleic acids, providing a snapshot of the parental cells at the time of release. EVs have the ability to transfer molecular cargoes to other cells and can initiate different physiological and pathological processes. Mounting evidence demonstrates that EVs' cargo and machinery are affected in disease states, positioning EVs as potential sources for the discovery of novel biomarkers. In this review, we present a conceptual overview of the EV field with particular focus on their nucleic acid cargoes. Current knowledge of EV subtypes, nucleic acid cargo and pathophysiological roles is outlined, with emphasis placed on advantages over competing analytes. We review the utility of EVs and their nucleic acid cargoes as biomarkers and critically assess recent advances in the field of EV biomarkers and high-throughput technologies. Challenges to achieving the diagnostic potential of EVs, including sample handling, EV isolation, methodological considerations, and bioassay reproducibility, are discussed. Future implementation of ‘omics-based technologies and integration of systems biology approaches for the development of EV-based biomarkers and personalized medicine are also considered.