340 research outputs found

    DNA as a medium for storing digital signals

    Motivated by the storage capacity and efficiency of the DNA molecule, in this paper we propose to utilize DNA molecules to store digital signals. We show that hybridization of DNA molecules can be used as a similarity criterion for retrieving digital signals encoded and stored in a DNA database. Since retrieval is achieved through hybridization of query and data-carrying DNA molecules, we present a mathematical model to estimate hybridization efficiency (also known as selectivity annealing). We show that selectivity annealing is inversely proportional to the mean squared error (MSE) of the encoded signal values. In addition, we show that the concentration of the molecules plays the same role as the decision threshold employed in digital signal matching algorithms. Finally, similarly to the digital domain, we define a DNA signal-to-noise ratio (SNR) measure to assess the performance of the DNA-based retrieval scheme. Simulations are presented to validate our arguments.
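
The digital-domain side of the analogy in this abstract (similarity measured by MSE, with a decision threshold playing the role the abstract assigns to molecule concentration) can be illustrated with a minimal Python sketch. The function names, the toy database, and the threshold values are illustrative assumptions and do not come from the paper; no hybridization chemistry is modeled.

```python
from typing import Dict, List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty signals."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             threshold: float) -> List[str]:
    """Return names of stored signals with MSE <= threshold, best match first."""
    scored = sorted((mse(query, signal), name) for name, signal in database.items())
    return [name for error, name in scored if error <= threshold]


# Hypothetical toy database of stored signals.
database = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, 6.0],
    "c": [9.0, 9.0, 9.0, 9.0],
}
query = [1.0, 2.0, 3.0, 4.0]

loose_matches = retrieve(query, database, threshold=1.5)
strict_matches = retrieve(query, database, threshold=0.5)
```

Lowering the threshold narrows the set of retrieved signals, which is the behavior the abstract attributes to changing the concentration of the molecules.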

    Retrieval accuracy of very large DNA-Based databases of digital signals

    In this paper, a simulation of single-query searches in very large DNA-based databases that are capable of storing and retrieving digital signals is presented. Similarly to the digital domain, a signal-to-noise ratio (SNR) measure to assess the performance of the DNA-based retrieval scheme in terms of database size and source statistics is defined. With approximations, it is shown that the SNR of any finite-size DNA-based database is upper bounded by the SNR of an infinitely large one with the same source distribution. Computer simulations are presented to validate the theoretical outcomes.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, along with shared synteny from closely related fish species, to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
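
The contig and scaffold N50 figures quoted in this abstract use a standard assembly statistic: the length L such that sequences of length L or longer together cover at least half of the total assembled length. A minimal Python sketch of that calculation follows; the function name and the example lengths are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches the total")


# Made-up lengths: total 54, and 10 + 9 + 8 = 27 reaches half of it.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```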

    A systems-based approach for detecting molecular interactions across tissues

    Current high-throughput gene expression experiments have a straightforward design: examining the gene expression of one group or condition relative to that of another. The data are typically analyzed as if they represent strictly intracellular events, and genes are often treated as coming from a homogeneous population. Although intracellular events are crucial to nearly all biological processes, cell-cell interactions are often just as important, especially when gene expression data are generated from heterogeneous cell populations, such as whole tissues. Cell-cell molecular interactions are generally lost in the available analytical procedures and, as a result, are not examined experimentally, at least not accurately or efficiently. Most importantly, this imposes major limitations when studying gene expression changes in multiple samples that interact with one another. To address the limitations of current techniques, we have developed a novel systems-based approach that expands the traditional analysis of gene expression in two stages. This includes a novel sequence-based meta-analytic tool, AbsIDconvert, that converts annotated features using an interval tree for storing and querying absolute genomic coordinates, allowing comparison of multi-scale macro-molecule identifiers across platforms and/or organisms. In addition, a systems-based heuristic algorithm is developed to find intercellular interactions between two sets of genes, potentially from different tissues, by utilizing the location information of each gene along with the information available in secondary databases in the form of interactions, pathways and signaling. AbsIDconvert is shown to provide high accuracy in identifier conversion compared to other available methodologies (typically at an average rate of 84%) while maintaining a higher efficiency (O(n log n)).
Our intercellular interaction approach and underlying visualization show promise in allowing researchers to uncover novel signaling pathways in an intercellular fashion, which to this point has not been possible.
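
The abstract describes storing annotated features by absolute genomic coordinates in an interval tree and querying them by position. As a rough illustration of that data structure only, here is a simplified centered interval tree in Python. It is a sketch under stated assumptions, not AbsIDconvert's implementation: the class name, the closed-coordinate convention, and the feature names are hypothetical, and intervals stored at a node are scanned linearly rather than kept in sorted lists.

```python
from typing import List, Optional, Tuple

# (start, end, identifier) with closed coordinates and start <= end (assumed).
Interval = Tuple[int, int, str]


class IntervalNode:
    """Node of a simplified centered interval tree."""

    def __init__(self, intervals: List[Interval]) -> None:
        # Use the median endpoint as the center, so at least one interval
        # contains it and both subtrees are strictly smaller.
        points = sorted(p for start, end, _ in intervals for p in (start, end))
        self.center = points[len(points) // 2]
        self.here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left: Optional["IntervalNode"] = IntervalNode(left) if left else None
        self.right: Optional["IntervalNode"] = IntervalNode(right) if right else None

    def overlapping(self, start: int, end: int) -> List[Interval]:
        """Return all stored intervals that overlap the closed range [start, end]."""
        hits = [iv for iv in self.here if iv[0] <= end and iv[1] >= start]
        # Intervals in the left subtree end before the center.
        if self.left is not None and start < self.center:
            hits += self.left.overlapping(start, end)
        # Intervals in the right subtree start after the center.
        if self.right is not None and end > self.center:
            hits += self.right.overlapping(start, end)
        return hits


# Hypothetical features on one coordinate axis.
features = [(100, 200, "geneA"), (150, 300, "geneB"), (400, 500, "geneC"), (50, 60, "geneD")]
tree = IntervalNode(features)
hit_names = sorted(name for _, _, name in tree.overlapping(180, 410))
```

A query range that falls between features, such as `tree.overlapping(61, 99)`, returns an empty list.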

    Prediction and verification of NF-κB targets in the porcine MHC through the use of sequence similarity and pathway inhibition

    With the advent of high-throughput technologies for both sequencing genomic DNA and measuring RNA expression, a tremendous amount of information has been generated and deposited into public databases. This large amount of data has led to a better understanding of how a genome is organized, how many regions encode information for transcripts, and how the abundance of these transcripts changes in response to the perturbations a cell or organism encounters, whether an outside stimulus, such as bacteria or viruses, or an internal one, such as a mutation within the genome. Some species, such as human and mouse, have had a significant amount of sequencing completed, leading to excellent reference genome sequences that are also well understood at the level of function and structure, termed gene annotation. However, the genomes of most other species are in various states of completion, from nearly complete with partial annotation, like the pig, to having only portions of the genome completed, such as Alatina moseri, a species of Hawaiian jellyfish. For these species, direct annotation is greatly lacking compared to that of species such as human and mouse. When annotation is lacking for one species, it is possible to leverage the information already obtained for closely related but better-studied species by comparing sequences across species and identifying similar regions between them, allowing the annotations of these regions to be inferred across species. Once a species has sufficient sequence annotation, high-throughput expression data, such as that from microarrays, can be better understood. One area of research under development that can utilize high-throughput expression measures is understanding how a set of transcripts that change together in response to perturbations in the environment is controlled by specific proteins, called transcription factors, such as NF-κB.
NF-κB is an important transcription factor with a role in a variety of cellular functions, such as mounting a response to infection and preventing cell death by inhibiting apoptosis. While some transcription factors, like NF-κB, have been well studied and many of their target genes identified, this identification is typically done one or a few genes at a time. However, as more genomes are sequenced, better algorithms are developed for identifying possible targets, and new biological techniques are optimized, the ability to predict and verify targets is also moving toward high throughput. In order to create more reliable gene annotation for the pig, raw porcine sequences were assembled into more complete full-length sequences to create an accurate base for comparison with other species, as well as to identify possible sequence variation within the assembled sequences. This annotation was then used in a high-throughput experiment to look for genes that change expression following inoculation of pigs with Salmonella choleraesuis, and to determine which genes are potential NF-κB targets. Then, potential target genes found in an immune-related region of the genome were tested for their response to bacterial endotoxin in the presence or absence of an NF-κB inhibitor. The ability of NF-κB to bind to their promoters was also tested using a labeled EMSA probe. Using these two methods, we show that the murine H2-Eb1 and Trim26 and the porcine C2 and UBD are novel targets of NF-κB, and that such bioinformatic predictions can be confirmed using molecular assays.

    Chemometric methods for microarray data analysis and their application to leukemia subtype identification

    Various chemometric methods were developed that cover the complete data-processing chain in the analysis of Affymetrix U133 DNA biosensors. The aim was to improve the quality of the data. To this end, indicators were created that make it possible to detect signals of insufficient quality and to remove background and artifacts. These methods can be used together with a newly developed database system to ensure data quality throughout the entire data-processing chain. The system was applied to the discrimination of different pediatric leukemia types. Indicator genes were found that can be used to classify unknown leukemia samples.

    Plant-parasitic nematodes: from genomics to functional analysis of parasitism genes

    Nematodes (roundworms) belong to the largest phylum on earth. The numerous species inhabit practically all ecological niches, including plants. Plant-parasitic species live on plant roots, causing substantial damage to the plant and hampering its development. As such, they cause gigantic economic losses in crop production. We used a molecular approach to analyze the plant-parasitic nematode Radopholus similis by generating expressed sequence tags (ESTs). The most striking discovery was tags corresponding to a Wolbachia-like endosymbiont, which was subsequently located in the ovaries of R. similis. Numerous tags corresponding to parasitism genes with potential roles in, amongst other things, host localisation, detoxification, cell wall modification, and even putative host transcriptional reprogramming were identified. In addition, a tool to explore all available nematode EST data is presented in this study. The ‘nematode EST exploration tool’ (NEXT) (http://zion.ugent.be/joachim/next) extends the usefulness of these data by extracting and storing temporal and spatial information of all publicly available nematode EST libraries. Some members of the transthyretin-like gene family of R. similis were characterized. All stages except developing embryos express the analyzed genes, and expression is localized to the ventral nerve cord and tissues surrounding the vulva. The predicted secondary structure suggests a binding capacity for an as yet unknown ligand. Further, the annotation of the complete mitochondrial (mt) genome of R. similis is reported. The mt genome has the expected gene content, but shows many aberrant features, such as a considerably smaller 16S rRNA with reduced structures, two large repeat regions, the lack of stop codons on many genes, and a unique codon reassignment from UAA:Stop to UAA:Tyrosine. The aberrant features in the mt genome could be related to this codon reassignment, but results are ambiguous and require further research.
The last part of the study reports on the response of the plant to nematode infection. Signaling of two plant hormones involved in plant defense is measured during early phases of parasitism. In addition, the role of flavonoid compounds produced by the plant is analyzed by infection tests on several mutants.

    Decoding heterogeneous big data in an integrative way

    Biotechnologies in the post-genomic era, especially those that generate data in high throughput, bring opportunities and challenges that have never been faced before. One of them is how to decode big heterogeneous data for clues that are useful for biological questions. The exponential growth of a variety of data has brought more and more applications of systematic approaches that investigate biological questions in an integrative way. Systematic approaches inherently require integration of heterogeneous information, which urgently calls for much more effort. In this thesis, the effort is mainly devoted to the development of methods and tools that help to integrate big heterogeneous information. In Chapter 2, we employed a heuristic strategy to summarize and integrate, in the form of networks, genes that are essential for the determination of mouse retinal cells. These networks, which are supported by experimental evidence, could be rediscovered in the analysis of high-throughput data sets and thus would be useful in leveraging high-throughput data. In Chapter 3, we described EnRICH, a tool that we developed to help qualitatively integrate heterogeneous intra-organism information. We also introduced how EnRICH could be applied to the construction of a composite network from different sources, and demonstrated how we used EnRICH to successfully prioritize retinal disease genes. Following the work of Chapter 3 (intra-organism information integration), in Chapter 4 we moved on to the development of a method and tool for inter-organism information integration. The method we proposed is able to match genes in a one-to-one fashion between any two genomes. In summary, this thesis contributes to the integrative analysis of big heterogeneous data through its work on the integration of intra- and inter-organism information.

    Use of Next Generation Sequencing to Detect Plant Pathogenic Prokaryotes

    Increasing importation of commodities from abroad increases the risk of introducing exotic plant pathogens. Although individual pathogen assays are available, current screening methods have limited ability to detect multiple plant pathogens concurrently. The advent of next generation sequencing (NGS) technology allows for the creation of a single assay to simultaneously detect any and all microbes in a sample, including pathogens that have been genetically modified. In this project, bioinformatic pipelines (streamlined PC programs) were developed to generate mock sample databases used to simulate 454 runs, to design query "electronic probes" (e-probes), and to run BLAST searches. Pathogen-specific queries, ranging in length from 20 nt to 140 nt, were created for detection of the bacterial select agents Xanthomonas oryzae pv. oryzae and Ralstonia solanacearum race 3 biovar 2, as well as for Candidatus Liberibacter asiaticus and Xylella fastidiosa 9a5c (not select agents). The query sets were used to BLAST mock sample databases with one host, grapevine (Vitis vinifera), for all pathogen sequences at various ratios. All four bacterial pathogens were readily detectable in silico, suggesting that NGS technology has advantages beyond those of existing pathogen detection assays. To test the in silico results, pathogen-specific e-probes, ranging in length from 15 nt to 60 nt, were created for detection of Ralstonia solanacearum race 3 biovar 2 and Pseudomonas syringae pv. tomato DC3000. The e-probe sets were used to query NGS sequencing data of diseased hosts: potato inoculated with Rs r3b2 and tomato inoculated with DC3000. Both bacterial pathogens were readily detectable, suggesting that NGS data can be used, when combined with e-probes, as a prokaryotic plant pathogen detection assay.
This research merges bioinformatics and plant pathology to address the national security need for a quick detection tool that finds any pathogen in a single assay for the agriculture industry.
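
The idea of querying sequencing reads with short pathogen-specific e-probes can be sketched in a few lines of Python. This is a simplified stand-in, not the pipeline from the study: the study used BLAST searches, whereas this sketch substitutes exact substring matching on both strands, and the function names, probe names, and sequences are all made up for illustration.

```python
from typing import Dict, List

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_probe_hits(probes: Dict[str, str], reads: List[str]) -> Dict[str, int]:
    """Count, per probe, the reads containing it exactly on either strand."""
    hits = {}
    for name, probe in probes.items():
        rc = reverse_complement(probe)
        hits[name] = sum(1 for read in reads if probe in read or rc in read)
    return hits


# Hypothetical 20 nt probes and a tiny mock read set.
probes = {
    "probe1": "ACGTTGCAAGGCTTAACCGG",
    "probe2": "TTTTGGGGCCCCAAAATTTT",
}
reads = [
    "GG" + "ACGTTGCAAGGCTTAACCGG" + "AT",                      # forward-strand match
    "CC" + reverse_complement("ACGTTGCAAGGCTTAACCGG") + "TA",  # reverse-strand match
    "ACGTACGTACGTACGTACGTACGT",                                # no match
]

hits = count_probe_hits(probes, reads)
detected = [name for name, count in hits.items() if count > 0]
```

Exact matching misses reads with sequencing errors or strain variation, which is one reason an alignment tool with a scoring threshold is the more realistic choice.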