25 research outputs found

    Computational framework for high-quality production and large-scale evolutionary analysis of metagenome assembled genomes

    Microbial species play important roles in different environments, and the production of high-quality genomes from metagenome data sets represents a major obstacle to understanding their ecological and evolutionary dynamics. Metagenome-Assembled Genomes Orchestra (MAGO) is a computational framework that integrates and simplifies metagenome assembly, binning, bin improvement, bin quality assessment (completeness and contamination), bin annotation, and evolutionary placement of bins via detailed maximum-likelihood phylogeny based on multiple marker genes using different amino acid substitution models, alongside average nucleotide identity analysis of genomes for delineation of species boundaries and operational taxonomic units. MAGO offers streamlined execution of the entire metagenomics pipeline, error checking, computational resource distribution and compatibility of data formats, governed by user-tailored pipeline processing. MAGO is an open-source software package released in three different ways: as a Singularity image and a Docker container for HPC purposes as well as for running MAGO on commodity hardware, and as a virtual machine for gaining full access to MAGO's underlying structure and source code. MAGO is open to suggestions for extensions and is amenable for use in both research and teaching of genomics and molecular evolution of genomes assembled from small single-cell projects or large-scale and complex environmental metagenomes.

    Efficient coding of DNA

    Microcomputers have become ubiquitous tools for DNA research and analysis. Before DNA sequences can be fed into computer programs they need to be suitably coded, which is usually done in the widely accepted FASTA format. According to this scheme, a DNA sequence is represented as an ASCII string of the four nucleotide characters A, G, C and T, possibly extended with additional codes for degenerate sites and a character code for FASTA blanks when dealing with aligned DNA sequences. The FASTA representation is intuitive for biologists and eases program development, since developers can utilize a myriad of available libraries for working with ASCII strings. Despite these advantages, the FASTA format has certain drawbacks, such as inefficient searching for substrings, especially in the presence of degenerate codes. A second disadvantage is the inefficient storage of FASTA blank characters, since each such character occupies one byte of memory; substring searching speed also suffers when the number of blanks is excessive. To address these drawbacks, we propose an alternative coding of DNA sequences that enables faster substring searching and efficient storage of FASTA blanks, so that a larger set of DNA sequences can be held in a computer's working memory and processed faster.

    Differences in the melting temperature of primers for detecting the nosZ gene

    One of the basic principles of molecular biology is the use of oligonucleotides with comparable melting temperatures (Tm). To accommodate evolutionary changes in target gene sequences and detect numerous variants of the same gene in complex microbial communities, researchers have been forced to design degenerate oligonucleotide probes and primers. In addition, recent studies have suggested that relevant parameters influencing microbial activity should be included in the models currently used to describe final greenhouse gas emissions for public use, and that data on microbial community structure and abundance should be included in the near future. Because nitrous oxide, one of the most potent greenhouse gases, results mainly from incomplete denitrification, we chose the nitrous oxide reductase gene (nosZ) as a model and surveyed the published literature for nosZ oligonucleotides. We calculated the in silico Tm of each variant of every degenerate oligonucleotide and compared the resulting average Tm of the two oligonucleotides used as a pair. Degenerate oligonucleotides were found to contain variants differing in Tm by as much as 13 °C. More than 85% of the oligonucleotide pairs differed in average Tm by more than 2 °C, more than 60% by more than 4 °C, more than 40% by more than 6 °C, and 25% by more than 8 °C. With such combinations used at a single annealing temperature, in touch-down PCR, or in a hybridization protocol, full use of all degenerate variants can never be achieved, which in turn affects the reaction chemistry. To increase the consistency of molecular results, a simple adjustment of the paired oligonucleotides to at least comparable average Tm is recommended. In addition, critical evaluation of other methodological pitfalls should be regular practice in order to strengthen the value of molecular results as parameters in future public models.
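
The per-variant Tm comparison described above can be sketched as follows. The abstract does not state which Tm formula was used, so this sketch assumes the simple Wallace rule (2 °C per A/T, 4 °C per G/C), and the example oligonucleotides are made up rather than real nosZ primers. All function names are hypothetical.

```python
from itertools import product

# IUPAC degenerate codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(oligo):
    """List every non-degenerate variant of a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in oligo.upper()))]


def wallace_tm(seq):
    """Wallace rule of thumb: Tm = 2*(A+T) + 4*(G+C), in degrees Celsius.
    An assumption for illustration; the study may have used another model."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_summary(oligo):
    """Return (min, max, mean) Tm over all variants of one oligonucleotide."""
    tms = [wallace_tm(v) for v in expand(oligo)]
    return min(tms), max(tms), sum(tms) / len(tms)


def pair_tm_difference(forward, reverse):
    """Absolute difference between the average Tm of the two oligonucleotides."""
    return abs(tm_summary(forward)[2] - tm_summary(reverse)[2])


# Made-up oligonucleotides, used only to exercise the functions.
print(tm_summary("ACGTN"))
print(pair_tm_difference("AR", "CS"))
```

The spread between the minimum and maximum in `tm_summary` corresponds to the within-oligonucleotide variation reported in the abstract, and `pair_tm_difference` to the between-pair comparison.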

    New primer combinations with comparable melting temperatures detecting highest numbers of nosZ sequences from sequence databases

    We examined existing primer sequences targeting the nitrous oxide reductase (nosZ) gene to assess their capability to recognize variant nosZ sequences. Published nosZ sequences longer than 380 amino acid residues were obtained from the FunctionalGene Database/Repository (http://flyingcloud.cme.msu.edu/fungene/) and analyzed with the PrimerChart program. The number of sequences recovered by every possible combination of forward and reverse primers was determined, and the stringency of primer site recognition was then varied by allowing 1, 2, or 3 mismatches between the primer and its DNA binding site. We identified novel primer combinations that yield a satisfactory amplicon length (> 500 bp) and increased sequence recognition at comparable forward and reverse primer melting temperatures. New possibilities for improving primers by introducing additional degenerate sites were also indicated. Overall, this study indicates that current state-of-the-art molecular methods can, and frequently should, be further refined by the use of targeted bioinformatic approaches.
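
The counting step described above (how many sequences a primer pair recovers when 0 to 3 mismatches are allowed) can be sketched roughly as below. This is not PrimerChart's algorithm, which the abstract does not describe; it is a plain sliding-window Hamming comparison under assumed names, run on toy sequences.

```python
def mismatches(site, primer):
    """Count positions where the binding site and the primer differ."""
    return sum(1 for s, p in zip(site, primer) if s != p)


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(seq, fwd, rev, max_mm=0):
    """Length of the first amplicon found, or None if either primer fails to bind.

    The forward primer is matched on the given strand; the reverse primer is
    matched as its reverse complement downstream of the forward site.
    """
    start = None
    for i in range(len(seq) - len(fwd) + 1):
        if mismatches(seq[i:i + len(fwd)], fwd) <= max_mm:
            start = i
            break
    if start is None:
        return None
    rc = revcomp(rev)
    for j in range(start + len(fwd), len(seq) - len(rc) + 1):
        if mismatches(seq[j:j + len(rc)], rc) <= max_mm:
            return j + len(rc) - start
    return None


def count_recovered(seqs, fwd, rev, max_mm=0, min_len=0):
    """Number of sequences giving an amplicon of at least min_len bases."""
    lengths = (amplicon_length(s, fwd, rev, max_mm) for s in seqs)
    return sum(1 for n in lengths if n is not None and n >= min_len)


# Toy data: the second sequence carries one mismatch in the forward site.
toy = ["AAACGTGGGGGGTTCAAA", "AAACGAGGGGGGTTCAAA"]
for mm in (0, 1):
    print(mm, count_recovered(toy, "ACGT", "GAAC", max_mm=mm))
```

Running the count for each forward and reverse combination and each mismatch level would give the kind of table the study describes; the `min_len` parameter stands in for the amplicon length threshold.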

    General Unified Microbiome Profiling Pipeline (GUMPP) for large scale, streamlined and reproducible analysis of bacterial 16S rRNA data to predicted microbial metagenomes, enzymatic reactions and metabolic pathways

    The General Unified Microbiome Profiling Pipeline (GUMPP) was developed for large-scale, streamlined and reproducible analysis of bacterial 16S rRNA data and for prediction of microbial metagenomes, enzymatic reactions and metabolic pathways from amplicon data. The GUMPP workflow introduces reproducible data analyses at each of three levels of resolution: genus, operational taxonomic units (OTUs), and amplicon sequence variants (ASVs). The ability to support reproducible analyses enables production of datasets that ultimately identify the biochemical pathways characteristic of disease pathology. These datasets, coupled with biostatistics and machine learning approaches, can play a significant role in extracting truly significant and meaningful information from a wide set of 16S rRNA datasets. Adopting GUMPP in gut-microbiota research enables a focus on generating novel biomarkers that can lead to mechanistic hypotheses applicable to the development of novel therapies in personalized medicine.

    Enhanced stability and failure avoidance of hydropower plant in contingent island operation by model predictive frequency control

    The challenges of contingent island operation of hydropower plants are addressed by proposing enhanced frequency control with advanced control algorithms. The study is based on a real power plant and considers the relevant power system in its entirety, along with the load and transmission lines. Integrated transfer functions between the guide vane opening of a hydro turbine and the frequency of the associated generator were determined by in-situ identification. A model predictive controller (MPC), a fractional-order PID (FOPID) controller, and a PID controller tuned with the integral absolute error (IAE) and integral time absolute error (ITAE) criteria were designed and developed. The performance of the proposed algorithms with the identified plant model was tested on an NI-cRIO 9049 FPGA platform. Improved frequency control in terms of setpoint change was achieved with the MPC and FOPID controllers. The MPC controller also features better disturbance rejection than conventional PID controllers tuned according to integral error criteria. The considered algorithms were simulated in different operating scenarios, such as variations of active power load and transmission length. The proposed approach enhances stability and consequently avoids operational failures of hydropower plants in contingent islanding mode.
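
For readers unfamiliar with the integral error criteria mentioned above, the sketch below computes IAE (the integral of |e|) and ITAE (the integral of t·|e|) for a step response. It assumes a generic first-order plant and a PI controller integrated with forward Euler; this is not the identified hydropower model or any of the controllers from the study, and `simulate_pi` is a hypothetical name.

```python
def simulate_pi(kp, ki, tau=1.0, setpoint=1.0, dt=0.01, t_end=10.0):
    """Step response of a PI loop around the assumed plant dy/dt = (-y + u) / tau.

    Returns (IAE, ITAE), both approximated by rectangle sums over the run.
    """
    y = 0.0          # plant output
    integral = 0.0   # running integral of the error for the I term
    iae = 0.0
    itae = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral      # PI control law
        y += dt * (-y + u) / tau            # forward Euler plant update
        iae += abs(error) * dt              # integral of |e|
        itae += t * abs(error) * dt         # integral of t * |e|
    return iae, itae


# Compare two gain settings; lower values mean the error dies out sooner.
for gains in ((1.0, 1.0), (4.0, 4.0)):
    print(gains, simulate_pi(*gains))
```

Tuning "according to integral error criteria" means searching the controller gains for the setting that minimizes one of these two numbers; ITAE weights late errors more heavily and so penalizes slow settling.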

    DNA encoding for an efficient 'Omics processing

    The exponential growth of available DNA sequences and the increased interoperability of biological information are triggering intergovernmental efforts aimed at increasing the access, dissemination, and analysis of sequence data. Achieving efficient storage and processing of DNA material is an important goal that aligns well with the coding standardization foreseen on the horizon. This paper proposes novel coding approaches, for both the dissemination and processing of sequences, in which the speed of DNA processing is shown to be boosted by using more than the normally utilized eight bits to encode a single nucleotide. Further gains are achieved by encoding the nucleotides together with their trailing alignment information as a single 64-bit data structure. The paper also proposes a slight modification to the established FASTA scheme in order to improve its representation of alignment information. The significance of the proposition is confirmed by encouraging results from empirical tests.
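
The idea of storing a nucleotide together with its trailing alignment information in one 64-bit word can be pictured with the sketch below. The bit layout is purely an assumption made for illustration (low 4 bits for the base, remaining bits for the number of blanks that follow it); the abstract does not specify the paper's actual layout, and all names are hypothetical.

```python
from array import array

# Assumed layout of each 64-bit word (not taken from the paper):
#   bits 0-3  : nucleotide code
#   bits 4-63 : number of alignment blanks that follow the nucleotide
GAP_SHIFT = 4
CODE_MASK = 0xF
CODES = {"A": 1, "C": 2, "G": 4, "T": 8}
LETTERS = {v: k for k, v in CODES.items()}


def pack_aligned(aligned):
    """Pack an aligned sequence into one unsigned 64-bit word per nucleotide."""
    words = []
    for ch in aligned.upper():
        if ch == "-":
            if not words:
                raise ValueError("leading blanks are not handled in this sketch")
            words[-1] += 1 << GAP_SHIFT   # extend the previous base's gap run
        else:
            words.append(CODES[ch])
    return array("Q", words)              # 'Q' = unsigned 64-bit integers


def unpack_aligned(words):
    """Rebuild the aligned string from the packed words."""
    return "".join(
        LETTERS[w & CODE_MASK] + "-" * (w >> GAP_SHIFT) for w in words
    )


packed = pack_aligned("AC---GT-")
print(list(packed), unpack_aligned(packed))
```

Under this assumed layout a run of blanks costs no extra words, and a search over the nucleotides can ignore blanks by masking each word with `CODE_MASK`.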

    BEsTRF

    BEsTRF (Best Estimated T-RF) provides a stand-alone environment for analyzing the combinations of primers, enzymes and gene sections used in T-RFLP in order to find those with optimal resolution. User-defined sequence databases of several hundred thousand DNA sequences can be explored, and the resolution of user-specified sets of primers and restriction endonucleases can be analyzed on either forward or reverse terminal fragments. Sequence quality, primer mismatches, insertions and deletions can be controlled, and each primer-pair-specific sequence collection can be exported for downstream analyses. A configuration for novel T-RFLP population profiling using the rpoB gene (DNA-directed RNA polymerase, beta subunit) with a fluorescently labelled forward primer is presented.
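
As a rough illustration of what resolution means for a primer and enzyme combination, the sketch below computes in silico forward terminal restriction fragment (T-RF) lengths and counts how many distinct lengths a set of sequences produces. It is not BEsTRF's implementation: it uses exact matching only, toy sequences, and hypothetical function names. The HaeIII recognition site (GG^CC, cutting after the second base) is used as the example enzyme.

```python
def forward_trf_length(seq, fwd_primer, site, cut_offset):
    """Length from the 5' end of the forward primer to the first cut downstream.

    cut_offset is the number of recognition-site bases left on the fragment
    (2 for HaeIII, which cuts GG^CC). Returns None when the primer or the
    site is absent. Exact matching only, unlike a real tool.
    """
    start = seq.find(fwd_primer)
    if start == -1:
        return None
    pos = seq.find(site, start + len(fwd_primer))
    if pos == -1:
        return None
    return pos + cut_offset - start


def resolution(seqs, fwd_primer, site, cut_offset):
    """Number of distinct T-RF lengths, i.e. peaks a profile could separate."""
    lengths = (forward_trf_length(s, fwd_primer, site, cut_offset) for s in seqs)
    return len({n for n in lengths if n is not None})


# Toy sequences; the third lacks the primer site and yields no fragment.
toy = ["TTACGTAAAAGGCCAAAAGGCC", "ACGTGGCC", "TTTTGGCC"]
print([forward_trf_length(s, "ACGT", "GGCC", 2) for s in toy])
print(resolution(toy, "ACGT", "GGCC", 2))
```

Comparing `resolution` across candidate primer and enzyme combinations is one simple way to rank them; sequences that share a T-RF length cannot be told apart in the resulting profile.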