61 research outputs found

    Applications of next-generation sequencing technologies and computational tools in molecular evolution and aquatic animals conservation studies : a short review

    Aquatic ecosystems, which form major biodiversity hotspots, are critically threatened by environmental and anthropogenic stressors. We believe that, in this genomic era, computational methods can promote aquatic biodiversity conservation by addressing questions about the evolutionary history of aquatic organisms at the molecular level. However, the huge amounts of genomic data generated can only be interpreted through bioinformatics. Here, we examine the applications of next-generation sequencing technologies and bioinformatics tools to the study of the molecular evolution of aquatic animals, and discuss the current challenges and future perspectives of using bioinformatics in aquatic animal conservation efforts.

    Bioinformatics approaches for hybrid de novo genome assembly

    De novo genome assembly, the computational process of reconstructing a genomic sequence from scratch by stitching together overlapping reads, plays a key role in computational biology and, to date, cannot be considered a solved problem. Many bioinformatics approaches are available to deal with the different types of data generated by diverse technologies. Assemblies relying on short-read data tend to be highly fragmented, reconstructing short contigs that are interrupted in repetitive regions; long-read approaches, on the other hand, still suffer from a high sequencing error rate, which worsens the quality of the final consensus. This thesis aimed to assess the impact of different assembly approaches on the reconstruction of a highly repetitive genome, identifying the strengths and limiting the weaknesses of such approaches through the integration of orthogonal data types. Moreover, a benchmarking study was undertaken to improve the contiguity of this genome, describing the improvements obtained by integrating additional data layers. Assemblies performed using short reads confirmed the limitation in reconstructing long sequences for both of the software tools adopted. The use of long reads improved the contiguity of the genome assembly and also reconstructed a greater number of gene models. Despite the gain in contiguity, the base-level accuracy of the long-read assembly remained limited. Therefore, short reads were integrated into the assembly process, reducing the base-level errors in the reconstructed sequences by up to 96%. To order and orient the polished contigs into longer scaffolds, data derived from three different technologies (linked reads, chromosome conformation capture, and optical mapping) were analysed. The best contiguity metrics were obtained using chromosome conformation data, which made it possible to obtain chromosome-scale scaffolds. To evaluate the results, data derived from linked reads and optical mapping were used to identify putative misassemblies in the scaffolds. Both datasets allowed misassemblies to be identified, highlighting the importance of integrating data derived from orthogonal technologies in the de novo assembly process. This work underlines the importance of adopting bioinformatics approaches able to deal with the data types generated by different technologies. In this way, results can be validated more accurately when reconstructing assemblies that could eventually be considered reference genomes.
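    The abstract above compares assemblies by their contiguity metrics without naming one. A common choice is N50, and the following is a minimal sketch of how it can be computed, not the thesis's own code. The function name `n50` and the contig lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    fragmented = [80, 70, 50, 40, 30, 20]
    scaffolded = [150, 90, 50]
    print("fragmented N50:", n50(fragmented))
    print("scaffolded N50:", n50(scaffolded))
```

    Joining contigs into scaffolds keeps the total length the same but raises N50, which is why the metric is often used to compare scaffolding strategies.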

    Factors Affecting the Quality of Bacterial Genomes Assemblies by Canu after Nanopore Sequencing

    Long-read sequencing (LRS), such as that offered by Oxford Nanopore Technologies, is usually associated with higher error rates than previous generations. Factors affecting assembly quality include the integrity of the DNA, the flowcell efficiency, and, not least, the raw data processing. Among de novo assemblers intended for LRS, Canu is highly flexible, with dozens of adjustable parameters. Different Canu parameters were compared for assembling reads of Salmonella enterica ser. Bovismorbificans (genome size of 4.8 Mbp) from three runs on MinION (N50 651, 805, and 5573). Two of the runs, with low-quality and highly fragmented DNA, were not usable alone for assembly, but they were assembled successfully when the reads from all experiments were combined. The best results were obtained by modifying Canu parameters related to error correction: corErrorRate (exclusion of overlaps above a set error rate, set to 0.40), corMhapSensitivity (the coarse sensitivity level, set to “high”), corMinCoverage (set to 0 to correct all reads, regardless of overlap length), and corOutCoverage (corrects the longest reads up to the imposed coverage, set to 100). This setting produced two contigs corresponding to the complete sequences of the chromosome and a plasmid. The overall results highlight the importance of a tailored bioinformatic analysis.
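    The four correction settings reported above can be collected into a Canu command line. The sketch below only builds and prints the command; it does not run Canu. The read file names, output prefix, and output directory are hypothetical. The read-type flag is also an assumption: `-nanopore-raw` is the Canu 1.x spelling, while newer releases use `-nanopore`, so it is left as a parameter.

```python
import shlex


def canu_command(reads, prefix="asm", out_dir="canu_out",
                 genome_size="4.8m", read_flag="-nanopore-raw"):
    """Build a Canu argument list using the correction settings from the abstract."""
    return [
        "canu",
        "-p", prefix,                  # output file prefix (hypothetical)
        "-d", out_dir,                 # output directory (hypothetical)
        f"genomeSize={genome_size}",   # 4.8 Mbp genome, as stated in the abstract
        "corErrorRate=0.40",           # exclude overlaps above this error rate
        "corMhapSensitivity=high",     # coarse sensitivity level
        "corMinCoverage=0",            # correct all reads
        "corOutCoverage=100",          # correct the longest reads up to 100x
        read_flag, *reads,             # reads pooled from all three runs
    ]


if __name__ == "__main__":
    # Hypothetical file names standing in for the three MinION runs.
    cmd = canu_command(["run1.fastq", "run2.fastq", "run3.fastq"])
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

    Passing all three read files to one invocation mirrors the abstract's observation that the two low-quality runs only became usable when combined with the third.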

    SIDE: Sequence-Interaction-Aware Dual Encoder for Predicting circRNA Back-Splicing Events

    Circular RNAs (circRNAs) play a critical role in gene regulation and are associated with diseases because of their specialized structure, a closed loop formed during a non-canonical splicing process in which the donor site is back-spliced to an upstream acceptor site. As fundamental work to clarify their functions and mechanisms, a large number of computational methods for predicting circRNA formation have been proposed; among these, deep learning in particular is used to capture relevant patterns from raw RNA sequences and to model their interactions to facilitate prediction. However, these methods fail to fully utilize the important characteristics of back-splicing events, i.e., the positional information of the splice sites and the interaction features of their flanking sequences, for prediction. To this end, we propose a novel approach called SIDE for predicting circRNA back-splicing events using only nucleotide sequences. Our model employs a dual encoder to capture global and interactive features of the sequence, followed by a decoder designed with contrastive learning to fuse discriminative features, improving the prediction of circRNA formation. Empirical results on three real-world datasets show the effectiveness of SIDE. Our code is publicly available at https://github.com/scu-kdde/Bioinfo-SIDE-2023

    Application of Bioinformatics Algorithms for Detecting Mutating Cyberattacks

    The functionality of any system can be represented as a set of commands that change the system's state. For signature-based intrusion detection systems, detecting an attack is equivalent to matching the sequences of commands executed by the protected system against known attack signatures. Various mutations in attack vectors (including replacing commands with equivalent ones, reordering commands and their blocks, and adding junk and empty commands) reduce the effectiveness and accuracy of intrusion detection. The article analyzes existing solutions in the field of bioinformatics and considers their applicability to identifying mutating attacks. A new approach to attack detection is proposed, based on the suffix tree technology used in the assembly and similarity checking of genomic sequences. Applying bioinformatics algorithms achieves high accuracy in detecting mutating attacks, on par with modern intrusion detection systems (above 90%), while surpassing them in memory economy, speed, and robustness to changes in attack vectors. To improve accuracy, a number of modifications were made to the developed solution, raising attack detection accuracy to 95% at a mutation level of up to 10% in the sequence. The method can be used for intrusion detection both in classical computer networks and in modern reconfigurable network infrastructures with limited resources (Internet of Things, networks of cyber-physical objects, sensor networks).

    Application of Bioinformatics Algorithms for Detecting Mutating Cyberattacks

    The functionality of any system can be represented as a set of commands that lead to a change in the state of the system. The intrusion detection problem for signature-based intrusion detection systems is equivalent to matching the sequences of operational commands executed by the protected system against known attack signatures. Various mutations in attack vectors (including replacing commands with equivalent ones, rearranging commands and their blocks, and adding garbage and empty commands to the sequence) reduce the effectiveness and accuracy of intrusion detection. The article analyzes existing solutions in the field of bioinformatics and considers their applicability to the problem of identifying polymorphic attacks with signature-based intrusion detection systems. A new approach to the detection of polymorphic attacks is discussed, based on the suffix tree technology applied in the assembly and similarity verification of genomic sequences. The use of bioinformatics technology achieves high intrusion detection accuracy, at the level of modern intrusion detection systems (more than 0.90), while surpassing them in economy of storage resources, speed, and robustness to changes in attack vectors. To improve accuracy, a number of modifications were made to the developed algorithm, as a result of which the accuracy of detecting attacks increased to 0.95 at a mutation level of up to 10% in the sequence. The developed approach can be used for intrusion detection both in conventional computer networks and in modern reconfigurable network infrastructures with limited resources (Internet of Things, networks of cyber-physical objects, wireless sensor networks).
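    To illustrate the idea of matching command sequences against a signature with a suffix structure, here is a minimal sketch. It is not the authors' algorithm: it uses an uncompressed suffix trie (quadratic in space) rather than a true suffix tree, and it scores similarity only by the longest shared contiguous run. The class and function names and the command tokens are hypothetical.

```python
class SuffixTrie:
    """Uncompressed suffix trie over a sequence of command tokens."""

    def __init__(self, tokens):
        self.root = {}
        # Insert every suffix of the token sequence as a path of nested dicts.
        for start in range(len(tokens)):
            node = self.root
            for token in tokens[start:]:
                node = node.setdefault(token, {})

    def longest_match(self, query):
        """Length of the longest contiguous run of `query` present in the indexed sequence."""
        best = 0
        for start in range(len(query)):
            node = self.root
            length = 0
            for token in query[start:]:
                if token not in node:
                    break
                node = node[token]
                length += 1
            best = max(best, length)
        return best


def signature_similarity(signature, observed):
    """Fraction of the signature covered by its longest run found in `observed`."""
    if not signature:
        raise ValueError("signature must not be empty")
    return SuffixTrie(signature).longest_match(observed) / len(signature)


if __name__ == "__main__":
    # Hypothetical attack signature and a mutated trace with a junk command inserted.
    signature = ["open", "read", "connect", "send", "close"]
    observed = ["open", "read", "nop", "connect", "send", "close", "exit"]
    print(signature_similarity(signature, observed))
```

    The inserted junk command splits the signature into two shorter runs. This is the kind of mutation the abstract says degrades exact signature matching, and it is why a practical detector would need a tolerance threshold or approximate matching on top of this.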

    A Representation Learning Approach for Predicting circRNA Back-Splicing Event via Sequence-Interaction-Aware Dual Encoder

    Circular RNAs (circRNAs) play a crucial role in gene regulation and are associated with diseases because of their unique closed continuous loop structure, which is more stable and conserved than ordinary linear RNAs. As fundamental work to clarify their functions, a large number of computational approaches for identifying circRNA formation have been proposed. However, these methods fail to fully utilize the important characteristics of back-splicing events, i.e., the positional information of the splice sites and the interaction features of their flanking sequences, for predicting circRNAs. To this end, we propose a novel approach called SIDE for predicting circRNA back-splicing events using only raw RNA sequences. Technically, SIDE employs a dual encoder to capture global and interactive features of the RNA sequence, followed by a decoder designed with contrastive learning to fuse discriminative features, improving the prediction of circRNA formation. Empirical results on three real-world datasets show the effectiveness of SIDE. Further analysis also reveals that the effectiveness of SIDE

    5S rDNA copy number in WGS data
