43 research outputs found

    Dimension and shape invariant programming: the implementation and application

    This thesis implements a model for shape and dimension invariant programming based on the notation of the Mathematics of Arrays (MOA) algebra. It focuses on the implementation of dimension and shape invariance, and their effect on parallel computing. A new design for the MOA notation is implemented that eliminates the need for another PSI-compiler, or a language extension to functional programming languages. The MOA notation is designed as a library of Application Programming Interfaces (APIs) containing object-oriented classes implemented in C++. The library executes array operations correctly, and is expected to enhance performance independently of dimension and shape. To implement these APIs, the mathematical equations of the original notation were analyzed and sometimes simplified to make them easier to implement from the programming point of view, and some more operations were added. The APIs reduce the erroneous loop starts, strides, and stops used by programmers in the traditional handling of multi-dimensional arrays. The library defines the dimension and shape of the arrays at runtime, and gives the source code of the problem at hand a better chance of being automatically parallelized. The MOA library testing tool developed and implemented in this thesis can be used by mathematicians and computer arithmetic researchers to translate high-level arithmetic functions in applications such as image processing, video processing, and fluid dynamics to the MOA notation, utilizing its benefits. An image-processing tool is implemented using this new MOA library, proving the correctness of the design on a 2D-array application, where image operations are expressed concisely in the source code and easily manipulated on the conceptual level. Image processing transformations, filtering and detections are implemented.
Video processing operations, such as transformations on AVI frames after decomposing them and a motion detection scheme, are implemented using the MOA library to prove the correctness of the library on a 3D-array application. Also, the parallelisation factors inherent in the MOA library design are discussed in terms of shape polymorphism, MOA parallel architecture, data redistribution, and tiling algorithms, in relation to the MOA notation. Furthermore, pipelining with MOA has been investigated. In addition to the above experiments, a hardware implementation of the MOA APIs was developed using VHDL on Renoir as a package, and simulated using ModelSim. Performance analysis is conducted in terms of the general benefits of programming invariant of shape and dimension as designed in this thesis, and is open to further analysis based on the application domain.

    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework

    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of (Helal, 2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, the various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented, comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multiway analysis methods and their integration with various Deep Neural Network models is presented using case studies in different application domains. Comment: 34 pages, 8 figures, 4 tables

    Spinal Muscle Atrophy Disease Modelling as Bayesian Network

    © 2021 The Author(s). Published under licence by IOP Publishing Ltd at https://doi.org/10.1088/1742-6596/2128/1/012015. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/3.0/
    We investigate molecular gene expression studies and public databases for disease modelling using Probabilistic Graphical Models and Bayesian Inference. A case study on Spinal Muscle Atrophy Genome-Wide Association Study results is modelled and analyzed. The genes up- and down-regulated in two stages of the disease development are linked to prior knowledge published in the public domain, and a co-expression network is created and analyzed. The molecular pathways triggered by these genes are identified. The Bayesian inference posterior distributions are estimated using a variational analytical algorithm and a Markov chain Monte Carlo sampling algorithm. The paper concludes with assumptions, limitations and possible future work. Peer reviewed

    Dynamic Programming Algorithms for Discovery of Antibiotic Resistance in Microbial Genomes

    The translation of comparative genomics into clinical decision support tools often depends on the quality of sequence alignments. However, currently used methods of multiple sequence alignment suffer from significant biases and problems with aligning diverged sequences. The objective of this study was to develop and test a new multiple sequence alignment (MSA) algorithm suitable for the high-throughput comparative analysis of different microbial genomes. This algorithm employs an innovative tensor indexing method for partitioning the dynamic programming hyper-cube space for parallel processing. We have used the clinically relevant task of identifying regions that determine resistance to antibiotics to test the new algorithm and to compare its performance with existing MSA methods. The new method "mmDst" performed better than existing MSA algorithms for more divergent sequences because it employs a simultaneous alignment scoring recurrence, which effectively approximated the score for edge missing cell scores that fall outside the scoring region. Comment: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d06cf32f66c6866e2867abdca587419d4958af1

    Performance Evaluation of Checkpoint/Restart Techniques

    Distributed applications running on a large cluster environment, such as cloud instances, have shorter execution times. However, an application might suffer from sudden termination due to unpredicted computing node failures, thus losing the whole computation. Checkpoint/restart is a fault tolerance technique used to solve this problem. In this work we evaluated the performance of two of the most commonly used checkpoint/restart techniques: Distributed Multithreaded Checkpointing (DMTCP) and the Berkeley Lab Checkpoint/Restart library (BLCR) integrated into the OpenMPI framework. We aimed to test their validity and evaluate their performance in both local and Amazon Elastic Compute Cloud (EC2) environments. The experiments were conducted on Amazon EC2 as a well-known proprietary cloud computing service provider. Results obtained were reported and compared to evaluate checkpoint and restart time values, data scalability and compute processes scalability. The findings showed that DMTCP performs better than BLCR in the checkpoint and restart speed, data scalability and compute processes scalability experiments.

    Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming

    Optimal multiple sequence alignment by dynamic programming, like many high-dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to the lack of a suitable scheme to manage partitioning and dependencies. A scheme for parallel implementation of dynamic programming multiple sequence alignment is presented, based on a peer-to-peer design and a multidimensional array indexing method. This design results in an up to 5-fold improvement compared to a previously described master/slave design, and scales favourably with the number of processors used. This study demonstrates an approach for parallelising multi-dimensional dynamic programming and similar algorithms utilizing multi-processor architectures.

    Digital Twins Approaches and Methods Review

    © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted manuscript version of a conference paper which has been published in final form at https://doi.org/10.1109/ITC-Egypt58155.2023.10206196
    This paper investigates the recent advances in Digital Twin technologies. The aim is to compare the approaches, the available open source and proprietary technologies and methods, their features, and their integration capabilities. The motivation is to enable better design decisions based on the available literature and case studies. Various tools address 3D reconstruction and visualisation, IoT and sensor integration, and physical simulations, while other platforms provide complete solutions. The concluding review of current challenges and future work identifies that the lack of standardisation and interoperability makes the lifetime of a digital twin short, with a high cost and time to build and rebuild if required.

    Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data

    The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance, of several clustering and classification algorithms to identify the species of 364 sequences of the 16S rRNA gene with a defined species in GenBank, and 110 sequences of the 16S rRNA gene with no defined species, all within the genus Nocardia. A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization. Results: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.