132 research outputs found

    Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes

    Full text link
    The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation.Comment: 4 pages, 6 figures, 4 tables; XIIth International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2017), Lviv, Ukrain

    Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

    Full text link
    The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation Also, we present the results of testing neural networks architectures on H2O platform for various activation functions, stopping metrics, and other parameters of machine learning algorithm. It was demonstrated for the use case of MNIST database of handwritten digits in single-threaded mode that blind selection of these parameters can hugely increase (by 2-3 orders) the runtime without the significant increase of precision. This result can have crucial influence for optimization of available and new machine learning methods, especially for image recognition problems.Comment: 15 pages, 11 figures, 4 tables; this paper summarizes the activities which were started recently and described shortly in the previous conference presentations arXiv:1706.02248 and arXiv:1707.04940; it is accepted for Springer book series "Advances in Intelligent Systems and Computing

    Workflow Engineering in Materials Design within the BATTERY 2030+ Project

    Get PDF
    In recent years, modeling and simulation of materials have become indispensable to complement experiments in materials design. High-throughput simulations increasingly aid researchers in selecting the most promising materials for experimental studies or by providing insights inaccessible by experiment. However, this often requires multiple simulation tools to meet the modeling goal. As a result, methods and tools are needed to enable extensive-scale simulations with streamlined execution of all tasks within a complex simulation protocol, including the transfer and adaptation of data between calculations. These methods should allow rapid prototyping of new protocols and proper documentation of the process. Here an overview of the benefits and challenges of workflow engineering in virtual material design is presented. Furthermore, a selection of prominent scientific workflow frameworks used for the research in the BATTERY 2030+ project is presented. Their strengths and weaknesses as well as a selection of use cases in which workflow frameworks significantly contributed to the respective studies are discussed

    Structural Cheminformatics for Kinase-Centric Drug Design

    Get PDF
    Drug development is a long, expensive, and iterative process with a high failure rate, while patients wait impatiently for treatment. Kinases are one of the main drug targets studied for the last decades to combat cancer, the second leading cause of death worldwide. These efforts resulted in a plethora of structural, chemical, and pharmacological kinase data, which are collected in the KLIFS database. In this thesis, we apply ideas from structural cheminformatics to the rich KLIFS dataset, aiming to provide computational tools that speed up the complex drug discovery process. We focus on methods for target prediction and fragment-based drug design that study characteristics of kinase binding sites (also called pockets). First, we introduce the concept of computational target prediction, which is vital in the early stages of drug discovery. This approach identifies biological entities such as proteins that may (i) modulate a disease of interest (targets or on-targets) or (ii) cause unwanted side effects due to their similarity to on-targets (off-targets). We focus on the research field of binding site comparison, which lacked a freely available and efficient tool to determine similarities between the highly conserved kinase pockets. We fill this gap with the novel method KiSSim, which encodes and compares spatial and physicochemical pocket properties for all kinases (kinome) that are structurally resolved. We study kinase similarities in the form of kinome-wide phylogenetic trees and detect expected and unexpected off-targets. To allow multiple perspectives on kinase similarity, we propose an automated and production-ready pipeline; user-defined kinases can be inspected complementarily based on their pocket sequence and structure (KiSSim), pocket-ligand interactions, and ligand profiles. Second, we introduce the concept of fragment-based drug design, which is useful to identify and optimize active and promising molecules (hits and leads). This approach identifies low-molecular-weight molecules (fragments) that bind weakly to a target and are then grown into larger high-affinity drug-like molecules. With the novel method KinFragLib, we provide a fragment dataset for kinases (fragment library) by viewing kinase inhibitors as combinations of fragments. Kinases have a highly conserved pocket with well-defined regions (subpockets); based on the subpockets that they occupy, we fragment kinase inhibitors in experimentally resolved protein-ligand complexes. The resulting dataset is used to generate novel kinase-focused molecules that are recombinations of the previously fragmented kinase inhibitors while considering their subpockets. The KinFragLib and KiSSim methods are published as freely available Python tools. Third, we advocate for open and reproducible research that applies FAIR principles ---data and software shall be findable, accessible, interoperable, and reusable--- and software best practices. In this context, we present the TeachOpenCADD platform that contains pipelines for computer-aided drug design. We use open source software and data to demonstrate ligand-based applications from cheminformatics and structure-based applications from structural bioinformatics. To emphasize the importance of FAIR data, we dedicate several topics to accessing life science databases such as ChEMBL, PubChem, PDB, and KLIFS. These pipelines are not only useful to novices in the field to gain domain-specific skills but can also serve as a starting point to study research questions. Furthermore, we show an example of how to build a stand-alone tool that formalizes reoccurring project-overarching tasks: OpenCADD-KLIFS offers a clean and user-friendly Python API to interact with the KLIFS database and fetch different kinase data types. This tool has been used in this thesis and beyond to support kinase-focused projects. We believe that the FAIR-based methods, tools, and pipelines presented in this thesis (i) are valuable additions to the toolbox for kinase research, (ii) provide relevant material for scientists who seek to learn, teach, or answer questions in the realm of computer-aided drug design, and (iii) contribute to making drug discovery more efficient, reproducible, and reusable

    ALMA: ALgorithm Modeling Application

    Get PDF
    As of today, the most recent trend in information technology is the employment of large-scale data analytic methods powered by Artificial Intelligence (AI), influencing the priorities of businesses and research centers all over the world. However, due to both the lack of specialized talent and the need for greater compute, less established businesses struggle to adopt such endeavors, with major technological mega-corporations such as Microsoft, Facebook and Google taking the upper hand in this uneven playing field. Therefore, in an attempt to promote the democratization of AI and increase the efficiency of data scientists, this work proposes a novel no-code/low-code AI platform: the ALgorithm Modeling Application (ALMA). Moreover, as the state of the art of such platforms is still gradually maturing, current solutions often fail into encompassing security/safety aspects directly into their process. In that respect, the solution proposed in this thesis aims not only to achieve greater development and deployment efficiency while building machine learning applications but also to build upon others by addressing the inherent pitfalls of AI through a ”secure by design” philosophy.Atualmente, a tendência mais recente no domínio das tecnologias de informação e a utilização de métodos de análise de dados baseados em Inteligência Artificial (IA), influenciando as prioridades das empresas e centros de investigação de todo o mundo. No entanto, devido à falta de talento especializado no mercado e a necessidade de obter equipamentos com maior capacidade de computação, negócios menos estabelecidos têm maiores dificuldades em realizar esse tipo de investimentos quando comparados a grandes empresas tecnológicas como a Microsoft, o Facebook e a Google. Deste modo, na tentativa de promover a democratização da IA e aumentar a eficiência dos cientistas de dados, este trabalho propõe uma nova plataforma de no-code/low- code: “THe Algorithm Modeling Application” (ALMA). Por outro lado, e visto que a maioria das soluções atuais falham em abranger aspetos de segurança relativos ˜ a IA diretamente no seu processo, a solução proposta nesta tese visa não só alcançar maior eficiência na construção de soluções baseadas em IA, mas também abordar as questões de segurança implícitas ao seu uso

    Data science: a game changer for science and innovation

    Get PDF
    AbstractThis paper shows data science's potential for disruptive innovation in science, industry, policy, and people's lives. We present how data science impacts science and society at large in the coming years, including ethical problems in managing human behavior data and considering the quantitative expectations of data science economic impact. We introduce concepts such as open science and e-infrastructure as useful tools for supporting ethical data science and training new generations of data scientists. Finally, this work outlines SoBigData Research Infrastructure as an easy-to-access platform for executing complex data science processes. The services proposed by SoBigData are aimed at using data science to understand the complexity of our contemporary, globally interconnected society
    corecore