4,995 research outputs found

    The Big Picture: A Holistic View of E-Book Acquisitions

    The merging of two departments into the Acquisitions and Collection Development Department afforded Loyola Marymount University an opportunity to rethink existing workflows, with the acquisition of electronic books (e-books) identified as a critical task to review. Process mapping was used to show the complexity of the different tasks performed in the department and to give staff a way to visualize how their work fit into a sequence of actions within a larger workflow. The authors listed the acquisition models used for e-books at their library and constructed process maps for the following six major types: 1. Firm order e-books; 2. Firm order e-book collections; 3. Approval order e-books; 4. Demand-driven e-books; 5. Standing order e-books; and 6. Subscription e-book databases. The authors then merged the individual process maps into a single visualization to view the acquisition process as a whole and to show how the different e-book acquisition models relate to and diverge from one another.
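
    A minimal sketch of the merging idea, assuming the Python networkx library: each acquisition model is represented as a directed graph of steps, and composing the graphs yields one combined view. The step names below are illustrative placeholders, not the authors' actual process maps.

        import networkx as nx

        # One directed graph per acquisition model; each edge is a hand-off
        # between steps (step names here are illustrative placeholders).
        firm_order = nx.DiGraph()
        firm_order.add_edges_from([
            ("Select title", "Place order"),
            ("Place order", "Receive access"),
            ("Receive access", "Create catalog record"),
        ])

        demand_driven = nx.DiGraph()
        demand_driven.add_edges_from([
            ("Load discovery records", "Patron triggers purchase"),
            ("Patron triggers purchase", "Receive access"),
            ("Receive access", "Create catalog record"),
        ])

        # Composing the graphs merges shared steps, so the combined map shows
        # where the individual acquisition models converge and where they diverge.
        combined = nx.compose(firm_order, demand_driven)
        print(sorted(combined.nodes()))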

    Scalable bioinformatics via workflow conversion

    Background: Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address the problems of a specific community, so each has its own advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community.

    Results: We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.

    Conclusions: Our work will not only reduce the time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of the obtained scientific results.
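
    To make the idea of a platform-free tool description concrete, the sketch below shows one way such a descriptor could be modelled and serialized in Python. The field names and the example tool are hypothetical and do not reproduce the actual Common Tool Descriptor schema.

        from dataclasses import dataclass, field, asdict
        import json

        @dataclass
        class Parameter:
            name: str
            type: str              # e.g. "int", "float", "string", "file"
            default: object = None
            description: str = ""

        @dataclass
        class ToolDescriptor:
            name: str
            version: str
            executable: str        # the command-line binary being described
            inputs: list = field(default_factory=list)
            outputs: list = field(default_factory=list)
            parameters: list = field(default_factory=list)

        # A hypothetical alignment tool described once; the same structured
        # description could then be converted into nodes or jobs for
        # different workflow engines.
        aligner = ToolDescriptor(
            name="ExampleAligner",
            version="1.0",
            executable="example_aligner",
            inputs=[Parameter("reads", "file", description="FASTQ input")],
            outputs=[Parameter("alignment", "file", description="BAM output")],
            parameters=[Parameter("threads", "int", default=4)],
        )

        print(json.dumps(asdict(aligner), indent=2))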

    RekomGNN: Visualizing, Contextualizing and Evaluating Graph Neural Networks Recommendations

    Content recommendation tasks increasingly use Graph Neural Networks (GNNs), but it remains challenging for machine learning experts to assess the quality of their outputs. Visualization systems for GNNs that could support this interrogation are few. Moreover, those that do exist focus primarily on exposing GNN architectures for tuning and prediction tasks and do not address the challenges of recommendation tasks. We developed RekomGNN, a visual analytics system that supports ML experts in exploring GNN recommendations across several dimensions and making annotations about their quality. RekomGNN straddles the design space between neural network and recommender system visualization to arrive at a set of encoding and interaction choices for recommendation tasks. We found that RekomGNN helps experts make qualitative assessments of the GNN's results, which they can use for model refinement. Overall, our contributions and findings add to the growing understanding of visualizing GNNs for increasingly complex tasks.
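
    To make the recommendation setting concrete, the sketch below shows how candidate recommendations are commonly derived from a trained GNN's node embeddings by scoring user-item pairs. It uses random placeholder embeddings and NumPy and is not RekomGNN's actual pipeline.

        import numpy as np

        rng = np.random.default_rng(0)

        # Stand-ins for embeddings a trained GNN would produce for
        # 1,000 users and 5,000 content items (64 dimensions each).
        user_emb = rng.normal(size=(1000, 64)).astype(np.float32)
        item_emb = rng.normal(size=(5000, 64)).astype(np.float32)

        def top_k_recommendations(user_id: int, k: int = 10) -> np.ndarray:
            """Rank items for one user by embedding dot product."""
            scores = item_emb @ user_emb[user_id]      # shape: (5000,)
            return np.argsort(scores)[::-1][:k]        # indices of the top-k items

        # These candidate lists are the kind of output an analyst would then
        # inspect, contextualize, and annotate in a visual tool.
        print(top_k_recommendations(user_id=42))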

    SQL for GPU Data Frames in RAPIDS: Accelerating end-to-end data science workflows using GPUs

    In this work, we present BlazingSQL [2], a SQL engine built on the RAPIDS open-source software stack, which allows us to query enterprise data lakes lightning fast with full interoperability with the RAPIDS stack. BlazingSQL makes it simple for data scientists to run SQL queries over raw files directly into GPU memory. RAPIDS can then take these results and continue with machine learning, deep learning, and visualization workloads. We present two demo workflows using BlazingSQL and RAPIDS. On average, our solution runs 20-100x faster than an identical query on a Spark cluster at price parity. This significant gain in speed allows us to evaluate the solution on a large, realistic, and challenging set of database use cases.

    The increasing availability of data has created a need for better techniques and methods to discover knowledge from massive volumes of complex data. CPUs impose performance limits on delivering such solutions; resorting to GPU programming is one approach to overcome these limitations.

    GPUs in Machine Learning. CPUs can no longer handle the growing data, and AI/ML is unable to keep up with the growth of data being processed [3]. GPUs are well known for accelerating training and are able to scale to these new data demands: the bigger the dataset, the larger the training performance gap between CPU and GPU [4]. However, data preparation still happens on CPUs and cannot keep pace with GPU-accelerated machine learning.

    RAPIDS. RAPIDS [5] is an end-to-end analytics solution on GPUs: a set of open-source libraries for GPU-accelerated data preparation and machine learning built by multiple contributors, including NVIDIA, Anaconda, and BlazingDB. It covers all the steps of the most common data science pipelines and is composed of cuDF for data preparation, cuML for machine learning, and cuGraph for graph analytics, all under the standard specification of Apache Arrow [1] in GPU memory.

    BlazingSQL and the RAPIDS Ecosystem. RAPIDS [5] allows data scientists to accelerate end-to-end data analytics solutions on GPUs. A fundamental part of RAPIDS is the GPU DataFrame (GDF), whose goal is to support interoperability between GPU applications and define a common GPU in-memory data layer. In this context, the CUDA DataFrame library (cuDF) from RAPIDS covers GPU data processing for GDFs, formed by GPU compute kernels and a pandas-like API [6]. BlazingSQL [2] provides a simple SQL interface to ETL massive datasets into GPU memory for AI and deep learning workloads. Furthermore, BlazingSQL can directly query files, such as CSV and Apache Parquet, on data lakes, like HDFS and AWS S3, loading the results directly into GPU memory.

    End-to-end workflows. Mortgage loan risk processing: train a model to assess the risk of new mortgage loans based on Fannie Mae loan performance data (BlazingSQL + XGBoost loan risk demo). The end-to-end analytics workload:
    • Data Lake → ETL/Feature Engineering → XGBoost Training
    • We built two price-equivalent clusters on GCP, one for Apache Spark and another for BlazingSQL
    • BlazingSQL ran the ETL phase of this workload 20x faster than Apache Spark, showing that RAPIDS + BlazingSQL outperforms traditional CPU pipelines
    Netflow analysis (ETL + visualization): BlazingSQL, the GPU SQL engine built on RAPIDS, worked with our partners at Graphistry to show how you can analyze log data over 100x faster than with Apache Spark at price parity. We visually analyze the VAST netflow data set inside Graphistry in order to quickly detect anomalous events: we took 65M rows of netflow data in Apache Parquet, and in less than a second our query built a table of nodes and edges to render a visual graph.
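
    The loan-risk workflow described above boils down to running SQL over Parquet files into a GPU DataFrame and handing the result to XGBoost. The sketch below illustrates that hand-off with the BlazingSQL and XGBoost Python APIs; the file path, table name, and column names are placeholders rather than the actual demo code.

        from blazingsql import BlazingContext
        import xgboost as xgb

        bc = BlazingContext()
        # Register a Parquet file as a SQL table (placeholder path and name).
        bc.create_table("perf", "/data/mortgage/perf.parquet")

        # The query result is a cuDF DataFrame, so it stays in GPU memory
        # for the training step that follows.
        gdf = bc.sql("""
            SELECT loan_age, current_upb, delinquency_12
            FROM perf
            WHERE current_upb IS NOT NULL
        """)

        dtrain = xgb.DMatrix(gdf[["loan_age", "current_upb"]],
                             label=gdf["delinquency_12"])
        model = xgb.train({"tree_method": "gpu_hist",
                           "objective": "binary:logistic"},
                          dtrain, num_boost_round=100)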