60 research outputs found

    AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)

    This book is a collection of the accepted papers presented at the Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD), held in conjunction with the 36th AAAI Conference on Artificial Intelligence 2022. During AIBSD 2022, attendees addressed existing issues of data bias and scarcity in Artificial Intelligence and discussed potential solutions for real-world scenarios. A selection of the papers presented at AIBSD 2022 was chosen for further publication and is included in this book.

    Doctor of Philosophy

    With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving only a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtering and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced that reduces the exploration space by identifying a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures over the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, obtaining insights and arriving at conclusions based solely on 2D representations is limited and prone to error. How to interpret the inaccuracies and resolve the ambiguities in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that enable the understanding of high-dimensional structures via data manipulation in 2D projections.
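    A rough, hypothetical illustration of the kind of computation the abstract describes (not the dissertation's actual algorithm): score many random linear 2D projections of a toy high-dimensional dataset with a simple variance-based quality measure and keep the highest-scoring ones.

```python
# Hypothetical sketch: score random linear 2D projections of a
# high-dimensional dataset and keep the most "informative" ones.
# Illustration of the general idea only, not the dissertation's method.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))        # toy data: 500 points in 20 dimensions
X[:250, :3] += 4.0                    # plant cluster structure in the first 3 dims

def random_projection(d, rng):
    """Draw a random orthonormal pair of projection axes for d-dimensional data."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    return Q                          # shape (d, 2)

def quality(Y):
    """Toy quality measure: total variance captured by the 2D projection."""
    return Y.var(axis=0).sum()

# Score many candidate projections and keep the top five.
candidates = [random_projection(X.shape[1], rng) for _ in range(200)]
scores = [quality(X @ P) for P in candidates]
best = np.argsort(scores)[-5:]
print("best projection scores:", [round(scores[i], 2) for i in best])
```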

    Metrics for Materials Discovery

    The vast corpus of experimental solid state data has enabled a variety of statistical methods to be applied in high-throughput materials discovery. There are many techniques for representing a material as a numeric vector, and many investigations use the Euclidean distance between these vectors to judge similarity. This thesis investigates applications of non-Euclidean metrics, in particular optimal transport measures such as the Earth Mover's Distance (EMD), to quantify the similarity between two materials for use in computational workflows, with a focus on solid state electrolytes (SSEs). Chapter 1 introduces the field of lithium-conducting SSEs for use in batteries, as well as an introduction to some of the machine learning concepts for those without exposure to the field. The EMD is a function that returns the minimal quantity of work required to transform one distribution into another, and a tutorial on how to compute the EMD using the simplest known technique is provided, given its relevance to later chapters. In chapter 2 the discussion of the EMD is continued, and we introduce the workflow that has been developed for quantifying the chemical similarity of materials with the Element Mover's Distance (ElMD). Given the effect that minor dopants can have on physical properties, it is imperative that we use techniques that capture nuanced differences in stoichiometry between materials. The relationships between the binary compounds of the ICSD are shown to be well captured by this metric. Larger-scale maps of materials space are generated and used to explore some of the known SSE chemistries. At the beginning of the PhD, there were no substantial datasets of lithium SSEs available; as such, chapter 3 outlines the lengthy process of gathering this data. This resulted in the Liverpool ionics dataset, containing 820 entries, with 403 unique compositions having conductivities measured at room temperature. The performance of leading composition-based property prediction models against this dataset is rigorously assessed. The resulting classification model gives a strong enough improvement over human guesswork that it may be used for screening in future studies. At present, materials datasets are disparate and scattered. Using the ElMD in chapter 4, we investigate how different metric indexing methods may be used to partition gathered datasets of compositions. This enables very fast nearest-neighbour queries, allowing the automated retrieval of similar compounds across millions of records in milliseconds. Chapter 5 introduces the technique Percifter for characterizing crystal structures, based on the principles of persistent homology (PH). This increasingly popular technique is used in materials science to describe the topology of a crystal. Percifter seeks to improve the stability of these representations for different choices of unit cells. These similarities may be observed directly or compared through the EMD.
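    A minimal, hedged sketch of the Earth Mover's Distance idea that the thesis builds on: treat each composition as a weighted distribution of elements along a one-dimensional element scale and compute the optimal-transport cost between the two distributions. The element-scale positions below are invented for illustration; the actual ElMD uses a chemically meaningful ordering (such as a modified Pettifor scale) and its own implementation.

```python
# Hedged sketch of an EMD between two compositions, in the spirit of the
# Element Mover's Distance (ElMD).  The 1D "element scale" positions here
# are made up for illustration; the real ElMD uses a chemically meaningful
# element ordering and its own optimal-transport solver.
from scipy.stats import wasserstein_distance

# Hypothetical positions of a few elements on a 1D similarity scale.
scale = {"Li": 1.0, "P": 55.0, "S": 94.0, "Cl": 98.0}

def composition_emd(comp_a, comp_b):
    """EMD between two compositions given as {element: stoichiometric amount}."""
    pos_a, w_a = zip(*((scale[el], amt) for el, amt in comp_a.items()))
    pos_b, w_b = zip(*((scale[el], amt) for el, amt in comp_b.items()))
    # wasserstein_distance normalizes the weights, so raw stoichiometries are fine.
    return wasserstein_distance(pos_a, pos_b, w_a, w_b)

# Two lithium solid-state electrolyte compositions, for illustration.
li6ps5cl = {"Li": 6, "P": 1, "S": 5, "Cl": 1}   # argyrodite Li6PS5Cl
li7ps6   = {"Li": 7, "P": 1, "S": 6}            # Li7PS6
print(composition_emd(li6ps5cl, li7ps6))
```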

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Machine Learning Methods with Noisy, Incomplete or Small Datasets

    In many machine learning applications, available datasets are incomplete, noisy, or affected by artifacts. In supervised scenarios, label information may be of low quality, for example due to unbalanced training sets, noisy labels, and related problems. Moreover, in practice it is very common that the available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas for solving this challenging problem and to provide clear examples of application in real scenarios.

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of such manifolds has helped describe their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal model is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
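    A hedged sketch of what such a first-order temporal propagation could look like in practice (the paper's actual scheme may differ): fit a Gaussian mixture model to the particle positions at each snapshot, warm-starting every fit from the previous snapshot's component means.

```python
# Hedged sketch of first-order propagation of a spatial probabilistic model:
# fit a Gaussian mixture to particles at each snapshot, seeding each fit with
# the previous snapshot's means.  Illustration only, not the paper's method.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

def snapshot(t, n=2000):
    """Toy 'simulation': a 2D shell of particles that slowly expands with time."""
    theta = rng.uniform(0, 2 * np.pi, n)
    r = 1.0 + 0.2 * t + rng.normal(scale=0.05, size=n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

gmm = GaussianMixture(n_components=8, random_state=0).fit(snapshot(0))
for t in range(1, 5):
    # First-order (Markovian) step: the previous means seed the current fit.
    gmm = GaussianMixture(n_components=8, means_init=gmm.means_,
                          random_state=0).fit(snapshot(t))
    radius = np.linalg.norm(gmm.means_, axis=1).mean()
    print(f"t={t}: mean shell radius of fitted components ~ {radius:.2f}")
```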

    Q(sqrt(-3))-Integral Points on a Mordell Curve

    We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y^2 = x^3 − 4.
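    Quadratic Chabauty itself is well beyond a short snippet, but the curve is concrete enough that the ordinary (rational) integral points can be checked by brute force, which gives a feel for the problem; the search below only covers rational integers in a small box, not the ring of integers of Q(√−3) that the paper handles.

```python
# Brute-force search for *rational* integral points on y^2 = x^3 - 4.
# This only checks ordinary integers in a small range; finding all integral
# points over Q(sqrt(-3)) needs the quadratic Chabauty + sieving machinery
# described in the abstract.
from math import isqrt

points = []
for x in range(2, 1001):          # x^3 - 4 < 0 for x < 2, so start at 2
    rhs = x**3 - 4
    y = isqrt(rhs)
    if y * y == rhs:
        points.extend([(x, y), (x, -y)])

print(points)                     # expect (2, ±2) and (5, ±11) in this range
```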

    Seventh Biennial Report: June 2003 - March 2005

    No full text

    Novel Architectures and Optimization Algorithms for Training Neural Networks and Applications

    The two main areas of Deep Learning are Unsupervised and Supervised Learning. Unsupervised Learning studies a class of data processing problems in which only descriptions of objects are known, without label information. Generative Adversarial Networks (GANs) have become among the most widely used unsupervised neural net models. A GAN combines two neural nets, a generator and a discriminator, that are trained simultaneously. We introduce a new family of discriminator loss functions that adopts a weighted sum of real and fake parts, which we call adaptive weighted loss functions. Using gradient information, we can adaptively choose weights to train the discriminator in a direction that benefits the GAN's stability. We also propose several improvements to GAN training schemes. One is a self-correcting optimization for training a GAN discriminator on speech enhancement tasks, which helps avoid "harmful" training directions for parts of the discriminator loss. The other improvement is a consistency loss, which targets the inconsistency between the time and time-frequency domains caused by Fourier transforms. In contrast to Unsupervised Learning, Supervised Learning uses labels for each object and seeks to find the relationship between objects and labels. Building computational methods to interpret and represent human language automatically is known as Natural Language Processing, which includes tasks such as word prediction and machine translation. In this area, we propose a novel Neumann-Cayley Gated Recurrent Unit (NC-GRU) architecture based on a Neumann-series-based scaled Cayley transformation. The NC-GRU uses orthogonal matrices to prevent exploding-gradient problems and enhance long-term memory on various prediction tasks. In addition, we propose using the newly introduced NC-GRU unit inside neural network models to create neural molecular fingerprints. Integrating the novel NC-GRU fingerprints with Multi-Task Deep Neural Network schematics helps improve the performance of several molecular-related tasks. We also introduce a new normalization method, Assorted-Time Normalization, which preserves information from multiple consecutive time steps and normalizes over them in recurrent architectures. Finally, we propose a Symmetry Structured Convolutional Neural Network (SCNN), an architecture with 2D structured symmetric features over spatial dimensions that generates and preserves the symmetry structure in the network's convolutional layers.
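    A hedged, PyTorch-style sketch of the "weighted sum of real and fake parts" idea for the discriminator loss; the gradient-norm weighting rule shown here is purely illustrative and not the dissertation's exact adaptive scheme.

```python
# Hedged sketch of a discriminator loss formed as a weighted sum of its real
# and fake parts.  The gradient-norm weighting rule is illustrative only,
# not the dissertation's exact adaptive scheme.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Linear(16, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
real, fake = torch.randn(32, 16), torch.randn(32, 16)      # toy batches
d_real, d_fake = disc(real), disc(fake)

# Separate real/fake parts of the usual discriminator loss.
loss_real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

# Illustrative adaptive rule: weight each part by the other part's gradient
# norm, damping whichever part currently dominates the updates.
params = list(disc.parameters())
g_real = torch.autograd.grad(loss_real, params, retain_graph=True)
g_fake = torch.autograd.grad(loss_fake, params, retain_graph=True)
n_real = sum(g.norm() for g in g_real)
n_fake = sum(g.norm() for g in g_fake)
w_real, w_fake = n_fake / (n_real + n_fake), n_real / (n_real + n_fake)

loss_D = w_real * loss_real + w_fake * loss_fake            # weighted-sum loss
loss_D.backward()
print(f"w_real={w_real.item():.2f}  w_fake={w_fake.item():.2f}  "
      f"loss_D={loss_D.item():.3f}")
```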