
    A Study on the Acceleration of Arrival Curve Construction and Regular Specification Mining using GPUs

    Data analytics is the process of examining datasets using analytical and statistical techniques. Several tools have been proposed in the literature to extract hidden patterns, gather insights, and build mathematical models from large datasets. However, these tools become computationally demanding as datasets grow. Two such recently proposed tools are the construction of arrival curves from execution traces and the mining of specifications, in the form of regular expressions, from execution traces. Although CPU architectures have improved considerably over the years to handle such computationally intensive tasks, further gains have been limited by heat dissipation, making parallel computing on GPUs a highly attractive alternative. In this thesis, we present exploratory work on applying GPU computing to two case studies: the construction of arrival curves and the mining of specifications in the form of regular expressions. For each case study, we first present the novel approach taken and then break down the algorithm to expose the parallelism involved. Finally, experiments on commodity GPUs demonstrate significant speedups over equivalent non-parallel implementations.
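    As a rough illustration of the first case study: the empirical maximum arrival curve of a trace bounds, for each window length, how many events can occur in any window of that length. The sketch below is only a sequential baseline under assumed details (the thesis's GPU formulation is not reproduced here); the function name, the NumPy-based implementation, and the sample trace are illustrative assumptions. Because each window anchor is processed independently, the inner computation is a natural candidate for per-thread GPU parallelism.

        # Minimal CPU sketch of empirical maximum arrival-curve construction
        # from a trace of event timestamps. Illustrative baseline only; not the
        # thesis's GPU implementation.
        import numpy as np

        def max_arrival_curve(timestamps, deltas):
            """For each window length delta, return the maximum number of events
            observed in any half-open window [t, t + delta) anchored at an event."""
            ts = np.sort(np.asarray(timestamps, dtype=float))
            curve = np.empty(len(deltas), dtype=int)
            for k, delta in enumerate(deltas):
                # searchsorted finds, for every anchor event ts[i], how many events
                # occur before ts[i] + delta; subtracting i gives the window count.
                upper = np.searchsorted(ts, ts + delta, side="left")
                curve[k] = int((upper - np.arange(len(ts))).max())
            return curve

        if __name__ == "__main__":
            trace = [0.0, 0.4, 1.1, 1.5, 2.0, 5.2, 5.3, 5.35]
            print(max_arrival_curve(trace, deltas=[0.5, 1.0, 2.0]))  # -> [3 3 4]

    Each window anchor (and each window length) can be handled independently, which is the kind of data parallelism a GPU kernel can exploit directly.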

    Learning From Almost No Data

    The tremendous recent growth in the fields of artificial intelligence and machine learning has largely been tied to the availability of big data and massive amounts of compute. The increasingly popular approach of training large neural networks on large datasets has provided great returns, but it leaves behind the multitude of researchers, companies, and practitioners who do not have access to sufficient funding, compute power, or volume of data. This thesis aims to rectify this growing imbalance by probing the limits of what machine learning and deep learning methods can achieve with small data. What knowledge does a dataset contain? At the highest level, a dataset is just a collection of samples: images, text, etc. Yet somehow, when we train models on these datasets, they are able to find patterns, make inferences, detect similarities, and otherwise generalize to samples that they have previously never seen. This suggests that datasets may contain some kind of intrinsic knowledge about the systems or distributions from which they are sampled. Moreover, it appears that this knowledge is somehow distributed and duplicated across the samples; we intuitively expect that removing an image from a large training set will have virtually no impact on the final model performance. We develop a framework to explain efficient generalization around three principles: information sharing, information repackaging, and information injection. We use this framework to propose 'less than one'-shot learning, an extreme form of few-shot learning where a learner must recognize N classes from M < N training examples. To achieve this extreme level of efficiency, we develop new framework-consistent methods and theory for lost data restoration, for dataset size reduction, and for few-shot learning with deep neural networks and other popular machine learning models.
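    To make the 'less than one'-shot setting concrete, the following toy sketch (hypothetical, not taken from the thesis) uses a distance-weighted soft-label prototype rule, similar in spirit to soft-label prototype classifiers: M = 2 soft-labelled points induce N = 3 decision regions on a line. The prototype locations, soft labels, and function names are illustrative assumptions.

        # Hypothetical toy illustration of the 'less than one'-shot idea:
        # two soft-labelled prototypes inducing three decision regions under a
        # distance-weighted soft-label rule. Numbers are illustrative only.
        import numpy as np

        prototypes = np.array([[0.0], [1.0]])        # M = 2 training points
        soft_labels = np.array([[0.6, 0.0, 0.4],     # each row sums to 1 over N = 3 classes
                                [0.0, 0.6, 0.4]])

        def predict(x, eps=1e-9):
            """Classify scalar queries by summing inverse-distance-weighted soft labels."""
            x = np.atleast_2d(np.asarray(x, dtype=float)).T   # shape (n, 1)
            dist = np.abs(x - prototypes.T)                   # (n, M)
            weights = 1.0 / (dist + eps)
            scores = weights @ soft_labels                    # (n, N)
            return scores.argmax(axis=1)

        if __name__ == "__main__":
            print(predict([0.1, 0.5, 0.9]))  # -> [0 2 1]: three classes from two points

    Queries near each prototype take that prototype's dominant class, while queries in between are dominated by the shared third class, so two examples carve out three class regions.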