GraphLab: A New Framework for Parallel Machine Learning
Designing and implementing efficient, provably correct parallel machine
learning (ML) algorithms is challenging. Existing high-level parallel
abstractions like MapReduce are insufficiently expressive while low-level tools
like MPI and Pthreads leave ML experts repeatedly solving the same design
challenges. By targeting common patterns in ML, we developed GraphLab, which
improves upon abstractions like MapReduce by compactly expressing asynchronous
iterative algorithms with sparse computational dependencies while ensuring data
consistency and achieving a high degree of parallel performance. We demonstrate
the expressiveness of the GraphLab framework by designing and implementing
parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and
Compressed Sensing. We show that using GraphLab we can achieve excellent
parallel performance on large-scale real-world problems.
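The asynchronous, sparsely dependent update pattern GraphLab targets can be sketched in plain Python with a tolerance-driven scheduler, shown here on a toy PageRank computation. The graph, damping factor, and queue-based scheduler are illustrative assumptions of ours, not GraphLab's actual API:

```python
from collections import deque

# Toy 3-vertex graph as in-neighbour lists (edges: 1->0, 2->0, 0->1, 0->2, 1->2).
in_nbrs = {0: [1, 2], 1: [0], 2: [0, 1]}
out_deg = {0: 2, 1: 2, 2: 1}
rank = {v: 1.0 / 3 for v in in_nbrs}

DAMPING, TOL = 0.85, 1e-9
queue = deque(in_nbrs)          # scheduler: vertices with pending updates
queued = set(queue)

while queue:
    v = queue.popleft()
    queued.discard(v)
    new = (1 - DAMPING) / 3 + DAMPING * sum(rank[u] / out_deg[u] for u in in_nbrs[v])
    changed = abs(new - rank[v]) > TOL
    rank[v] = new
    if changed:
        # Only vertices that depend on v are rescheduled: computation is
        # asynchronous and driven by the sparse dependency structure.
        for w in (x for x in in_nbrs if v in in_nbrs[x]):
            if w not in queued:
                queue.append(w)
                queued.add(w)
```

The key contrast with MapReduce is that no global synchronous barrier exists: each vertex is recomputed only when one of its in-neighbours actually changes.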
Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems
In this paper, we propose a new framework for mitigating biases in machine
learning systems. The problem of the existing mitigation approaches is that
they are model-oriented in the sense that they focus on tuning the training
algorithms to produce fair results, while overlooking the fact that the
training data can itself be the main reason for biased outcomes. Technically
speaking, two essential limitations can be found in such model-based
approaches: 1) the mitigation cannot be achieved without degrading the accuracy
of the machine learning models, and 2) when the training data are heavily
biased, training time increases substantially as the algorithm searches for
learning parameters that still produce fair results. To address these
shortcomings, we propose in this work a new framework that can largely mitigate
the biases and discriminations in machine learning systems while at the same
time enhancing the prediction accuracy of these systems. The proposed framework
is based on conditional Generative Adversarial Networks (cGANs), which are used
to generate new synthetic fair data with selective properties from the original
data. We also propose a framework for analyzing data biases, which is important
for understanding the amount and type of data that need to be synthetically
sampled and labeled for each population group. Experimental results show that
the proposed solution can efficiently mitigate different types of biases, while
at the same time enhancing the prediction accuracy of the underlying machine
learning model.
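The bias-analysis step, deciding how much synthetic data each population group needs, can be illustrated with per-group positive-label rates. The equalization formula below (synthesize positives until every group matches the best-represented group's rate) is our own illustration, not the paper's framework:

```python
import math
from collections import Counter

# Toy labelled records as (group, label) pairs; data are illustrative only.
data = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
        ("B", 1), ("B", 0), ("B", 0), ("B", 0), ("B", 0)]

pos = Counter(g for g, y in data if y == 1)    # positives per group
tot = Counter(g for g, y in data)              # totals per group
rates = {g: pos[g] / tot[g] for g in tot}      # positive rate per group

# Number of synthetic positives k per group so that (pos+k)/(tot+k) equals
# the highest group rate: k = (target*tot - pos) / (1 - target).
target = max(rates.values())
need = {g: math.ceil((target * tot[g] - pos[g]) / (1 - target)) if target < 1 else 0
        for g in tot}
```

These counts tell a conditional generator how many fair samples to produce and label for each under-represented group before retraining the downstream model.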
PennyLane: Automatic differentiation of hybrid quantum-classical computations
PennyLane is a Python 3 software framework for optimization and machine
learning of quantum and hybrid quantum-classical computations. The library
provides a unified architecture for near-term quantum computing devices,
supporting both qubit and continuous-variable paradigms. PennyLane's core
feature is the ability to compute gradients of variational quantum circuits in
a way that is compatible with classical techniques such as backpropagation.
PennyLane thus extends the automatic differentiation algorithms common in
optimization and machine learning to include quantum and hybrid computations. A
plugin system makes the framework compatible with any gate-based quantum
simulator or hardware. We provide plugins for Strawberry Fields, Rigetti
Forest, Qiskit, Cirq, and ProjectQ, allowing PennyLane optimizations to be run
on publicly accessible quantum devices provided by Rigetti and IBM Q. On the
classical front, PennyLane interfaces with accelerated machine learning
libraries such as TensorFlow, PyTorch, and autograd. PennyLane can be used for
the optimization of variational quantum eigensolvers, quantum approximate
optimization, quantum machine learning models, and many other applications.
Comment: Code available at https://github.com/XanaduAI/pennylane/ .
Significant contributions to the code (new features, new plugins, etc.) will
be recognized by the opportunity to be a co-author on this paper.
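The gradient recipe behind PennyLane's core feature, the parameter-shift rule, can be sketched without a quantum simulator: for a single-qubit RX(θ) rotation measured in Pauli-Z, the expectation is ⟨Z⟩ = cos(θ), and the exact gradient comes from just two shifted circuit evaluations. The function names here are ours, not PennyLane's API:

```python
import math

def expval(theta):
    # Analytic stand-in for a circuit evaluation: <Z> after RX(theta) on |0>.
    return math.cos(theta)

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    # Exact gradient from two evaluations of the same "circuit" at shifted
    # parameters -- not a finite-difference approximation.
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.3
grad = parameter_shift_grad(expval, theta)   # equals -sin(theta) exactly
```

Because the rule only requires evaluating the circuit at shifted parameters, it works on real hardware, which is what lets PennyLane plug quantum nodes into classical backpropagation in TensorFlow, PyTorch, or autograd.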
An automated ETL for online datasets
While using online datasets for machine learning is commonplace today, the quality of these datasets impacts the performance
of prediction algorithms. One method for improving the semantics of new data sources is to map these sources to a common
data model or ontology. While semantic and structural heterogeneities must still be resolved, this provides a well-established
approach to producing clean datasets, suitable for machine learning and analysis. However, when there is a requirement for
close to real-time usage of online data, a method for dynamic Extract-Transform-Load (ETL) of data from new sources must be developed.
In this work, we present a framework for integrating online and enterprise data sources, in close to real time, to provide
datasets for machine learning and predictive algorithms. An exhaustive evaluation compares a human-built data transformation
process with our system's machine-generated ETL process, with very favourable results, illustrating the value and impact of
an automated approach.
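The mapping of heterogeneous source fields onto a common data model can be sketched as a table-driven transform. The source names, field names, and unit conversion below are hypothetical illustrations of ours, not the paper's generated mappings:

```python
# Field mappings from two online sources to one common data model.
MAPPINGS = {
    "source_a": {"temp_f": "temperature_c", "ts": "timestamp"},
    "source_b": {"temperature": "temperature_c", "time": "timestamp"},
}

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

# Per-field transforms resolving semantic heterogeneity (here, a unit change).
TRANSFORMS = {("source_a", "temp_f"): fahrenheit_to_celsius}

def etl(source, record):
    """Extract a raw record, transform names/units, load into the common model."""
    out = {}
    for field, value in record.items():
        target = MAPPINGS[source].get(field)
        if target is None:
            continue                       # drop fields outside the common model
        fn = TRANSFORMS.get((source, field))
        out[target] = fn(value) if fn else value
    return out

row = etl("source_a", {"temp_f": 212.0, "ts": "2021-01-01T00:00:00Z"})
```

Automating the construction of the mapping and transform tables for each newly discovered source is what makes close-to-real-time ingestion feasible.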