30,677 research outputs found
Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS
Many distributed machine learning frameworks have recently been built to
speed up the large-scale data learning process. However, most distributed
machine learning in these frameworks still relies on offline algorithms,
which cannot cope with data streams. In fact, large-scale data are
mostly generated by non-stationary data streams whose patterns evolve
over time. To address this problem, we propose a novel Evolving Large-scale
Data Stream Analytics framework based on a Scalable Parsimonious Network based
on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving
algorithm is distributed over the worker nodes in the cloud to learn
large-scale data streams. The Scalable PANFIS framework incorporates an active
learning (AL) strategy and two model fusion methods. The AL strategy accelerates the
distributed learning process to generate an initial evolving large-scale data
stream model (initial model), whereas the two model fusion methods aggregate an
initial model to generate the final model. The final model represents the
update of current large-scale data knowledge which can be used to infer future
data. The framework is evaluated in extensive experiments that measure the
accuracy and running time of four combinations of Scalable PANFIS and other
Spark-based built-in algorithms. The results indicate that Scalable PANFIS with
AL trains almost twice as fast as Scalable
PANFIS without AL. The results also show that the rule merging and voting
mechanisms yield broadly similar accuracy across the Scalable PANFIS variants,
and that both are generally more accurate than the Spark-based algorithms. In terms of running
time, the Scalable PANFIS training time outperforms all Spark-based algorithms
when classifying numerous benchmark datasets.
Comment: 20 pages, 5 figures
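The abstract above names a voting mechanism as one of the two model fusion methods but does not specify it. As a rough illustration only, the sketch below fuses class predictions from several worker models by plain majority vote. The function name `majority_vote`, the list-of-lists input layout, and the tie-breaking rule are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-worker class predictions by simple majority vote.

    `predictions` holds one list per worker model (hypothetical layout);
    every inner list must contain one predicted label per sample.
    Ties go to the label that appears first in worker order.
    """
    if not predictions:
        raise ValueError("at least one worker's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all workers must predict the same number of samples")

    fused = []
    # zip(*predictions) groups the votes cast for each sample across workers.
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        fused.append(label)
    return fused


if __name__ == "__main__":
    worker_outputs = [
        [0, 1, 1],  # worker 1
        [0, 0, 1],  # worker 2
        [1, 1, 1],  # worker 3
    ]
    print(majority_vote(worker_outputs))
```

A weighted vote, for example one that weights each worker by its validation accuracy, would be a natural variation; the abstract does not say which form the authors use.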
LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models
We develop a new Low-level, First-order Probabilistic Programming Language
(LF-PPL) suited for models containing a mix of continuous, discrete, and/or
piecewise-continuous variables. The key success of this language and its
compilation scheme is its ability to automatically identify the parameters
with respect to which the density function is discontinuous, while further providing
runtime checks for boundary crossings. This enables the introduction of new
inference engines that are able to exploit gradient information, while
remaining efficient for models which are not everywhere differentiable. We
demonstrate this ability by incorporating a discontinuous Hamiltonian Monte
Carlo (DHMC) inference engine that is able to deliver automated and efficient
inference for non-differentiable models. Our system is backed up by a
mathematical formalism that ensures that any model expressed in this language
has a density with measure zero discontinuities to maintain the validity of the
inference engine.
Comment: Published in the proceedings of the 22nd International Conference on
Artificial Intelligence and Statistics (AISTATS)
Coarse-to-Fine Lifted MAP Inference in Computer Vision
There is a vast body of theoretical research on lifted inference in
probabilistic graphical models (PGMs). However, few demonstrations exist where
lifting is applied in conjunction with top-of-the-line applied algorithms. We
pursue the applicability of lifted inference for computer vision (CV), with the
insight that a globally optimal (MAP) labeling will likely have the same label
for two symmetric pixels. The success of our approach lies in efficiently
handling a distinct unary potential on every node (pixel), typical of CV
applications. This allows us to lift the large class of algorithms that model a
CV problem via PGM inference. We propose a generic template for coarse-to-fine
(C2F) inference in CV, which progressively refines an initial coarsely lifted
PGM for varying quality-time trade-offs. We demonstrate the performance of C2F
inference by developing lifted versions of two near state-of-the-art CV
algorithms for stereo vision and interactive image segmentation. We find that,
against flat algorithms, the lifted versions have a much superior anytime
performance, without any loss in final solution quality.
Comment: Published in IJCAI 201
A network inference method for large-scale unsupervised identification of novel drug-drug interactions
Characterizing interactions between drugs is important to avoid potentially
harmful combinations, to reduce off-target effects of treatments and to fight
antibiotic resistant pathogens, among others. Here we present a network
inference algorithm to predict uncharacterized drug-drug interactions. Our
algorithm takes, as its only input, sets of previously reported interactions,
and does not require any pharmacological or biochemical information about the
drugs, their targets or their mechanisms of action. Because the models we use
are abstract, our approach can deal with adverse interactions,
synergistic/antagonistic/suppressing interactions, or any other type of drug
interaction. We show that our method is able to accurately predict
interactions, both in exhaustive pairwise interaction data between small sets
of drugs, and in large-scale databases. We also demonstrate that our algorithm
can be used efficiently to discover interactions of new drugs as part of the
drug discovery process.
Energy performance forecasting of residential buildings using fuzzy approaches
Energy consumption for domestic purposes in Europe is, to a considerable extent, due to heating and cooling. This energy is produced mostly by burning fossil fuels, which has a high negative environmental impact. The characteristics of a building are an important factor in determining its heating and cooling loads. Therefore, studying the characteristics of buildings that are relevant to the heating and cooling needed to maintain comfortable indoor air conditions could be very useful for designing and constructing energy-efficient buildings. In previous studies, different machine-learning approaches have been used to predict heating and cooling loads from the following set of variables: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution. However, none of these methods are based on fuzzy logic. In this research, we study two fuzzy logic approaches, i.e., fuzzy inductive reasoning (FIR) and the adaptive neuro-fuzzy inference system (ANFIS), to deal with the same problem. The fuzzy approaches obtain very good results, outperforming all but one of the methods described in previous studies. In this work, we also study the feature selection process of the FIR methodology as a pre-processing tool to select the most relevant variables before the use of any predictive modelling methodology. It is proven that FIR feature selection provides interesting insights into the main building variables causally related to heating and cooling loads. This allows better decision making and design strategies, since accurate cooling and heating load estimations and correct identification of the parameters that affect building energy demands are of high importance for optimizing building designs and equipment specifications.
Peer Reviewed. Postprint (published version)