Energy performance forecasting of residential buildings using fuzzy approaches
Domestic energy consumption in Europe is, to a considerable extent, due to heating and cooling. This energy is produced mostly by burning fossil fuels, which has a high negative environmental impact. The characteristics of a building are an important factor in determining its heating and cooling loads. Therefore, studying the building characteristics that are relevant to the heating and cooling needed to maintain comfortable indoor air conditions could be very useful for designing and constructing energy-efficient buildings. In previous studies, different machine-learning approaches have been used to predict heating and cooling loads from the following set of variables: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution. However, none of these methods is based on fuzzy logic. In this research, we study two fuzzy logic approaches, i.e., fuzzy inductive reasoning (FIR) and the adaptive neuro-fuzzy inference system (ANFIS), to deal with the same problem. The fuzzy approaches obtain very good results, outperforming all the methods described in previous studies except one. In this work, we also study the feature selection process of the FIR methodology as a pre-processing tool to select the most relevant variables before the use of any predictive modelling methodology. It is shown that FIR feature selection provides interesting insights into the main building variables causally related to heating and cooling loads. This allows better decision making and design strategies, since accurate cooling and heating load estimations and correct identification of the parameters that affect building energy demands are of high importance for optimizing building designs and equipment specifications.
Peer Reviewed. Postprint (published version)
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting of the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
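The three problems named in the text categorization abstract above (document representation, classifier construction, and classifier evaluation) can be illustrated with a minimal sketch. It is not taken from the survey: the bag-of-words representation, the multinomial naive Bayes learner with add-one smoothing, the toy documents, and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Document representation: a lowercase bag of words (illustrative choice).
    return text.lower().split()


def train(labeled_docs):
    # Classifier construction: learn per-category word counts
    # from a set of preclassified documents.
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    # Multinomial naive Bayes with add-one smoothing (an assumed learner).
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_docs):
    # Classifier evaluation: fraction of held-out documents labeled correctly.
    correct = sum(classify(model, text) == label for text, label in labeled_docs)
    return correct / len(labeled_docs)


# Hypothetical toy data, invented for this sketch.
train_docs = [
    ("stocks fall as markets tumble", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the championship game", "sports"),
    ("striker scores in final match", "sports"),
]
test_docs = [
    ("markets react to interest rates", "finance"),
    ("team scores in the final game", "sports"),
]

model = train(train_docs)
test_accuracy = accuracy(model, test_docs)
```

A real evaluation would use a much larger held-out set and measures such as precision and recall rather than accuracy on two documents.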
On Cognitive Preferences and the Plausibility of Rule-based Models
It is conventional wisdom in machine learning and data mining that logical
models such as rule sets are more interpretable than other models, and that
among such rule-based models, simpler models are more interpretable than more
complex ones. In this position paper, we question this latter assumption by
focusing on one particular aspect of interpretability, namely the plausibility
of models. Roughly speaking, we equate the plausibility of a model with the
likelihood that a user accepts it as an explanation for a prediction. In
particular, we argue that, all other things being equal, longer explanations
may be more convincing than shorter ones, and that the predominant bias for
shorter models, which is typically necessary for learning powerful
discriminative models, may not be suitable when it comes to user acceptance of
the learned models. To that end, we first recapitulate evidence for and against
this postulate, and then report the results of an evaluation in a
crowd-sourcing study based on about 3,000 judgments. The results do not reveal
a strong preference for simple rules, whereas we can observe a weak preference
for longer rules in some domains. We then relate these results to well-known
cognitive biases such as the conjunction fallacy, the representativeness heuristic,
or the recognition heuristic, and investigate their relation to rule length and
plausibility.
Comment: V4: Another rewrite of section on interpretability to clarify focus
on plausibility and relation to interpretability, comprehensibility, and
justifiability
Highly Relevant Routing Recommendation Systems for Handling Few Data Using MDL Principle and Embedded Relevance Boosting Factors
A route recommendation system can provide better recommendations if it also
takes collected user reviews into account, e.g. places that generally get
positive reviews may be preferred. However, many existing sentiment
classification algorithms struggle with small data items
such as short written reviews. In this paper we propose a model for a strongly
relevant route recommendation system that is based on an MDL-based (Minimum
Description Length) sentiment classification and show that such a system is
capable of handling small data items (short user reviews). Another highlight of
the model is the inclusion of a set of boosting factors in the relevance
calculation to improve the relevance in any recommendation system that
implements the model.
Comment: ACM SIGIR 2018 Workshop on Learning from Limited or Noisy Data for
Information Retrieval (LND4IR'18), July 12, 2018, Ann Arbor, Michigan, USA, 8
pages, 9 figures
Variable selection for BART: An application to gene regulation
We consider the task of discovering gene regulatory networks, which are
defined as sets of genes and the corresponding transcription factors which
regulate their expression levels. This can be viewed as a variable selection
problem, potentially with high dimensionality. Variable selection is especially
challenging in high-dimensional settings, where it is difficult to detect
subtle individual effects and interactions between predictors. Bayesian
Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-298] provides a
novel nonparametric alternative to parametric regression approaches, such as
the lasso or stepwise regression, especially when the number of relevant
predictors is sparse relative to the total number of available predictors and
the fundamental relationships are nonlinear. We develop a principled
permutation-based inferential approach for determining when the effect of a
selected predictor is likely to be real. Going further, we adapt the BART
procedure to incorporate informed prior information about variable importance.
We present simulations demonstrating that our method compares favorably to
existing parametric and nonparametric procedures in a variety of data settings.
To demonstrate the potential of our approach in a biological context, we apply
it to the task of inferring the gene regulatory network in yeast (Saccharomyces
cerevisiae). We find that our BART-based procedure is best able to recover the
subset of covariates with the largest signal compared to other variable
selection methods. The methods developed in this work are readily available in
the R package bartMachine.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS755 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
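The idea of a permutation-based check on whether a selected predictor's effect is likely to be real, mentioned in the BART abstract above, can be sketched as follows. This is not the authors' procedure and does not use BART or bartMachine: absolute Pearson correlation is swapped in as a stand-in importance score, the null distribution is built by shuffling the response (an assumed design), and the data and function names are hypothetical.

```python
import random


def abs_corr(xs, ys):
    # Stand-in importance score: absolute Pearson correlation.
    # (An assumption; a BART-based method would use a model-derived score.)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    # Shuffle the response to break any real link with the predictor,
    # then count how often the shuffled score reaches the observed one.
    rng = random.Random(seed)
    observed = abs_corr(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(xs, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: one predictor drives the response, one is pure noise.
data_rng = random.Random(1)
x_signal = [float(i) for i in range(40)]
x_noise = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
y = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in x_signal]

p_signal = permutation_p_value(x_signal, y)
p_noise = permutation_p_value(x_noise, y)
```

A small `p_signal` indicates that the observed score is rarely matched once the link between predictor and response is broken. With many predictors, the threshold would also need to account for multiple comparisons.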