302,338 research outputs found
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation
Detecting early signs of failures (anomalies) in complex systems is one of
the main goal of preventive maintenance. It allows in particular to avoid
actual failures by (re)scheduling maintenance operations in a way that
optimizes maintenance costs. Aircraft engine health monitoring is one
representative example of a field in which anomaly detection is crucial.
Manufacturers collect large amount of engine related data during flights which
are used, among other applications, to detect anomalies. This article
introduces and studies a generic methodology that allows one to build automatic
early signs of anomaly detection in a way that builds upon human expertise and
that remains understandable by human operators who make the final maintenance
decision. The main idea of the method is to generate a very large number of
binary indicators based on parametric anomaly scores designed by experts,
complemented by simple aggregations of those scores. A feature selection method
is used to keep only the most discriminant indicators which are used as inputs
of a Naive Bayes classifier. This give an interpretable classifier based on
interpretable anomaly detectors whose parameters have been optimized indirectly
by the selection process. The proposed methodology is evaluated on simulated
data designed to reproduce some of the anomaly types observed in real world
engines.Comment: arXiv admin note: substantial text overlap with arXiv:1408.6214,
arXiv:1409.4747, arXiv:1407.088
- …