Relational Data Mining Through Extraction of Representative Exemplars
With the growing interest in Network Analysis, Relational Data Mining is
becoming a prominent domain of Data Mining. This paper addresses the problem
of extracting representative elements from a relational dataset. After defining
the notion of degree of representativeness, computed using the Borda
aggregation procedure, we present the extraction of exemplars, which are the
representative elements of the dataset. We use these concepts to build a
network on the dataset. We present the main properties of these notions and
propose two typical applications of our framework. The first application
consists in summarizing and structuring a set of binary images and the second in
mining co-authorship relations in a research team.
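As a rough illustration of how a Borda-style aggregation could yield a degree of representativeness, the sketch below lets every element rank all the others from nearest to farthest and sums the resulting Borda points. The function name, the dissimilarity-matrix input, and the scoring convention (nearest neighbour gets n - 2 points, farthest gets 0) are assumptions made for this example; the abstract does not specify the paper's exact definition.

```python
def borda_representativeness(dissim):
    """Sum of Borda points each element receives from all other elements.

    dissim is a square list of lists of pairwise dissimilarities
    (hypothetical input format, assumed for this sketch).
    """
    n = len(dissim)
    scores = [0] * n
    for voter in range(n):
        # Each voter ranks every other element from nearest to farthest;
        # ties are broken by index so the result is deterministic.
        others = sorted(
            (j for j in range(n) if j != voter),
            key=lambda j: (dissim[voter][j], j),
        )
        for rank, j in enumerate(others):
            # Nearest element gets n - 2 points, farthest gets 0.
            scores[j] += (n - 2) - rank
    return scores


# Four points on a line; dissimilarity is the absolute difference.
points = [0, 1, 3, 10]
dissim = [[abs(a - b) for b in points] for a in points]
print(borda_representativeness(dissim))  # [3, 5, 4, 0]
```

In this toy case the element at position 1 collects the most points, so it would be the natural exemplar candidate, while the isolated element at position 10 collects none.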
Chapter A Framework for Learning System for Complex Industrial Processes
Due to intense price-based global competition, rising operating costs, rapidly changing economic conditions and stringent environmental regulations, modern process and energy industries are confronting unprecedented challenges to maintain profitability. Therefore, improving product quality and process efficiency while reducing production cost and plant downtime is a matter of utmost importance. These objectives partly conflict with one another, and to satisfy them, optimal operation and control of the plant components are essential. Use of optimization not only improves the control and monitoring of assets, but also offers better coordination among different assets. Thus, it can lead to extensive savings in energy and resource consumption, and consequently reduce operational costs, by offering better control, diagnostics and decision support. This is one of the main driving forces behind developing new methods, tools and frameworks. In this chapter, a generic learning system architecture is presented that can be retrofitted to existing automation platforms of different industrial plants. The architecture offers flexibility and modularity, so that relevant functionalities can be selected for a specific plant on an as-needed basis. Various functionalities such as soft sensors, output prediction, model adaptation, control optimization, anomaly detection, diagnostics and decision support are discussed in detail.
Contributions to time series data mining towards the detection of outliers/anomalies
148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often contain meaningful and useful information, so in recent years there has been great interest in extracting it. The research area focused on this task is called time series data mining. The research community in this area has worked on tasks such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behavior of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant, since they may degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to the field of time series data mining, specifically on outlier or anomaly detection. These contributions fall into two parts. On the one hand, the thesis presents contributions to outlier or anomaly detection in time series: it offers a review of the techniques in the literature and presents a new technique, based on self-supervised learning, for anomaly detection in univariate time series aimed at detecting water leaks.
On the other hand, the thesis also introduces contributions related to handling time series with missing values and demonstrates their applicability in the field of anomaly detection.
Active Collaborative Ensemble Tracking
A discriminative ensemble tracker employs multiple classifiers, each of which
casts a vote on all of the obtained samples. The votes are then aggregated in
an attempt to localize the target object. Such a method relies on the collective
competence and the diversity of the ensemble to approach the target/non-target
classification task from different views. However, by updating all of the
ensemble using a shared set of samples and their final labels, such diversity
is lost or reduced to the diversity provided by the underlying features or
internal classifiers' dynamics. Additionally, the classifiers do not exchange
information with each other while striving to serve the collective goal, i.e.,
better classification. In this study, we propose an active collaborative
information exchange scheme for ensemble tracking. This not only orchestrates
the different classifiers towards a common goal but also provides an intelligent
update mechanism to keep the diversity of classifiers and to mitigate the
shortcomings of one with the others. The data exchange is optimized with regard
to an ensemble uncertainty utility function, and the ensemble is updated via
co-training. The evaluations demonstrate promising results realized by the
proposed algorithm for real-world online tracking.
Comment: AVSS 2017 Submission
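The vote-aggregation step described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: votes are taken to be per-classifier scores in [0, 1], aggregation is a plain mean, and the variance of scores across classifiers is used as a simple stand-in for disagreement. The paper's actual ensemble uncertainty utility function and its co-training update are not given in the abstract and are not reproduced here; both function names are hypothetical.

```python
def aggregate_votes(votes):
    """Average the classifiers' scores and pick the best candidate sample.

    votes[c][s] is classifier c's score for candidate sample s
    (assumed layout for this sketch).
    """
    n_classifiers = len(votes)
    n_samples = len(votes[0])
    mean_scores = [
        sum(row[s] for row in votes) / n_classifiers for s in range(n_samples)
    ]
    # The sample with the highest mean score is taken as the target location.
    best = max(range(n_samples), key=lambda s: mean_scores[s])
    return best, mean_scores


def most_disputed_sample(votes):
    """Index of the sample whose scores vary most across classifiers."""
    n_classifiers = len(votes)
    n_samples = len(votes[0])

    def variance(s):
        column = [row[s] for row in votes]
        mu = sum(column) / n_classifiers
        # Population variance of the scores given to sample s.
        return sum((v - mu) ** 2 for v in column) / n_classifiers

    return max(range(n_samples), key=variance)


votes = [
    [0.9, 0.2, 0.5],  # classifier 0
    [0.8, 0.1, 0.1],  # classifier 1
    [0.7, 0.3, 0.9],  # classifier 2
]
best, mean_scores = aggregate_votes(votes)
print(best, most_disputed_sample(votes))  # 0 2
```

Samples on which the classifiers disagree most are natural candidates for information exchange between ensemble members, which is the intuition behind selecting data by an uncertainty criterion.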
Graph Embedding with Data Uncertainty
Spectral-based subspace learning is a common data preprocessing step in many
machine learning pipelines. The main aim is to learn a meaningful low
dimensional embedding of the data. However, most subspace learning methods do
not take into consideration possible measurement inaccuracies or artifacts that
can lead to data with high uncertainty. Thus, learning directly from raw data
can be misleading and can negatively impact the accuracy. In this paper, we
propose to model artifacts in training data using probability distributions;
each data point is represented by a Gaussian distribution centered at the
original data point and having a variance modeling its uncertainty. We
reformulate the Graph Embedding framework to make it suitable for learning from
distributions and we study as special cases the Linear Discriminant Analysis
and the Marginal Fisher Analysis techniques. Furthermore, we propose two
schemes for modeling data uncertainty based on pair-wise distances in an
unsupervised and a supervised context.
Comment: 20 pages, 4 figures
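To make the idea of learning from distributions concrete, the sketch below computes the expected squared Euclidean distance between two independent Gaussian-distributed points, which equals the squared distance between the means plus the traces of both covariances. Diagonal covariances and the function name are assumptions made for this example; how the paper itself builds its graph weights and objective from the distributions is not stated in the abstract.

```python
def expected_sq_distance(mu_i, var_i, mu_j, var_j):
    """E||x - y||^2 for independent x ~ N(mu_i, diag(var_i)), y ~ N(mu_j, diag(var_j)).

    Uses the identity E||x - y||^2 = ||mu_i - mu_j||^2 + tr(S_i) + tr(S_j),
    where S_i and S_j are the (here diagonal) covariance matrices.
    """
    # Squared distance between the two means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    # Trace of each diagonal covariance is the sum of its variances.
    uncertainty_term = sum(var_i) + sum(var_j)
    return mean_term + uncertainty_term


# Two 2-D points: the means are 5 apart, so the mean term is 25.
print(expected_sq_distance([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.5]))  # 28.0
```

With zero variances the expression reduces to the ordinary squared distance between the raw data points, so higher uncertainty pushes points apart in expectation.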