216 research outputs found

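Caracterización e interpretación automática de descripciones conceptuales en dominios poco estructurados usando variables numéricas (Characterization and automatic interpretation of conceptual descriptions in ill-structured domains using numerical variables)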

Author: Gibert Karina
Vazquez Fernando
Publication date: 01/01/2002

The main objective of the research presented in this project is to establish a formal methodology for the automatic generation of conceptual descriptions of classes built in real, complex domains of a continuous nature, called ill-structured domains. Although the methodology takes the study of the multiple boxplot as its starting point, formalizing the visual interpretation procedure involves determining, for each variable, the values where changes in the distribution occur, and building the table of frequencies conditioned on the resulting intervals. This yields a fuzzy representation of the degrees of membership of the variable's values to the different classes, which provides a convenient basis for automatically characterizing and interpreting the conceptual descriptions of the classes. The methodology contributes: a system for characterizing classes from a semantic point of view, compared with other clustering methods, when applied to data from an ill-structured domain; a new approach to discretizing the space of quantitative attributes into variable-length intervals, which underpins the methodology; contributions to the validation of a classification, regarding its representation and quality, in the sense that a classification is valid if the classes obtained can be shown to be meaningful or useful; and the automatic generation of the resulting classes as the basis for the prediction and/or diagnosis process. The methodology represents a new way to extract useful, user-understandable knowledge by combining statistical tools (multiple boxplot, data analysis), artificial intelligence (machine learning, knowledge-based systems) and fuzzy logic (fuzzy models and reasoning).
As a case study, the methodology has been applied, using quantitative attributes, to a database from a wastewater treatment plant described in Chapter 4. The results obtained are promising and constitute a first step towards a formal methodology for automatically obtaining conceptual interpretations of classes, based on quantitative attributes describing the objects (days, in this case study). Finally, our work covers all phases of the KDD (Knowledge Discovery in Databases) process described by Fayyad et al., with emphasis on the automatic generation of interpretations, in our case of the classes resulting from a reference partition.
Postprint (published version)
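The boxplot-based step described in this abstract (cut points where the class distributions change, then a table of frequencies conditioned on the resulting intervals) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the cut points are the minimum and maximum of each class, as read off a multiple boxplot, and it uses half-open intervals as an arbitrary convention; the function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def boxplot_cut_points(values, labels):
    """Sorted, de-duplicated class-wise minima and maxima of one numeric variable.

    Assumption: these extremes are where the multiple boxplot shows the
    distribution changing from one set of overlapping classes to another.
    """
    by_class = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)
    points = set()
    for vs in by_class.values():
        points.update((min(vs), max(vs)))
    return sorted(points)


def conditional_frequencies(values, labels, cuts):
    """Return P(class | interval) for each interval between consecutive cut points.

    Intervals are [c0, c1), [c1, c2), ..., [c_{k-1}, c_k] (last one closed),
    an illustrative convention. Each row can be read as the fuzzy membership
    of that range of values in every class.
    """
    if len(cuts) < 2:
        raise ValueError("need at least two distinct cut points")
    n_intervals = len(cuts) - 1
    counts = [Counter() for _ in range(n_intervals)]
    for v, c in zip(values, labels):
        i = min(bisect_right(cuts, v) - 1, n_intervals - 1)
        counts[i][c] += 1
    classes = sorted(set(labels))
    table = []
    for cnt in counts:
        total = sum(cnt.values())
        table.append({c: cnt[c] / total if total else 0.0 for c in classes})
    return table


values = [1, 2, 3, 4, 3, 4, 5, 6]
labels = ["A"] * 4 + ["B"] * 4
cuts = boxplot_cut_points(values, labels)  # [1, 3, 4, 6]
table = conditional_frequencies(values, labels, cuts)
print(table)  # [{'A': 1.0, 'B': 0.0}, {'A': 0.5, 'B': 0.5}, {'A': 0.25, 'B': 0.75}]
```

Intervals are of variable length because they follow the data's class extremes rather than a fixed grid, which is the property the abstract highlights.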

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Knowledge discovery from data as a framework to decision support in medical domains

Author: Gibert Karina
Publication venue: Igitur, Utrecht Publishing & Archiving
Publication date: 01/06/2009

Directory of Open Access Journals

PubMed Central

Utrecht University Repository

Classificació automàtica amb KLASS de les dades de procés d'una EDAR (Automatic classification with KLASS of the process data of a WWTP)

Author: Flores Xavier
Gibert Karina
Rodriguez-Roda Ignasi
Publication date: 01/01/2003

In this study, automatic clustering is performed on data from a wastewater treatment plant (WWTP) to obtain specific knowledge of an ill-structured domain such as the biological wastewater treatment process. The whole process requires supervision by the process expert. The clustering is done with KLASS, which can handle numerical and symbolic variables simultaneously in the description of objects. This automatic classification yields a set of clusters that can be labelled as typical operational states. All the knowledge and information acquired will constitute the initial library of a case-based reasoning system which, together with an expert system, forms a decision support system for this WWTP.
Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Determinación de factores influyentes sobre una respuesta en un dominio poco estructurado (Determination of factors influencing a response in an ill-structured domain)

Author: Gibert Karina
Rodas Osollo Jorge Enrique
Rojo Emilio
Publication date: 01/01/2001

This report focuses on results obtained from a classification technique applied to time-series data in a medical ill-structured domain. Statistical analysis and classification of such data in ill-structured domains are often inadequate because of the intrinsic characteristics of those domains. The database analysed contains information on patients with major depressive disorder or schizophrenia; consequently, many of its variables contain measurements taken at different instants of time, forming curves. This motivates the question of how to establish a useful technique for classifying curves in a medical ill-structured domain.
Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

The role of ontologies in supporting distributed medical systems

Author: Gibert Karina
Valls Aïda
Publication venue: Igitur, Utrecht Publishing & Archiving
Publication date: 01/06/2009

Directory of Open Access Journals

PubMed Central

Utrecht University Repository

aTLP: a color-based model of uncertainty to evaluate the risk of decisions based on prototypes

Author: Conti Dante
Gibert Karina
Publication venue: IOS Press
Publication date: 01/01/2015

Clustering techniques find homogeneous and distinguishable prototypes. Careful interpretation of these prototypes is crucial to help experts better organize their know-how and genuinely improve their decision-making processes. The Traffic Lights Panel (TLP) was introduced in 2009 as a post-processing tool to support understanding of clustering prototypes. In this work, the annotated Traffic Lights Panel (aTLP) is presented as an enrichment of the TLP that manages the intrinsic uncertainty associated with the prototypes themselves. The aTLP handles uncertainty by quantifying the purity of each prototype from variation coefficients (VC) and by an associated color-based uncertainty model with two dimensions, tone and saturation, representing respectively the nominal trend and the purity of the prototype. An application to a wastewater treatment plant in Slovenia, using both a discrete and a continuous approach, suggests that aTLP is a useful and user-friendly tool that can reduce the gap between data mining and effective decision support, towards informed decisions.
Peer Reviewed. Postprint (author's final draft)
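As a rough sketch of the two color dimensions described above, the following code computes a prototype's variation coefficient (VC) for one variable and maps it to saturation, while the tone is simply passed in. The linear VC-to-saturation mapping, the `vc_max` threshold, the hue table and all function names are assumptions for illustration only; the abstract does not give the actual aTLP formulas, and the TLP rules that choose the tone are not reproduced.

```python
import colorsys
import statistics

# Hypothetical hue for each nominal-trend colour (colorsys uses hue in [0, 1)).
TONE_HUE = {"red": 0.0, "yellow": 1 / 6, "green": 1 / 3}


def variation_coefficient(values):
    """Population standard deviation divided by |mean| (undefined for zero mean)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("variation coefficient undefined for zero mean")
    return statistics.pstdev(values) / abs(mean)


def saturation_from_vc(vc, vc_max=1.0):
    """Hypothetical linear mapping: VC = 0 gives full saturation (pure prototype),
    VC >= vc_max gives zero saturation (the colour fades to white)."""
    return max(0.0, 1.0 - vc / vc_max)


def annotated_cell(values, tone, vc_max=1.0):
    """Colour one panel cell: hue from the nominal trend, saturation from purity."""
    vc = variation_coefficient(values)
    sat = saturation_from_vc(vc, vc_max)
    r, g, b = colorsys.hsv_to_rgb(TONE_HUE[tone], sat, 1.0)
    return vc, sat, (round(r * 255), round(g * 255), round(b * 255))


print(annotated_cell([10, 10, 10], "red"))  # VC = 0: fully saturated red
print(annotated_cell([5, 15], "red"))       # VC = 0.5: half-saturated red
```

Keeping tone and saturation independent means an expert can still read the nominal trend from the hue, while a washed-out cell warns that the prototype is not representative of its class.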

UPCommons. Portal del coneixement obert de la UPC

Bootstrap-CURE clustering: An investigation of impact of shrinking on clustering performance

Author: Gibert Karina
Karna Ashutosh
Publication venue: IOS Press
Publication date: 01/01/2022

Hierarchical clustering is one of the most popular techniques for unsupervised segmentation. However, because it relies on building a pairwise distance matrix, it has quadratic complexity and tends to be used less with very large datasets. CURE clustering tackles this challenge by first performing hierarchical clustering on a smaller sample, from which a set of representative points of the resulting clusters is obtained and used to estimate cluster shape. A KNN process using those representative points then completes the cluster assignment of the remaining points. In this way, the technique scales hierarchical clustering to large datasets. This work continues earlier research on Bootstrap-CURE, which uses repeated samples in the first part of the process and thereby gains both robustness and representativeness. The approach also uses a criterion for automatically identifying the number of clusters from a dendrogram, so that the bootstrap samples can be processed automatically. In this paper, shrinkage is proposed as a hyperparameter of the Bootstrap-CURE clustering approach. Including shrinkage brings the proposed technique closer to the original CURE clustering. The impact of shrinkage on the overall performance of Bootstrap-CURE is then explored. A real-life use case from 3D printers illustrates the performance of the proposed clustering.
Peer Reviewed. Postprint (published version)
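The representative-point and assignment stages of classic CURE, including the shrinkage factor discussed above, can be sketched as follows. This is a simplified illustration under stated assumptions: the clusters found on the sample are taken as given, representatives are chosen with the usual farthest-point heuristic, and each one is moved a fraction `shrink` toward its cluster centroid. The bootstrap resampling and automatic dendrogram cut of Bootstrap-CURE are not shown, and all names are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def representatives(points, c, shrink):
    """Pick up to c well-scattered points of one cluster, then move each one a
    fraction `shrink` of the way towards the cluster centroid."""
    ctr = centroid(points)
    # Farthest-point heuristic: start from the point farthest from the centroid,
    # then repeatedly add the point farthest from the representatives chosen so far.
    reps = [max(points, key=lambda p: math.dist(p, ctr))]
    while len(reps) < min(c, len(points)):
        reps.append(max(points, key=lambda p: min(math.dist(p, r) for r in reps)))
    return [tuple(r[d] + shrink * (ctr[d] - r[d]) for d in range(len(ctr))) for r in reps]


def assign(point, reps_by_cluster, k=1):
    """Label a point by majority vote among its k nearest representative points."""
    scored = sorted(
        (math.dist(point, r), label)
        for label, reps in reps_by_cluster.items()
        for r in reps
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


sample_clusters = {
    "A": [(0, 0), (2, 0), (0, 2), (2, 2)],
    "B": [(10, 10), (12, 10), (10, 12), (12, 12)],
}
reps = {label: representatives(pts, c=2, shrink=0.5) for label, pts in sample_clusters.items()}
print(reps["A"])             # [(0.5, 0.5), (1.5, 1.5)]
print(assign((3, 3), reps))  # A
print(assign((9, 9), reps))  # B
```

With `shrink=0` the representatives stay on the cluster outline; with `shrink=1` they all collapse onto the centroid, so the assignment behaves like a nearest-centroid rule. Intermediate values trade shape fidelity for robustness to outliers.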

UPCommons. Portal del coneixement obert de la UPC

Finding patterns from a user-centric perspective using knowledge discovery methods

Author: Gibert Karina
Palomino Arturo
Publication venue: Editorial Universitat Politècnica de València
Publication date: 22/09/2023

Chained advertisement involves breaking down a marketing campaign message into multiple banners that are shown to a user in a specific sequence, in order to create a less intrusive and more effective campaign. The challenge is determining the most effective sequence of websites and banner order. This study aims to develop a recommendation system to address this issue. Given the vast size of the internet and the complexity of the problem, the research uses a data-driven computational approach to estimate the probabilities of different sequence events and applies it to real user data from a leading company. The proposed method is faster and more efficient than previous approaches.
Palomino, A.; Gibert, K. (2023). Finding patterns from a user-centric perspective using knowledge discovery methods. Editorial Universitat Politècnica de València. 307-317. https://doi.org/10.4995/CARMA2023.2023.1604130731
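The abstract does not state which probability model is used. One simple way to estimate the probability of sequence events, shown here purely as an assumption, is a first-order Markov chain fitted by counting consecutive site visits in user sessions; the site names, sessions and function names below are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next site | current site) by counting consecutive pairs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {site: n / total for site, n in nexts.items()}
    return probs


def sequence_probability(sequence, probs):
    """Probability of visiting the sites in this order, given its first site."""
    p = 1.0
    for current, nxt in zip(sequence, sequence[1:]):
        p *= probs.get(current, {}).get(nxt, 0.0)
    return p


sessions = [
    ["news", "sports", "shop"],
    ["news", "sports", "news"],
    ["news", "shop"],
]
probs = transition_probabilities(sessions)
print(probs["news"])  # sports with probability 2/3, shop with 1/3
print(sequence_probability(["news", "sports", "shop"], probs))  # about 1/3
```

Under this model, a candidate banner chain could be ranked by the probability that a user actually traverses its sequence of websites; transitions never seen in the data get probability zero, which a real system would likely smooth.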

RiuNet

Automatic identification of the number of clusters in hierarchical clustering

Author: Gibert Karina
Karna Ashutosh
Publication venue: Springer Nature
Publication date: 01/01/2022

Hierarchical clustering is one of the most suitable tools for discovering the true underlying structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications it provides a perspective on the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, which is a major obstacle to building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, obtained by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and the large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, such a process makes it possible to apply hierarchical clustering automatically in real-life industrial applications, enables continuous monitoring of real 3D printers in production, and helps build an Intelligent Decision Support System to detect operational modes, anomalies and other behavioral patterns.
Peer Reviewed. Postprint (author's final draft)
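The paper compares six Calinski-Harabasz-based criteria and selects a winner that the abstract does not specify. The sketch below shows only the simplest baseline such criteria build on: cut the dendrogram at every level, score each partition with the Calinski-Harabasz index CH = (B/(k-1)) / (W/(n-k)), where B and W are the between- and within-cluster sums of squares, and keep the number of clusters k with the highest score. Single linkage, the toy data and the function names are assumptions chosen for brevity.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def calinski_harabasz(points, labels):
    """CH = (B / (k - 1)) / (W / (n - k)) for a partition with k clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters.values())
    within = 0.0
    for c in clusters.values():
        ctr = centroid(c)
        within += sum(math.dist(p, ctr) ** 2 for p in c)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def single_linkage_partitions(points):
    """Naive single-linkage agglomeration; maps each number of clusters k
    (from n down to 1) to the label vector of that level of the dendrogram."""
    n = len(points)
    labels = list(range(n))
    partitions = {n: labels[:]}
    while len(set(labels)) > 1:
        # Closest pair of points that are still in different clusters.
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if labels[i] != labels[j]
        )
        old, new = labels[j], labels[i]
        labels = [new if label == old else label for label in labels]
        partitions[len(set(labels))] = labels[:]
    return partitions


def best_cut(points, partitions):
    """Baseline rule: choose the cut (number of clusters) with maximal CH."""
    candidates = [k for k in partitions if 2 <= k <= len(points) - 1]
    return max(candidates, key=lambda k: calinski_harabasz(points, partitions[k]))


points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
partitions = single_linkage_partitions(points)
print(best_cut(points, partitions))  # 3
```

CH is undefined for k = 1 and degenerate for k = n (zero within-cluster scatter), so only cuts with 2 to n - 1 clusters are scored.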

UPCommons. Portal del coneixement obert de la UPC

Classification based on rules and thyroids dysfunctions

Author: Gibert Karina
Sonicki Z
Publication date: 01/10/1999

This is the peer reviewed version of the following article: Gibert, K.; Sonicki, Z. Classification based on rules and thyroids dysfunctions. "Applied stochastic models and data analysis", October 1999, vol. 15, no. 4, p. 319-324, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1526-4025(199910/12)15:4%3C319::AID-ASMB396%3E3.0.CO;2-H/abstract. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.
Classification in ill-structured domains (ISD) is a difficult problem for current statistical and artificial intelligence techniques because of the intrinsic characteristics of those domains. Classification based on rules is our proposal to overcome the limitations of statistical and artificial intelligence techniques in this particular context. This paper presents an application of classification based on rules to a set of real data. The database concerns thyroid function; the data were provided by a hospital in Zagreb (Croatia) and cover a period of two years.
Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC