
    Automatic characterization and interpretation of conceptual descriptions in ill-structured domains using numerical variables

    The main objective of the research presented in this project is to establish a formal methodology for automatically generating conceptual descriptions of classes built in real, complex domains of a continuous nature, called ill-structured domains. The methodology starts from the study of the multiple boxplot; formalizing the visual interpretation procedure then involves determining, for each variable, the values at which the distribution changes and building the table of frequencies conditioned on the resulting intervals. This yields a fuzzy representation of the degrees of membership of the variable's values to the different classes, which provides a convenient basis for automatically characterizing and interpreting the conceptual descriptions of the classes. The methodology contributes a system for characterizing classes from a semantic point of view, compared with other clustering methods, when applied to data from an ill-structured domain; a new approach to discretizing the space of quantitative attributes into variable-length intervals, which underpins the methodology; and contributions to the validation of a classification in terms of its representation and quality (a classification is valid if it can be shown that the resulting classes are meaningful or useful) and to the automatic generation of the resulting classes as a basis for prediction and/or diagnosis. The methodology is a new way of extracting useful, user-comprehensible knowledge by combining statistical tools (multiple boxplot, data analysis), artificial intelligence (machine learning, knowledge-based systems) and fuzzy logic (fuzzy models and reasoning).
As a case study, it has been applied to a database from a wastewater treatment plant, described in Chapter 4, using quantitative attributes. The results are promising and constitute a first step towards a formal methodology for automatically obtaining conceptual interpretations of classes from quantitative attributes describing the objects (days, in this case study). Finally, our work covers all phases of the KDD (Knowledge Discovery in Databases) process described by Fayyad et al., with emphasis on the automatic generation of interpretations, in our case of the classes resulting from a reference partition.
    Postprint (published version)
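
The interval construction and conditional frequency table described above can be sketched as follows. This is a minimal Python illustration, not the thesis's implementation: it assumes, for illustration only, that the cut points are the minimum and maximum of each class, and it reads the relative frequency of each class inside an interval as that interval's membership degree. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def boxplot_cut_points(values, labels):
    """Sorted, de-duplicated cut points: the minimum and maximum of each class.

    Assumption: these class extremes stand in for the points where the
    multiple boxplot shows a change in the class distributions.
    """
    by_class = defaultdict(list)
    for v, c in zip(values, labels):
        by_class[c].append(v)
    points = {p for vs in by_class.values() for p in (min(vs), max(vs))}
    return sorted(points)


def interval_index(x, cuts):
    """Index of the interval holding x: [c0, c1], (c1, c2], ..., (c_{m-1}, c_m]."""
    for i in range(1, len(cuts)):
        if x <= cuts[i]:
            return i - 1
    return len(cuts) - 2  # values above the last cut go to the last interval


def membership_table(values, labels):
    """Per interval, the relative frequency of each class, read as a fuzzy membership degree."""
    cuts = boxplot_cut_points(values, labels)
    if len(cuts) == 1:  # constant variable: a single degenerate interval
        cuts = cuts * 2
    classes = sorted(set(labels))
    counts = [dict.fromkeys(classes, 0) for _ in range(len(cuts) - 1)]
    for v, c in zip(values, labels):
        counts[interval_index(v, cuts)][c] += 1
    table = []
    for i, row in enumerate(counts):
        total = sum(row.values())
        degrees = {c: (n / total if total else 0.0) for c, n in row.items()}
        table.append(((cuts[i], cuts[i + 1]), degrees))
    return table


# Hypothetical toy data: one numeric variable observed in two classes.
for interval, degrees in membership_table([1, 2, 5, 4, 6, 8], list("AAABBB")):
    print(interval, degrees)
```

In this toy example the intervals have variable length, and the middle interval (4, 5] contains only class A values, so it characterizes class A with full membership.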

    Automatic classification with KLASS of process data from a WWTP

    In this study, automatic clustering is performed on data from a Wastewater Treatment Plant (WWTP) to obtain specific knowledge of an ill-structured domain such as the biological wastewater treatment process. The whole process requires supervision by the process expert. The clustering is done with KLASS, which can handle numerical and symbolic variables simultaneously in the description of objects. This automatic classification yields a set of clusters that can be labelled as typical operational states. All the knowledge and information acquired will constitute the initial library of a case-based reasoning system which, together with an expert system, forms a decision support system for this WWTP.
    Postprint (published version)
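
The abstract does not describe the mixed metric that KLASS uses. As a stand-in that shows how numerical and symbolic variables can be combined in a single dissimilarity, the sketch below uses Gower's distance, a different, well-known technique; it is not KLASS's metric. The function names, the variables and the records are hypothetical.

```python
def column_ranges(rows, numeric_columns):
    """Range (max - min) of each numeric column; None marks a symbolic column."""
    ranges = []
    for j in range(len(rows[0])):
        if j in numeric_columns:
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a, b, ranges):
    """Gower distance in [0, 1]: mean of per-variable dissimilarities.

    Numeric variables contribute |x - y| / range; symbolic variables contribute
    0 if the categories match and 1 otherwise.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0
        elif r > 0:
            total += abs(x - y) / r
    return total / len(ranges)


# Hypothetical daily records: (pH, settling state, inflow).
days = [(7.2, "good", 30.0), (6.8, "bulking", 10.0), (7.0, "good", 20.0)]
ranges = column_ranges(days, numeric_columns={0, 2})
print(gower_distance(days[0], days[1], ranges))
print(gower_distance(days[0], days[2], ranges))
```

Any distance of this kind can feed a hierarchical clustering of days; the resulting clusters would then be inspected by the expert, as the abstract describes.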

    Determination of factors influencing a response in an ill-structured domain

    This report focuses on results obtained from a classification technique applied to time-series data in a medical ill-structured domain. Statistical analysis and classification of such data are often inadequate in ill-structured domains because of the intrinsic characteristics of those domains. The database analysed contains information on patients with major depressive disorder or schizophrenia; consequently, many of its variables contain measurements taken at different instants of time, forming curves. This motivates the question of how to establish a useful technique for classifying curves in a medical ill-structured domain.
    Postprint (published version)

    aTLP: a color-based model of uncertainty to evaluate the risk of decisions based on prototypes

    Clustering techniques find homogeneous and distinguishable prototypes. Careful interpretation of these prototypes is crucial to help experts organize this know-how better and genuinely improve their decision-making processes. The Traffic Lights Panel (TLP) was introduced in 2009 as a postprocessing tool for understanding clustering prototypes. In this work, the annotated Traffic Lights Panel (aTLP) is presented as an enrichment of the TLP that manages the intrinsic uncertainty of the prototypes themselves. The aTLP handles uncertainty by quantifying the prototypes' purity from the variation coefficients (VC) and by an associated color-based uncertainty model with two dimensions, tone and saturation, representing the nominal trend and the purity of the prototype, respectively. An application to a wastewater treatment plant in Slovenia, in both a discrete and a continuous approach, suggests that the aTLP is a useful and user-friendly tool that can reduce the gap between data mining and effective decision support, towards informed decisions.
    Peer Reviewed. Postprint (author's final draft)
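
A minimal sketch of the color-based idea, under assumptions that are not taken from the paper: purity is derived from the coefficient of variation as 1 - min(CV, cv_max) / cv_max, the nominal trend is one of three hypothetical levels set by two thresholds, tone is rendered as an HSV hue (red, yellow, green) and purity as HSV saturation. The abstract does not give the actual mapping, so every threshold, formula and name here is hypothetical.

```python
import colorsys
import statistics

# Hypothetical hue per nominal trend (HSV hue in [0, 1)): red, yellow, green.
TREND_HUE = {"low": 0.0, "medium": 1 / 6, "high": 1 / 3}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (needs >= 2 values)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")
    return statistics.stdev(values) / abs(mean)


def nominal_trend(mean, low_cut, high_cut):
    """Hypothetical three-level trend of a class mean against two thresholds."""
    if mean < low_cut:
        return "low"
    if mean > high_cut:
        return "high"
    return "medium"


def cell_color(values, low_cut, high_cut, cv_max=1.0):
    """RGB color for one (prototype, variable) cell.

    Hue encodes the nominal trend; saturation encodes purity, taken here as
    1 - min(CV, cv_max) / cv_max, so a CV of 0 gives a fully saturated color.
    """
    purity = 1.0 - min(coefficient_of_variation(values), cv_max) / cv_max
    hue = TREND_HUE[nominal_trend(statistics.fmean(values), low_cut, high_cut)]
    r, g, b = colorsys.hsv_to_rgb(hue, purity, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


print(cell_color([2.0, 2.0, 2.0], low_cut=5.0, high_cut=8.0))  # homogeneous class
print(cell_color([1.0, 3.0], low_cut=5.0, high_cut=8.0))       # more dispersed class
```

Both example cells share the "low" tone; the more dispersed class gets a paler, less saturated red, which is how this sketch signals lower confidence in the prototype.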

    Bootstrap-CURE clustering: An investigation of impact of shrinking on clustering performance

    Hierarchical clustering is one of the most popular techniques in unsupervised segmentation. However, because it relies on building a pairwise distance matrix, its complexity is quadratic, so it is less used with very large datasets. CURE clustering tackles this challenge by accelerating the process: a first hierarchical clustering is run over a smaller sample, from which a set of representative points of the resulting clusters is obtained and used to estimate the cluster shapes. A KNN process over those representative points then completes the cluster assignment of the remaining points. This technique scales hierarchical clustering to large datasets. This work continues earlier research on Bootstrap-CURE, which uses repeated samples in the first part of the process to gain both robustness and representativeness. The approach also uses a criterion for automatically identifying the number of clusters from a dendrogram, so that the bootstrap samples can be processed automatically. In this paper, shrinkage is proposed as a hyperparameter of the Bootstrap-CURE clustering approach. Including shrinkage brings the proposed technique closer to the original CURE clustering. The impact of shrinkage on the overall performance of Bootstrap-CURE is further explored. A real-life use case from 3D printers is presented to illustrate the performance of the proposed clustering.
    Peer Reviewed. Postprint (published version)
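
The shrinking step mentioned above comes from the original CURE algorithm: each representative point is moved a fraction alpha of the way toward its cluster centroid, and the remaining points are then assigned to the cluster of the nearest representative. The Python sketch below illustrates only that step and the 1-NN assignment. The farthest-point selection of representatives, the parameter names `c` and `alpha` and the data are assumptions; the bootstrap resampling and automatic dendrogram cut of Bootstrap-CURE are not shown.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def well_scattered(points, c):
    """Pick up to c scattered representatives with a farthest-point heuristic."""
    cen = centroid(points)
    chosen = [max(points, key=lambda p: math.dist(p, cen))]
    while len(chosen) < min(c, len(points)):
        chosen.append(max(points, key=lambda p: min(math.dist(p, q) for q in chosen)))
    return chosen


def shrink(representatives, cen, alpha):
    """Move each representative a fraction alpha of the way toward the centroid.

    alpha = 0 keeps the scattered points; alpha = 1 collapses them onto the centroid.
    """
    return [tuple(r + alpha * (m - r) for r, m in zip(p, cen)) for p in representatives]


def assign(point, reps_by_cluster):
    """1-NN step: label of the cluster owning the nearest representative."""
    best_label, best_dist = None, math.inf
    for label, reps in reps_by_cluster.items():
        for r in reps:
            d = math.dist(point, r)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Hypothetical clusters found on the sample, then extended to unseen points.
sample_clusters = {
    "A": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "B": [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)],
}
reps = {
    label: shrink(well_scattered(pts, c=2), centroid(pts), alpha=0.5)
    for label, pts in sample_clusters.items()
}
print(reps["A"])
print(assign((3.0, 3.0), reps))
```

Small alpha values keep the representatives spread out and follow irregular cluster shapes; values near 1 behave like a centroid-based method and are less sensitive to outlying representatives.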

    Finding patterns from a user-centric perspective using knowledge discovery methods

    Chained advertising involves breaking a marketing campaign message down into multiple banners that are shown to a user in a specific sequence, in order to create a less intrusive and more effective campaign. The challenge is to determine the most effective sequence of websites and banner order. This study aims to develop a recommendation system to address this problem. Given the vast size of the internet and the complexity of the problem, the research uses a data-driven computational approach to estimate the probabilities of different sequence events and applies it to real user data from a leading company. The proposed method is faster and more efficient than previous approaches.
    Palomino, A.; Gibert, K. (2023). Finding patterns from a user-centric perspective using knowledge discovery methods. Editorial Universitat Politècnica de València. 307-317. https://doi.org/10.4995/CARMA2023.2023.1604130731
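
The abstract does not say how the sequence probabilities are estimated. One simple possibility, assumed here purely for illustration, is a first-order Markov chain fitted to user sessions by counting transitions between sites. The site categories, sessions and function names are hypothetical.

```python
from collections import Counter, defaultdict


def start_probabilities(sessions):
    """Relative frequency of each site as the first element of a session."""
    firsts = Counter(seq[0] for seq in sessions if seq)
    total = sum(firsts.values())
    return {site: n / total for site, n in firsts.items()}


def transition_probabilities(sessions):
    """Maximum-likelihood first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def sequence_probability(seq, start, trans):
    """Probability of a whole sequence under the fitted first-order chain."""
    p = start.get(seq[0], 0.0)
    for current, nxt in zip(seq, seq[1:]):
        p *= trans.get(current, {}).get(nxt, 0.0)
    return p


# Hypothetical browsing sessions (site categories where banners could be shown).
sessions = [["news", "sports", "shop"], ["news", "shop"], ["sports", "shop"]]
start = start_probabilities(sessions)
trans = transition_probabilities(sessions)
print(sequence_probability(["news", "sports", "shop"], start, trans))
```

Under this assumption, candidate website sequences for a chained campaign could be ranked by `sequence_probability`; a higher-order model would capture longer dependencies at the cost of needing more session data.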

    Automatic identification of the number of clusters in hierarchical clustering

    Hierarchical clustering is one of the most suitable tools for discovering the true underlying structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not applicable. In many real applications it gives a view of the inner data structure and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to read it from the dendrogram, which is a major obstacle to building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, selected by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from real-time sensor data, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from the 3D printers, and the large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces the dependence on a human expert to provide the number of clusters by inspecting the dendrogram. Further, this process allows hierarchical clustering to run in automatic mode in real-life industrial applications, enables continuous monitoring of real 3D printers in production, and helps build an Intelligent Decision Support System to detect operational modes, anomalies and other behavioural patterns.
    Peer Reviewed. Postprint (author's final draft)
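
The paper compares six Calinski-Harabasz-based criteria and selects a winner that this abstract does not describe. As an illustration of the general idea only, the sketch below applies the plainest rule: choose the dendrogram cut k that maximizes CH = (B / (k - 1)) / (W / (n - k)), where B and W are the between- and within-cluster sums of squares. Average linkage, the naive O(n^3) agglomeration, the function names and the data are assumptions.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def calinski_harabasz(points, labels):
    """CH = (B / (k - 1)) / (W / (n - k)) for a partition with 2 <= k <= n - 1."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    n, k = len(points), len(groups)
    overall = centroid(points)
    cents = {label: centroid(g) for label, g in groups.items()}
    between = sum(len(g) * math.dist(cents[label], overall) ** 2 for label, g in groups.items())
    within = sum(math.dist(p, cents[label]) ** 2 for label, g in groups.items() for p in g)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def average_linkage_levels(points):
    """Naive average-linkage agglomeration; returns {k: labels} for k = n..1."""
    clusters = [[i] for i in range(len(points))]

    def labels_of(current):
        labels = [0] * len(points)
        for cluster_id, members in enumerate(current):
            for i in members:
                labels[i] = cluster_id
        return labels

    levels = {len(clusters): labels_of(clusters)}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b
        levels[len(clusters)] = labels_of(clusters)
    return levels


def best_cut(points):
    """Cut level k in 2..n-1 that maximizes the Calinski-Harabasz index."""
    levels = average_linkage_levels(points)
    scores = {k: calinski_harabasz(points, levels[k]) for k in range(2, len(points))}
    return max(scores, key=scores.get), scores


# Hypothetical 2-D data: three compact, well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_cut(points)
print(k)
```

Plain CH maximization is only a baseline; the abstract's point is that the criteria were benchmarked against expert-chosen cuts on 95 real dendrograms, which a toy example like this cannot reproduce.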

    Classification based on rules and thyroids dysfunctions

    This is the peer-reviewed version of the following article: Gibert, K.; Sonicki, Z. Classification based on rules and thyroids dysfunctions. "Applied stochastic models and data analysis", October 1999, vol. 15, no. 4, pp. 319-324, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1526-4025(199910/12)15:4%3C319::AID-ASMB396%3E3.0.CO;2-H/abstract. Classification in ill-structured domains (ISD) is a difficult problem for current statistical and artificial intelligence techniques because of the intrinsic characteristics of those domains. Classification based on rules is our proposal to overcome the limitations of statistical and artificial intelligence techniques in this particular context. This paper presents an application of classification based on rules to a real dataset. The database concerns thyroid function; the data were provided by a hospital in Zagreb (Croatia) and cover a period of two years.
    Peer Reviewed. Postprint (author's final draft)