
    Normality-based validation for crisp clustering

    This is the author’s version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition 43(3) (2010), DOI: 10.1016/j.patcog.2009.09.018.

    We introduce a new validity index for crisp clustering that is based on the average normality of the clusters. Unlike methods based on inter-cluster and intra-cluster distances, this index emphasizes the cluster shape by using a high-order characterization of its probability distribution. The normality of a cluster is characterized by its negentropy, a standard measure of the distance to normality that evaluates the difference between the cluster's entropy and the entropy of a normal distribution with the same covariance matrix. The definition of the negentropy involves the distribution's differential entropy. However, we show that it is possible to avoid its explicit computation by considering only negentropy increments with respect to the initial data distribution, where all the points are assumed to belong to the same cluster. The resulting negentropy increment validity index only requires the computation of covariance matrices. We have applied the new index to an extensive set of artificial and real problems where it provides, in general, better results than other indices, both with respect to the prediction of the correct number of clusters and to the similarity between the real clusters and those inferred.

    This work has been partially supported with funds from MEC BFU2006-07902/BFI, CAM S-SEM-0255-2006 and CAM/UAM CCG08-UAM/TIC-442.
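
    The abstract notes that the index requires only covariance matrices; the related abstract below adds that the calculation reduces to log-determinants of the cluster covariances and the cluster priors. The following sketch is a rough illustration built on that description; the exact formula is an assumption, not quoted from the paper.

        import numpy as np

        def negentropy_increment(X, labels):
            """Sketch of a negentropy-increment-style index for a crisp partition.

            Assumes the index combines the log-determinants of the cluster
            covariances, the log-determinant of the whole-data covariance,
            and the cluster prior probabilities. Requires d >= 2 and at
            least d+1 points per cluster for full-rank covariances.
            """
            X = np.asarray(X, dtype=float)
            labels = np.asarray(labels)
            n = len(X)
            # Baseline: all points treated as a single cluster
            _, logdet0 = np.linalg.slogdet(np.cov(X, rowvar=False))
            delta = -0.5 * logdet0
            for k in np.unique(labels):
                Xk = X[labels == k]
                pk = len(Xk) / n  # prior probability of cluster k
                _, logdet_k = np.linalg.slogdet(np.cov(Xk, rowvar=False))
                delta += 0.5 * pk * logdet_k - pk * np.log(pk)
            return delta

    Under this reading, lower values correspond to partitions whose clusters are, on average, closer to normality, so one would keep the candidate partition that minimizes the index.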

    Information processing in neural systems: oscillations, network topologies and optimal representations

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 1-07-200

    The effect of low number of points in clustering validation via the negentropy increment

    This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing 74(16) (2011), DOI: 10.1016/j.neucom.2011.03.023. Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009).

    We recently introduced the negentropy increment, a validity index for crisp clustering that quantifies the average normality of the clustering partitions using the negentropy. This index can satisfactorily deal with clusters with heterogeneous orientations, scales and densities. One of the main advantages of the index is the simplicity of its calculation, which only requires the computation of the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results which are in general better than those from other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimation of the log-determinant of the covariance matrix can be very poor. This affects the proper quantification of the index and therefore the validation of the clustering, so additional requirements such as limitations on the minimum number of points in each region are needed. Although this kind of constraint can provide good results, it needs to be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and show how to correct this error in order to obtain an unbiased estimator of the negentropy increment. We also quantify the amount of uncertainty in the estimation. As we show, both for 2D synthetic problems and for multidimensional real benchmark problems, these results can be used to validate clustering partitions with a substantial improvement.

    This work has been funded by DGUI-CAM/UAM (Project CCG10-UAM/TIC-5864).
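
    The correction itself is derived in the paper; as a flavor of why the bias depends only on the number of points and the dimension, a standard Wishart identity states that for n i.i.d. Gaussian points in d dimensions the sample covariance S satisfies E[log|S|] = log|Sigma| + sum_{i=1..d} psi((n-i)/2) - d*log((n-1)/2). The sketch below applies this identity under a Gaussian assumption; the estimator actually proposed in the article may differ.

        import numpy as np
        from scipy.special import digamma

        def logdet_cov_unbiased(Xk):
            """Bias-corrected estimate of log|Sigma| from a small sample (sketch).

            Uses the Wishart expectation of the sample log-determinant; the
            correction depends only on the number of points n and the
            dimension d, not on the scale or orientation of the data.
            """
            Xk = np.asarray(Xk, dtype=float)
            n, d = Xk.shape
            _, logdet = np.linalg.slogdet(np.cov(Xk, rowvar=False))
            bias = digamma((n - np.arange(1, d + 1)) / 2.0).sum() - d * np.log((n - 1) / 2.0)
            return logdet - bias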

    A compression-based method for detecting anomalies in textual data

    Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files known as security logs. Such log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity, confidentiality and availability. On this basis, we propose a parameter-free method to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. In particular, we explore and validate the application of our method in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithm tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.

    This research has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), from the Comunidad de Madrid (Spain) under the projects CYNAMON (P2018/TCS-4566) and S2017/BMD-3688, co-financed with FSE and FEDER EU funds, by the Consejo Superior de Investigaciones Científicas (CSIC) under the project LINKA20216 (“Advancing in cybersecurity technologies”, i-LINK+ program), and by Spanish project MINECO/FEDER TIN2017-84452-
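
    The Normalized Compression Distance is parameter-free: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the length of s after compression. A minimal sketch of the pipeline described above, using zlib as a stand-in compressor and a hypothetical set of reference strings to build feature vectors; the paper's actual compressor, feature construction and data are not specified here.

        import zlib
        import numpy as np
        from sklearn.svm import SVC

        def ncd(x: bytes, y: bytes) -> float:
            """Normalized Compression Distance with zlib as the compressor."""
            cx = len(zlib.compress(x, 9))
            cy = len(zlib.compress(y, 9))
            cxy = len(zlib.compress(x + y, 9))
            return (cxy - min(cx, cy)) / max(cx, cy)

        # Toy log lines (hypothetical); each line is represented by its NCD
        # to a few reference samples, then classified with an SVM.
        normal = [b"GET /index.html HTTP/1.1 200",
                  b"GET /img/logo.png HTTP/1.1 200",
                  b"POST /login HTTP/1.1 302"]
        attack = [b"GET /index.php?id=' OR 1=1-- HTTP/1.1 500",
                  b"GET /../../etc/passwd HTTP/1.1 403"]
        refs = [normal[0], attack[0]]
        X = np.array([[ncd(s, r) for r in refs] for s in normal + attack])
        y = np.array([0] * len(normal) + [1] * len(attack))
        clf = SVC(kernel="rbf").fit(X, y)  # SVM on compression-based features
        print(clf.predict(X))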

    Fast response and temporal coherent oscillations in small-world networks

    We have investigated by computer simulations the role that different connectivity regimes play in the dynamics of a network of Hodgkin-Huxley neurons. The different connectivity topologies exhibit the following features: random topologies give rise to fast system response, yet are unable to produce coherent oscillations in the average activity of the network; regular topologies, on the other hand, give rise to coherent oscillations, but on a temporal scale that is not in accordance with fast signal processing. Finally, small-world topologies, which fall between random and regular ones, take advantage of the best features of both, giving rise to fast system response with coherent oscillations.

    We acknowledge G. Laurent, A. Bäcker, M. Bazhenov, M. Rabinovich, and H. Abarbanel for insightful discussions. We thank the Dirección General de Enseñanza Superior e Investigación Científica for financial support (PB97-1448), the CAM for financial support to L. F. L., and the CCCFC (UAM) for the use of computation resources.
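
    For readers unfamiliar with these topologies, the standard Watts-Strogatz construction interpolates between the regular and random regimes discussed above. The sketch below uses illustrative parameters, not those of the paper, to show the small-world signature: clustering close to a regular lattice with path lengths close to a random graph.

        import networkx as nx

        # Rewire each edge of a ring lattice (n nodes, k neighbors) with
        # probability p; intermediate p yields small-world graphs.
        n, k = 1000, 10
        for p, regime in [(0.0, "regular"), (0.1, "small-world"), (1.0, "random")]:
            g = nx.connected_watts_strogatz_graph(n, k, p, seed=0)
            print(regime,
                  "clustering:", round(nx.average_clustering(g), 3),
                  "path length:", round(nx.average_shortest_path_length(g), 2))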

    Fast response and coherent oscillations in small-world Hodgkin-Huxley neural networks

    This is an electronic version of the paper presented at the I Jornadas Técnicas de la ETS de Informática, held in Madrid on 200

    Evaluation of negentropy-based cluster validation techniques in problems with increasing dimensionality

    The aim of a crisp cluster validity index is to quantify the quality of a given data partition. It allows one to select the best partition out of a set of potential ones, and to determine the number of clusters. Recently, negentropy-based cluster validation has been introduced. This new approach seems to perform better than other state-of-the-art techniques, and its computation is quite simple. However, like many other cluster validation approaches, it presents problems when some partition regions have a small number of points. Different heuristics have been proposed to cope with this problem. In this article we systematically analyze the performance of different negentropy-based validation approaches, including a new heuristic, on clustering problems of increasing dimensionality, and compare them to reference criteria such as AIC and BIC. Our results on synthetic data suggest that the newly proposed negentropy-based validation strategy can outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.

    The authors acknowledge financial support from DGUI-CAM/UAM (Project CCG10-UAM/TIC-5864).
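
    The reference criteria are standard penalized-likelihood scores. As an illustration of that baseline procedure (on synthetic toy data, not the benchmarks used in the article), one can fit Gaussian mixtures with an increasing number of components and keep the minimum of AIC or BIC.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Three well-separated Gaussian clusters in 3 dimensions (toy data)
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 3))
                       for c in (0.0, 4.0, 8.0)])
        for k in range(1, 7):
            gm = GaussianMixture(n_components=k, random_state=0).fit(X)
            print(k, round(gm.aic(X), 1), round(gm.bic(X), 1))  # both minimized near k=3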

    A comparison of techniques for robust gender recognition

    Reprinted, with permission, from [Rojas Bello, R.N., Lago Fernández, L.F., Martínez Muñoz, G., and Sánchez Montañés, M.A., A comparison of techniques for robust gender recognition, IEEE International Conference on Image Processing, ICIP 2011]. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the Universidad Autónoma de Madrid's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

    Proceedings of the 2011 18th IEEE International Conference on Image Processing (ICIP), 11-14 Sept. 2011, Brussels.

    Automatic gender classification of face images is an area of growing interest with multiple applications. Appropriate classifiers should be robust against variations such as illumination, scale and orientation that occur in real-world applications. This can be achieved by normalizing the images in order to reduce those variations (alignment, re-scaling, histogram equalization, etc.), or by extracting features from the original images that are invariant with respect to those variations. In this work we perform a robust comparison of eight different classifiers across 100 random partitions of a set of frontal face images. Four of them are state-of-the-art methods in automatic gender classification that use image normalization (SVMs, neural networks, AdaBoost and PCA+LDA). The other four strategies use invariant features extracted by SIFT (BOW, Evidence Random Trees, NBNN and Voted Nearest-Neighbor). The best strategies are SVM using normalized images and NBNN, the latter having the advantage that no strong image pre-processing is needed.

    This work has been supported by CDTI (project INTEGRA) and DGUI-CAM/UAM (project CCG10-UAM/TIC-5864).
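
    NBNN (Naive Bayes Nearest Neighbor) requires no training beyond pooling the local descriptors of each class: a test image is assigned to the class that minimizes the summed squared distance from each of its descriptors to that class's nearest training descriptor. A minimal sketch; the exact variant and parameters used in the paper are not specified here.

        import numpy as np
        from scipy.spatial import cKDTree

        def nbnn_classify(descriptors, class_pools):
            """Assign an image to the class with minimal descriptor-to-class cost.

            descriptors: (m, d) array of local descriptors (e.g. SIFT) of a
            test image. class_pools: dict mapping a class label to the
            (N_c, d) array of all training descriptors of that class.
            """
            best_label, best_cost = None, np.inf
            for label, pool in class_pools.items():
                tree = cKDTree(pool)                # in practice, build once per class
                dists, _ = tree.query(descriptors)  # nearest-neighbor distance per descriptor
                cost = float(np.sum(dists ** 2))
                if cost < best_cost:
                    best_label, best_cost = label, cost
            return best_label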