14 research outputs found

    On semi-supervised fuzzy c-means clustering for data with clusterwise tolerance by opposite criteria

    This paper presents a new semi-supervised fuzzy c-means clustering method for data with clusterwise tolerance by opposite criteria. In semi-supervised clustering, pairwise constraints, that is, must-link and cannot-link, are frequently used to improve clustering performance. From the viewpoint of handling pairwise constraints, a new semi-supervised fuzzy c-means clustering method is proposed by introducing clusterwise tolerance-based pairwise constraints. First, the concept of clusterwise tolerance-based pairwise constraints is introduced. Second, the optimization problems of the proposed method are formulated. In particular, must-link and cannot-link constraints are handled by opposite criteria in the proposed method. Third, a new clustering algorithm is constructed based on the above discussion. Finally, the effectiveness of the proposed algorithm is verified through numerical examples.

    An exploration of methodologies to improve semi-supervised hierarchical clustering with knowledge-based constraints

    Clustering algorithms with constraints (also known as semi-supervised clustering algorithms) have been introduced to the field of machine learning as a significant variant of conventional unsupervised clustering algorithms. They have been shown to achieve better performance by integrating prior knowledge into the clustering process, which helps uncover relevant, useful information in the data being clustered. However, the development of semi-supervised hierarchical clustering techniques is still an open and active area of investigation. The majority of current semi-supervised clustering algorithms are developed as partitional clustering (PC) methods, and only a few research efforts have addressed semi-supervised hierarchical clustering methods. The aim of this research is to enhance hierarchical clustering (HC) algorithms based on prior knowledge by adopting novel methodologies. [Continues.]

    Molecular Bronchiolitis Obliterans Syndrome Risk Monitoring: A Systems-Based Approach

    The combination of high-throughput omics (i.e. genomics or proteomics) and machine learning offers new possibilities for clinical diagnostics and the detection of biomarkers. One disease for which no reliable prognostic marker has yet been found is bronchiolitis obliterans (BO), a clinical manifestation of chronic rejection after lung transplantation. BO is the major limiting factor for long-term survival after lung transplantation, and manifests as a chronic bronchiolar inflammation accompanied by progressive sub-mucosal fibrosis leading to gradual obliteration of the bronchiolar lumen. The resulting reduction in forced expiratory volume in one second (FEV1) is defined as the bronchiolitis obliterans syndrome (BOS). As chronic lung transplant failure occurs more frequently than in other organ transplants, molecular markers for early BO and BOS detection are urgently required to adapt the patient's immunosuppressive regimen while airway damage is minimal. To achieve this goal, gene expression in bronchial epithelial cells (microarray analysis) and protein levels in bronchoalveolar lavage fluid (BALF) (mass spectrometry profiling) were monitored. Analysis of the obtained data sets was performed using novel and established methods from the fields of machine learning and statistics. This thesis also introduces a novel clustering algorithm. In the analysis of gene expression microarrays, one problem is the unsupervised discovery of stable and biologically relevant patient subgroups. The algorithm developed for this purpose focuses on the discovery of a set of patient clusters defined by the consistent up- and down-regulation of a subset of genes. Cluster stability is assessed using a bootstrap resampling scheme, which makes it possible to rank the genes according to their clusterwise importance. The algorithm was applied to a publicly available B-cell lymphoma microarray data set and compared to other commonly used clustering algorithms.

    From metaheuristics to learnheuristics: Applications to logistics, finance, and computing

    A large number of decision-making processes in strategic sectors such as transport and production involve NP-hard problems, which are frequently characterized by high levels of uncertainty and dynamism. Metaheuristics have become the predominant method for solving challenging optimization problems in reasonable computing times. However, they frequently assume that inputs, objective functions and constraints are deterministic and known in advance. These strong assumptions force work on oversimplified problems, and the resulting solutions may perform poorly when implemented. Simheuristics, in turn, integrate simulation into metaheuristics as a natural way to solve stochastic problems. In a similar fashion, learnheuristics combine statistical learning and metaheuristics to tackle problems in dynamic environments, where inputs may depend on the structure of the solution. The main contributions of this thesis include (i) a design for learnheuristics; (ii) a classification of works that hybridize statistical and machine learning with metaheuristics; and (iii) several applications in the fields of transport, production, finance and computing.

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.

    A Statistical Approach to the Alignment of fMRI Data

    Multi-subject functional magnetic resonance imaging (fMRI) studies are critical. Because anatomical and functional structure varies across subjects, image alignment is necessary. We define a probabilistic model to describe functional alignment. Imposing a prior distribution, such as the matrix von Mises-Fisher distribution, on the orthogonal transformation parameter embeds anatomical information in the estimation of the parameters, i.e., it penalizes the combination of spatially distant voxels. Real applications show an improvement in classification and in the interpretability of the results compared to various functional alignment methods.

    Methods for Modelling Response Styles

    Rating scales are ubiquitous in empirical research, especially in the social sciences, where they are used for measuring abstract concepts such as opinion or attitude. Survey questions typically employ rating scales, for example when persons are asked to self-report their perceptions of films or their job satisfaction. Yet, using a rating scale is subjective. Some persons may use only the middle of the rating scale, whilst others choose to use only the extremes. Consequently, persons with the same opinion may very well answer the same survey question using different ratings. This leads to the response style problem: how can we take into account that different ratings can potentially have different meanings to different persons when analyzing such data? This dissertation makes methodological and empirical contributions towards modelling rating scale data while accounting for such differences in response styles. The general approach is to identify individuals in the data who exhibit similar response styles, and to extract substantive information only within such groups. These elements naturally lead to the synthesis of cluster analysis and dimensionality reduction methods. In order to identify these response styles, responses to multiple survey questions are used to assess within-subject rating scale usage. Both non-parametric and parametric approaches are formulated and studied, and accompanying open-source software implementations are made available. The added value of the developed algorithms is illustrated by applying them to empirical data. Applications range from sensometrics and brand studies to psychology and political science.

    SIS 2017. Statistics and Data Science: new challenges, new generations

    The 2017 SIS Conference aims to highlight the crucial role of statistics in data science. In this new domain of ‘meaning’ extracted from data, the growing amount of data produced and available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of statistics, covering high-dimensional data quality validation, sample extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing, and confirmation of conclusions drawn from the data.