    MaxMin Linear Initialization for Fuzzy C-Means

    Clustering is an extensive research area in data science. The aim of clustering is to discover groups and identify interesting patterns in datasets. Crisp (hard) clustering assumes that each data point belongs to one and only one cluster. This assumption is inadequate when some data points may belong to several clusters, as is the case in text categorization, so more flexible clustering is needed. Fuzzy clustering methods, in which each data point can belong to several clusters, are an interesting alternative. Yet seeding iterative fuzzy algorithms so that they achieve high-quality clustering remains an issue. In this paper, we propose a new linear and efficient initialization algorithm, MaxMin Linear, to deal with this problem. We then validate our theoretical results through extensive experiments on a variety of real-world and artificial numerical datasets. We also test several validity indices, including a new one that we propose, Transformed Standardized Fuzzy Difference (TSFD).
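    To show where an initializer plugs into fuzzy clustering, here is a minimal sketch in plain Python. The abstract does not describe how MaxMin Linear works. This code therefore uses the classic MaxMin (farthest-first) seeding heuristic as a stand-in and feeds its seeds into standard fuzzy c-means (FCM) iterations. The function names, the choice of the first seed (`first=0`), the fuzzifier `m=2.0` and the toy data are all assumptions made for illustration.

```python
import math

# Classic MaxMin (farthest-first) seeding; NOT necessarily the paper's MaxMin Linear.
def maxmin_seeds(points, c, first=0):
    """Start from points[first] (an assumed choice), then repeatedly add the
    point whose distance to its nearest already-chosen seed is largest."""
    seeds = [points[first]]
    # Distance from every point to its nearest seed, updated incrementally,
    # so seeding costs O(n * c) distance evaluations.
    nearest = [math.dist(p, seeds[0]) for p in points]
    while len(seeds) < c:
        idx = max(range(len(points)), key=lambda i: nearest[i])
        seeds.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return seeds


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means iterations (fuzzifier m > 1) from given centers."""
    centers = [list(ctr) for ctr in centers]
    c, dim = len(centers), len(points[0])
    U = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        U = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]  # avoid 0-division
            U.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # Center update: mean of the points weighted by u_ik ** m.
        new_centers = []
        for i in range(c):
            w = [u[i] ** m for u in U]
            total = sum(w)
            new_centers.append([sum(wk * p[k] for wk, p in zip(w, points)) / total
                                for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U


# Toy data: two well-separated groups of 2-D points (made up for illustration).
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
seeds = maxmin_seeds(pts, 2)          # [(0.0, 0.0), (4.9, 5.2)]
centers, U = fuzzy_c_means(pts, seeds)
```

    Farthest-first seeding spreads the initial centers across the data, and each subsequent step only updates a per-point nearest-seed distance. That keeps the cost linear in the number of points for a fixed number of clusters. Each row of `U` holds one point's memberships, which sum to 1.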

    Evaluation of multiple variate selection methods from a biological perspective: a nutrigenomics case study

    Genomics-based technologies produce large amounts of data. To interpret the results and identify the most important variates related to phenotypes of interest, various multivariate regression and variate selection methods are used. Although these methods are routinely assessed for statistical performance, the relevance of multivariate models for interpreting biological data sets often remains elusive. We compare various multivariate regression and variate selection methods applied to a nutrigenomics data set in terms of performance, utility and biological interpretability. The studied data set comprised hepatic transcriptome data (10,072 predictor variates) and plasma protein concentrations [2 dependent variates: Leptin (LEP) and Tissue inhibitor of metalloproteinase 1 (TIMP-1)] collected during a high-fat diet study in ApoE3Leiden mice. The multivariate regression methods used were: partial least squares, “PLS”; a genetic algorithm-based multiple linear regression, “GA-MLR”; two least-angle shrinkage methods, “LASSO” and “ELASTIC NET”; and a variant of PLS that uses covariance-based variate selection, “CovProc.” Two methods of ranking the genes for Gene Set Enrichment Analysis (GSEA) were also investigated: ranking either by their correlation with the protein data or by the stability of the PLS regression coefficients. The regression methods performed similarly, with CovProc and GA performing the best and worst, respectively (R-squared values based on “double cross-validation” predictions of 0.762 and 0.451 for LEP, and 0.701 and 0.482 for TIMP-1). CovProc, LASSO and ELASTIC NET all produced parsimonious regression models and consistently identified small subsets of variates, with high commonality between the methods. Comparison of the gene ranking approaches found a high degree of agreement, with PLS-based ranking finding fewer significant gene sets. We recommend the use of CovProc for variate selection, in tandem with univariate methods, and the use of correlation-based ranking for GSEA-like pathway analysis methods.
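    The correlation-based ranking recommended above can be sketched as follows. Each gene's expression across samples is correlated with the protein concentration, and genes are sorted by that coefficient to produce the pre-ranked list that GSEA-style tools take as input. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here. The gene names and values are hypothetical, and the double cross-validation used for the reported R-squared values is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(expression, protein):
    """expression: dict gene -> per-sample values; protein: per-sample values.
    Returns (gene, r) pairs sorted from most positive to most negative r."""
    scores = {gene: pearson(values, protein) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression of three genes in four mice, plus one plasma protein.
expr = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1], "geneC": [1, 3, 2, 4]}
leptin = [10, 20, 30, 40]
ranked = rank_genes_by_correlation(expr, leptin)
# geneA (r = 1.0), geneC (r = 0.8), geneB (r = -1.0)
```

    Sorting by signed r keeps positively and negatively associated genes at opposite ends of the list. That is the ordering an enrichment score walks through.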

    Measuring Clients’ Perception of Functional Limitations Using the Perceived Functioning & Health Questionnaire

    Background The Perceived Functioning & Health (PFH) questionnaire was developed to record, in a standardized manner, which work activities the client perceives as limited due to health conditions. In this study the questionnaire’s reliability and validity are investigated. Methods The PFH questionnaire comprises 147 questions, distributed over 33 scales, pertaining to the client’s psychosocial and physical work limitations. The PFH data of 800 respondents were analyzed: 254 healthy employees, 408 workers on sick leave and 138 recipients of a disability pension. Internal consistency (Cronbach’s α) was established for the scales. Test–retest reliability was examined using the data of 52 recipients of a disability pension who filled out the PFH twice within an interval of 1 month. Validity was assessed by taking the nature of the limitations as a criterion: mental limitations, physical limitations or a mix of both. To this end, the respondents were divided into groups on the basis of self-classification, as well as on the basis of disease codes given by insurance and occupational health physicians: a “healthy” group, subjects with only physical limitations (“physical” group) or only mental limitations (“mental” group), and subjects with mixed limitations (“mixed” group). The scale scores of these groups were compared and tested using analyses of variance and discriminant analyses. Results The scales were found to have sufficient to good internal consistency (mean Cronbach’s α = 0.79) and test–retest reliability (mean correlation r = 0.76). Analyses of variance demonstrated significant differences between the scores of the mental, physical and healthy groups on most of the expected scales. These results were found both in groups defined by self-classification and in groups based on disease codes. Moreover, discriminant analyses revealed that, for more than 75% of the respondents, the a priori classification into three groups (mental, physical, healthy) corresponded with the classification based on their PFH scale scores. Furthermore, limitations due to specific types of complaints (low back pain, fatigue, concentration problems) or diagnosed disorders (musculoskeletal disorders, reactive disorders, endogenous disorders) were clearly reflected in the scores of the related PFH scales. Conclusion The psychometric properties of the PFH with respect to reliability and validity were satisfactory. The PFH appears to be an appropriate instrument for systematically measuring functional limitations in subjects on sick leave and in those receiving disability pensions, and could be used as a starting point in a disability claim procedure.
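    The internal-consistency figure reported above (mean Cronbach’s α = 0.79) is an average of per-scale α values. A minimal sketch of computing α for a single scale follows. It applies the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The function name and the four-respondent answer matrix are made up for illustration and are not PFH data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one row per respondent, one column per item of the scale.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("a scale needs at least two items")
    items = list(zip(*responses))                 # transpose to per-item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Made-up answers of four respondents to a three-item scale.
answers = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(answers)   # 12/13, about 0.92
```

    Sample variances are used throughout. Because the same denominator appears in both the item and total variances, it cancels in the ratio, so population variances would give the same α.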

    Health-Promoting and Health-Risk Behaviors: Theory-Driven Analyses of Multiple Health Behavior Change in Three International Samples

    Background: Co-occurrence of different behaviors was investigated using the theoretical underpinnings of the Transtheoretical Model, the Theory of Triadic Influence and the concept of Transfer. Purpose: To investigate relationships between different health behaviors' stages of change, how behaviors group, and whether study participants cluster in terms of their behaviors. Method: Relationships between the stages of different behaviors were assessed with correlational, factor, and cluster analyses in three studies (N = 3,519, 965, and 310) of individuals from the USA and Germany, surveyed by telephone and internet. Results: Consistently stronger correlations were found between nutrition and physical activity (r = 0.16-0.26, p < 0.01) than between non-smoking and nutrition (r = 0.08-0.16, p < 0.03), or non-smoking and physical activity (r = 0.01-0.21). Principal component analyses of the investigated behaviors indicated two factors: a "health-promoting" factor and a "health-risk" factor. Three distinct behavioral patterns were found in the cluster analyses. Conclusion: Our results support the assumption that individuals who are in a higher stage for one behavior are more likely to be in a higher stage for another behavior as well. If the aim is to promote a healthy lifestyle, success in one behavior can be used to facilitate changes in other behaviors, especially if both behaviors are health-promoting or both are health-risk behaviors. Moreover, interventions should target the different behavioral patterns rather than single behaviors. This might be achieved by addressing transfer between behaviors.

    Global warming and Bergmann’s rule: do central European passerines adjust their body size to rising temperatures?

    Recent climate change has caused diverse ecological responses in plants and animals. However, relatively little is known about homeothermic animals’ ability to adapt to changing temperature regimes through changes in body size, in accordance with Bergmann’s rule. We used fluctuations in mean annual temperatures in south-west Germany since 1972 to look for direct links between temperature and two aspects of body size: body mass and flight feather length. Data from regionally born juveniles of 12 passerine bird species were analysed. Body mass and feather length varied significantly among years in eight and nine species, respectively. Typically, the inter-annual changes in morphology were complex and non-linear, as was the inter-annual variation in temperature. For six species (body mass) and seven species (feather length), these inter-annual fluctuations were significantly correlated with temperature fluctuations. However, negative correlations consistent with Bergmann’s rule were found for only five species, for either body mass or feather length. In several of the species for which body mass and feather length were significantly associated with temperature, morphological responses were better predicted by temperature data smoothed across multiple years than by the actual mean breeding-season temperatures of the year of birth. This was found in five species for body mass and three species for feather length. These results suggest that changes in body size may not merely be the result of phenotypic plasticity but may hint at genetically based microevolutionary adaptations.
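    The comparison described above asks whether a yearly morphological mean tracks the temperature of the year of birth or a temperature series smoothed across several years. The abstract does not specify the smoothing method. This sketch therefore assumes a trailing moving average over a 3-year window and plain Pearson correlations. The function names, the window length and all numbers are hypothetical. The synthetic trait is deliberately built to follow the smoothed series, so the example only illustrates the mechanics.

```python
import math


def trailing_mean(series, window):
    """Mean of each value and the (window - 1) values before it; None until
    enough earlier years exist."""
    return [None if i + 1 < window else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_predictors(temps, trait, window=3):
    """Correlate a yearly trait mean with (a) the same year's temperature and
    (b) the multi-year smoothed temperature, over years where both exist."""
    smooth = trailing_mean(temps, window)
    years = [i for i, s in enumerate(smooth) if s is not None]
    trait_used = [trait[i] for i in years]
    r_raw = pearson([temps[i] for i in years], trait_used)
    r_smooth = pearson([smooth[i] for i in years], trait_used)
    return r_raw, r_smooth


# Synthetic example: the trait is built to track the smoothed temperature.
temps = [8.0, 9.5, 8.5, 10.0, 9.0, 10.5, 9.5]
smoothed = trailing_mean(temps, 3)
body_mass = [12.0 if s is None else 30.0 - 2.0 * s for s in smoothed]
r_raw, r_smooth = compare_predictors(temps, body_mass, window=3)
```

    A negative r is the direction Bergmann’s rule predicts (smaller bodies in warmer periods). Restricting both correlations to the same years keeps the two predictors comparable.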