4 research outputs found

    On mathematical optimization for clustering categories in contingency tables

    Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in Simpson's paradox. In this paper we propose a methodology that, for a given contingency table and a fixed granularity, finds a clustered table with the highest χ² statistic. Repeating this procedure for different values of the granularity, we can either identify an extreme grouping, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and to ensure reasonable sample sizes in the cells of the clustered table, from which trustworthy statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset from a medical study. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214), FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía, with FEDER Funds), PID2019-110886RB-I00 and PID2019-104901RB-I00 (funded by MCIN/AEI/10.13039/501100011033).
This support is gratefully acknowledged.
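
To make the idea of a clustered table concrete, the sketch below merges row categories of a small contingency table and computes the χ² statistic of the result. It is only an illustration: the function names and the toy table are assumptions, and the search is a plain brute-force enumeration of row groupings, not the assignment or set partitioning formulations proposed in the paper. Column categories are left unmerged and all row and column totals are assumed positive.

```python
def merge_rows(table, row_groups):
    """Sum the rows of `table` inside each group of row indices."""
    n_cols = len(table[0])
    return [[sum(table[i][j] for i in group) for j in range(n_cols)]
            for group in row_groups]


def chi_square(table):
    """Pearson chi-square statistic (assumes positive row/column totals)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def partitions(items, k):
    """Yield every partition of `items` into exactly k non-empty groups."""
    if not items:
        if k == 0:
            yield []
        return
    first, rest = items[0], items[1:]
    # `first` joins one of the groups of a k-partition of the rest ...
    for p in partitions(rest, k):
        for idx in range(len(p)):
            yield p[:idx] + [[first] + p[idx]] + p[idx + 1:]
    # ... or forms a group of its own.
    for p in partitions(rest, k - 1):
        yield [[first]] + p


def best_row_clustering(table, k):
    """Brute force: row grouping into k clusters with the highest chi-square."""
    rows = list(range(len(table)))
    return max(
        ((chi_square(merge_rows(table, p)), p) for p in partitions(rows, k)),
        key=lambda pair: pair[0],
    )


# Hypothetical 3x2 table; rows are categories of one variable.
toy = [[10, 20], [20, 10], [15, 15]]
full_stat = chi_square(toy)
best_stat, best_groups = best_row_clustering(toy, 2)
```

On this toy table, merging the first two rows gives a clustered table with χ² equal to zero, while other groupings keep part of the dependence. This is the kind of sensitivity to granularity the abstract describes. Brute-force enumeration is only viable for a handful of categories.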

    Variable neighborhood search for minimum sum-of-squares clustering on networks

    Euclidean Minimum Sum-of-Squares Clustering amounts to finding p prototypes by minimizing the sum of the squared Euclidean distances from a set of points to their closest prototype. In recent years, related clustering problems have been extensively analyzed under the assumption that the space is a network rather than Euclidean space. This allows one to properly address community detection problems, which are of significant relevance in diverse phenomena in biological, technological and social systems. However, the problem of minimizing the sum of squared distances on networks has not yet been addressed. Two versions of the problem are possible: either the p prototypes are sought among the set of nodes of the network, or points along edges are also taken into account as possible prototypes. While the first problem is transformed into a classical discrete p-median problem, the latter is new in the literature and is solved in this paper with the Variable Neighborhood Search heuristic. The solutions of the two problems are compared in a series of test examples.
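
The node-restricted version can be sketched as follows: compute shortest-path distances on the network, then choose p nodes that minimize the sum of squared distances from every node to its closest prototype. The function names and the example graph are assumptions made for illustration, and the search is an exhaustive enumeration of node subsets, not the Variable Neighborhood Search used in the paper. The version with prototypes along edges is not covered.

```python
from itertools import combinations


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph given as (u, v, weight) triples."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def node_sum_of_squares_clustering(n, edges, p):
    """Exhaustively pick p prototype nodes minimizing the sum of squared
    shortest-path distances from each node to its closest prototype."""
    d = all_pairs_shortest_paths(n, edges)
    best_cost, best_protos = float("inf"), None
    for protos in combinations(range(n), p):
        cost = sum(min(d[i][c] for c in protos) ** 2 for i in range(n))
        if cost < best_cost:
            best_cost, best_protos = cost, protos
    return best_protos, best_cost


# Hypothetical example: a path 0-1-2-3-4 with unit edge lengths.
path_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
one_proto, one_cost = node_sum_of_squares_clustering(5, path_edges, 1)
two_protos, two_cost = node_sum_of_squares_clustering(5, path_edges, 2)
```

Using squared shortest-path distances as assignment costs is what turns the node-restricted problem into a discrete p-median instance. Enumeration scales combinatorially in p, so it serves only to check small cases.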

    The dial-a-ride problem with electric vehicles and battery swapping stations

    The Dial-a-Ride Problem (DARP) consists of designing vehicle routes and schedules for customers with special needs and/or disabilities. The DARP with Electric Vehicles and battery swapping stations (DARP-EV) concerns scheduling a fleet of EVs to serve a set of pre-specified transport requests during a certain planning horizon. In addition, EVs can be recharged by swapping their batteries with charged ones at any battery-swap station. We propose three enhanced Evolutionary Variable Neighborhood Search (EVO-VNS) algorithms to solve the DARP-EV. Extensive computational experiments highlight the relevance of the problem and confirm the efficiency of the proposed EVO-VNS algorithms in producing high-quality solutions.