8 research outputs found

    Balanced k-Means and Min-Cut Clustering

    Clustering is an effective data-mining technique for generating groups of interest. Among the various clustering approaches, the k-means and min-cut families of algorithms are the most popular owing to their simplicity and efficacy. The classical k-means algorithm partitions a set of data points into several subsets by iteratively updating the cluster centers and the data points assigned to them. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to assign only a small minority of data points to a subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to apply the exclusive lasso to k-means and min-cut to regulate how balanced the clustering results are. By optimizing objective functions built on the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
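
    The idea of penalising unbalanced cluster sizes can be sketched as follows. This is a minimal illustration, not the solver from the paper: it assumes the exclusive-lasso term is the sum of squared cluster sizes, and it uses a simple greedy assignment step in which adding a point to a cluster of size n costs an extra lam * (2n + 1). The names `balanced_kmeans` and `lam` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_kmeans(points, k, lam, iters=20, seed=0):
    """k-means with a size penalty (assumed form: lam * sum of squared sizes).

    lam = 0 reduces to ordinary k-means; a large lam forces near-equal sizes.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        sizes = [0] * k
        # Assignment step: greedy, one point at a time. Growing a cluster
        # from n to n + 1 members raises the penalty by lam * (2n + 1).
        for i, p in enumerate(points):
            costs = [sq_dist(p, centers[c]) + lam * (2 * sizes[c] + 1)
                     for c in range(k)]
            labels[i] = min(range(k), key=costs.__getitem__)
            sizes[labels[i]] += 1
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,)]
plain_labels, _ = balanced_kmeans(points, k=2, lam=0.0)
balanced_labels, _ = balanced_kmeans(points, k=2, lam=1e6)
```

    With `lam=0.0` the outlier at 10.0 ends up in a cluster of its own, whereas a very large `lam` splits the five points into groups of two and three regardless of geometry. Intermediate values trade compactness against balance.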

    Gerrymandering and computational redistricting

    Partisan gerrymandering poses a threat to democracy. Moreover, the complexity of the districting task may exceed human capacities. One potential solution is to use computational models to automate the districting process by optimizing objective and open criteria, such as how spatially compact districts are. We formulated one such model that minimised pairwise distance between voters within a district. Using US Census Bureau data, we confirmed our prediction that the difference in compactness between the computed and actual districts would be greatest for states that are large and, therefore, difficult for humans to properly district given their limited capacities. The computed solutions highlighted differences in how humans and machines solve this task, with machine solutions more fully optimised and displaying emergent properties not evident in human solutions. These results suggest a division of labour in which humans debate and formulate districting criteria whereas machines optimise the criteria to draw the district boundaries. We discuss how criteria can be expanded beyond notions of compactness to include other factors, such as respecting municipal boundaries, historic communities, and relevant legislation.
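
    A compactness criterion of this kind can be written down in a few lines. The sketch below assumes plain Euclidean distance summed over all voter pairs that share a district; the abstract does not say whether the authors use this exact form, and the function name `within_district_distance` is hypothetical.

```python
from itertools import combinations
from math import dist


def within_district_distance(voters, assignment):
    """Sum of pairwise Euclidean distances between voters in the same district.

    voters: list of (x, y) coordinates; assignment: district id per voter.
    Lower totals mean more spatially compact districts.
    """
    districts = {}
    for point, district in zip(voters, assignment):
        districts.setdefault(district, []).append(point)
    return sum(dist(a, b)
               for members in districts.values()
               for a, b in combinations(members, 2))


voters = [(0, 0), (0, 1), (5, 5), (5, 6)]
compact = within_district_distance(voters, [0, 0, 1, 1])
sprawling = within_district_distance(voters, [0, 1, 0, 1])
```

    Grouping neighbours together gives a much smaller total than pairing voters across the map, which is the quantity an optimiser would try to reduce.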

    MCFS: Min-cut-based feature-selection

    In this paper, MCFS (Min-Cut-based feature-selection) is presented, a feature-selection algorithm that represents the features of a dataset as a directed graph. The main contribution of our work is to show the usefulness of a general graph-processing technique in the feature-selection problem for classification datasets. The vertices of the graphs used herein are the features together with two special-purpose vertices (one of which denotes high correlation to the feature class of the dataset, and the other denotes low correlation to the feature class). The edges are functions of the correlations among the features and also between the features and the classes. A classic max-flow min-cut algorithm is applied to this graph. The cut returned by this algorithm provides the selected features. We have compared the results of our proposal with well-known feature-selection techniques. Our algorithm obtains results statistically similar to those achieved by the other techniques in terms of the number of features selected, while additionally significantly improving the accuracy. Ministerio de Ciencia, Innovación y Universidades RTI2018-098 062-A-I00; Ministerio de Economía y Competitividad TIN2017-82113-C2-1-
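
    The graph construction described above can be illustrated with a small max-flow computation. This is a sketch under stated assumptions, not the MCFS weighting itself: the abstract only says that edges are functions of the correlations, so the capacities used here (relevance r on the source edge, 1 - r on the sink edge, inter-feature correlation between features) are hypothetical, as are the names `select_features`, `SRC` and `SNK`. The max-flow routine is a plain Edmonds-Karp implementation.

```python
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Return the vertices on the source side of a minimum cut (Edmonds-Karp)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0.0  # make sure the reverse edge exists

    def reachable():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            return set(parent)  # no augmenting path left: this is the cut
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow


def select_features(relevance, redundancy):
    """relevance: {feature: |corr with class|}; redundancy: {(f, g): |corr|}."""
    capacity = defaultdict(dict)
    for f, r in relevance.items():
        capacity["SRC"][f] = r        # assumed: pull towards "high correlation"
        capacity[f]["SNK"] = 1.0 - r  # assumed: pull towards "low correlation"
    for (f, g), c in redundancy.items():
        capacity[f][g] = c            # assumed: correlated features stay together
        capacity[g][f] = c
    return min_cut_source_side(capacity, "SRC", "SNK") - {"SRC"}


selected = select_features({"f1": 0.9, "f2": 0.8, "f3": 0.1}, {("f2", "f3"): 0.05})
```

    In this toy instance the features that end up on the source side of the cut are `f1` and `f2`, the two with high class correlation. A different capacity scheme would yield a different cut, so the weights are the part to revisit against the paper.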

    Geoinformatics in Citizen Science

    The book features contributions that report original research in the theoretical, technological, and social aspects of geoinformation methods, as applied to supporting citizen science. Specifically, the book focuses on the technological aspects of the field and their application toward the recruitment of volunteers and the collection, management, and analysis of geotagged information to support volunteer involvement in scientific projects. Internationally renowned research groups share research in three areas: first, the key methods of geoinformatics within citizen science initiatives to support scientists in discovering new knowledge in specific application domains or in performing relevant activities, such as reliable geodata filtering, management, analysis, synthesis, sharing, and visualization; second, the critical aspects of citizen science initiatives that call for emerging or novel approaches of geoinformatics to acquire and handle geoinformation; and third, novel geoinformatics research that could serve in support of citizen science.