
    Clustering and Community Detection with Imbalanced Clusters

    Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph construction, particularly when imbalanced clusters are present. We show that the ratio cut (RCut) and normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes, since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on the partitions in order to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts that reflect varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.
    Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP
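
For reference, the two objectives named in the abstract above are usually written as follows for a two-way partition. This is the standard textbook form, not notation taken from the paper itself; the symbols $w_{ij}$ (edge weight), $d_i$ (node degree) and $\bar{A}$ (complement of $A$) are assumptions introduced here.

```latex
\begin{aligned}
\operatorname{cut}(A,\bar{A}) &= \sum_{i \in A,\; j \in \bar{A}} w_{ij}, \\
\operatorname{RCut}(A,\bar{A}) &= \operatorname{cut}(A,\bar{A}) \left( \frac{1}{|A|} + \frac{1}{|\bar{A}|} \right), \\
\operatorname{NCut}(A,\bar{A}) &= \operatorname{cut}(A,\bar{A}) \left( \frac{1}{\operatorname{vol}(A)} + \frac{1}{\operatorname{vol}(\bar{A})} \right),
\qquad \operatorname{vol}(A) = \sum_{i \in A} d_i .
\end{aligned}
```

In both objectives the balancing factor grows quickly when one side of the partition is small, which is one way to read the abstract's remark that size can dominate the cut value.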

    Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework

    Example weighting is an effective solution to the training bias problem; however, most previous methods are limited by human knowledge and require laborious hyperparameter tuning. In this paper, we propose a novel example weighting framework called Learning to Auto Weight (LAW). The proposed framework finds step-dependent weighting policies adaptively and can be trained jointly with target networks without any assumptions or prior knowledge about the dataset. It consists of three key components: a Stage-based Searching Strategy (3SM), adopted to shrink the huge search space of a complete training process; a Duplicate Network Reward (DNR), which gives more accurate supervision by removing randomness during the search; and a Full Data Update (FDU), which further improves update efficiency. Experimental results demonstrate the superiority of the weighting policy found by LAW over the standard training pipeline. Compared with baselines, LAW finds a better weighting schedule that achieves markedly higher accuracy on both biased CIFAR and ImageNet.
    Comment: Accepted by AAAI 202

    Optimal Phase Swapping in Low Voltage Distribution Networks Based on Smart Meter Data and Optimization Heuristics

    In this paper, a modified version of the Harmony Search (HS) algorithm is proposed as a novel tool for phase swapping in Low Voltage Distribution Networks, where the objective is to determine to which phase each load should be connected in order to reduce the unbalance that results when all phases are added in the neutral conductor. Unbalanced loads deteriorate power quality and increase investment and operating costs. A correct assignment is a direct, effective way to prevent voltage peaks and network outages. The main contribution of this paper is an optimization model for allocating consumers to phases in a low-voltage distribution network according to their individual consumption, considering mono- and bi-phase connections and using real hourly load patterns, which heavily increases the computational complexity of the resulting combinatorial optimization problem. For this purpose, a novel metric function is defined in the proposed scheme. The performance of the HS algorithm has been compared with that of a classical Genetic Algorithm (GA). The presented results show that HS outperforms GA not only in terms of solution quality but also in convergence rate, reducing the computational complexity of the proposed scheme while supporting mono- and bi-phase connections.
    This paper includes partial results of the UPGRID project. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 646.531; for further information, see the website: http://upgrid.eu. The work was also supported by the Basque Government through the ELKARTEK programme (BID3A and BID3ABI projects).
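
To make the objective in the abstract above concrete, here is a minimal sketch of how the unbalance of a candidate phase assignment might be scored. It is not the paper's metric, which the abstract does not specify. The function names `neutral_current` and `unbalance_cost` are hypothetical, and the sketch assumes single-phase loads only, unity power factor, and ideal phase voltages 120 degrees apart, so that the neutral current is the phasor sum of the three phase currents.

```python
import cmath
import math

# Unit phasors for phases A, B, C of an ideal three-phase system (120 degrees apart).
# Assumption: every load current is in phase with its phase voltage.
PHASORS = [cmath.rect(1.0, -2.0 * math.pi * k / 3.0) for k in range(3)]


def neutral_current(loads, assignment):
    """Return the neutral current magnitude for one time step.

    loads[i] is the current magnitude drawn by consumer i;
    assignment[i] is that consumer's phase index (0, 1 or 2).
    """
    per_phase = [0.0, 0.0, 0.0]
    for load, phase in zip(loads, assignment):
        per_phase[phase] += load
    # The neutral carries the phasor sum of the three phase currents.
    total = sum(mag * phasor for mag, phasor in zip(per_phase, PHASORS))
    return abs(total)


def unbalance_cost(hourly_loads, assignment):
    """Sum the neutral current magnitude over all hourly load snapshots.

    This cost is an illustrative assumption, not the metric from the paper.
    """
    return sum(neutral_current(loads, assignment) for loads in hourly_loads)
```

A search heuristic such as Harmony Search or a Genetic Algorithm would then look for the `assignment` vector that minimizes a cost of this kind over the recorded load patterns. Three equal loads spread over the three phases give a cost close to zero, whereas placing all of them on one phase gives the largest cost.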

    SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project

    Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted, as new projects based on SZZ output must first reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, so conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and concludes with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.

    Massive Open Online Courses Temporal Profiling for Dropout Prediction

    Massive Open Online Courses (MOOCs) are attracting the attention of people all over the world. Regardless of the platform, the numbers of registrants for online courses are impressive, but at the same time completion rates are disappointing. Understanding the mechanisms of dropping out based on the learner profile is a crucial task in MOOCs, since it allows intervening at the right moment to assist the learner in completing the course. In this paper, the dropout behavior of learners in a MOOC is studied by first extracting features that describe the behavior of learners within the course and then comparing three classifiers (Logistic Regression, Random Forest and AdaBoost) on two tasks: predicting which users will have dropped out by a certain week, and predicting which users will drop out in a specific week. The former proved to be considerably easier, with all three classifiers performing equally well. The accuracy for the second task is lower, and Logistic Regression tends to perform slightly better than the other two algorithms. We found that features reflecting an active attitude of the user towards the MOOC, such as submitting assignments, posting on the Forum and filling in their Profile, are strong indicators of persistence.
    Comment: 8 pages, ICTAI1