
    Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

    Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithms' tunings. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There is an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. Moreover, they are often restricted to operate on partitions obtained from a specific method. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant score function and parameters describing clusters' size, center and scatter. We develop two cluster-quality criteria called quadratic scores. We show that these criteria are consistent with groups generated from a general class of elliptically-symmetric distributions. The quest for this type of group is common in applications. The connection with likelihood theory for mixture models and model-based clustering is investigated. Based on bootstrap resampling of the quadratic scores, we propose a selection rule that allows choosing among many clustering solutions. The proposed method has the distinctive advantage that it can compare partitions that cannot be compared with other state-of-the-art methods. Extensive numerical experiments and the analysis of real data show that, even if some competing methods turn out to be superior in some setups, the proposed methodology achieves a better overall performance. Comment: Supplemental materials are included at the end of the paper.
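The abstract above does not state its score formula, so the following is only a minimal sketch of the textbook univariate quadratic discriminant score (log prior minus half the log variance minus half the squared standardized distance, with the constant term dropped). The function names `quadratic_score` and `assign`, and the toy cluster parameters, are illustrative assumptions; the paper's two quadratic-score criteria and its bootstrap selection rule are not reproduced here.

```python
import math

def quadratic_score(x, prior, mean, var):
    """Standard 1-D quadratic discriminant score (an assumption, not the
    paper's exact definition): log(prior) - 0.5*log(var) - 0.5*(x-mean)^2/var."""
    return math.log(prior) - 0.5 * math.log(var) - 0.5 * (x - mean) ** 2 / var

def assign(x, clusters):
    """Return the index of the cluster with the highest score for x.
    Each cluster is a (size, center, scatter) triple: (prior, mean, var)."""
    scores = [quadratic_score(x, *c) for c in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)

# Hypothetical clusters: equal sizes, centers 0 and 5, unit scatter.
clusters = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
labels = [assign(x, clusters) for x in (0.3, 4.2)]
```

With equal scatter the boundary between the two clusters is linear; unequal variances would make it quadratic, which is the kind of separation the abstract targets.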

    Murnaghan-Nakayama Rule: The Explanation and Usage of the Algorithm

    Character values are not easy to calculate, so it is important to find good algorithms that ease these calculations. In the 20th century, the mathematicians Murnaghan and Nakayama developed a rule for such computations, which was later named the Murnaghan-Nakayama rule after them. The Murnaghan-Nakayama rule is a combinatorial method for computing character values of irreducible representations of symmetric groups, which makes it an important part of representation theory. One version is the recursive Murnaghan-Nakayama rule, in which border strips and diagrams are used to calculate the character values of representations on a given composition. This algorithm is quite fast in these calculations. The Murnaghan-Nakayama rule can also be considered a central algorithm in the representation theory of symmetric groups. It is a fascinating and powerful algorithm with a strong connection to both combinatorics and representation theory.
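As an illustration of the recursive rule described above, here is a minimal sketch that evaluates a character of the symmetric group by repeatedly removing border strips. It encodes a partition by its beta-numbers (a standard device, chosen here for brevity and not necessarily the presentation used in the thesis): removing a border strip of length k corresponds to lowering one beta-number by k, and the strip's sign is determined by how many beta-numbers are jumped over. The names `character` and `_mn` are illustrative.

```python
from functools import lru_cache

def character(lam, mu):
    """Value of the irreducible character indexed by partition lam on the
    conjugacy class of cycle type mu, via the recursive Murnaghan-Nakayama rule."""
    lam = tuple(sorted((p for p in lam if p > 0), reverse=True))
    mu = tuple(sorted((p for p in mu if p > 0), reverse=True))
    if sum(lam) != sum(mu):
        raise ValueError("lam and mu must be partitions of the same integer")
    n = len(lam)
    # Beta-numbers: lam[i] + (number of rows below row i).
    beta = frozenset(lam[i] + (n - 1 - i) for i in range(n))
    return _mn(beta, mu)

@lru_cache(maxsize=None)
def _mn(beta, mu):
    if not mu:
        return 1  # every strip removed: only the empty diagram remains
    k, rest = mu[0], mu[1:]
    total = 0
    for b in beta:
        t = b - k
        if t >= 0 and t not in beta:
            # Height of the strip (rows spanned minus one) equals the
            # number of beta-numbers strictly between t and b.
            height = sum(1 for c in beta if t < c < b)
            total += (-1) ** height * _mn((beta - {b}) | {t}, rest)
    return total

# The two-dimensional character of S_3, indexed by the partition (2, 1).
row = [character((2, 1), mu) for mu in [(1, 1, 1), (2, 1), (3,)]]
```

The resulting row is the familiar one for the two-dimensional representation of S_3: 2 on the identity, 0 on transpositions, and -1 on 3-cycles.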

    Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/Systolic Systems

    An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in the index set indexes a computation of the algorithm. If the execution of a computation depends on the execution of another computation, then this dependency is represented as the difference between the index vectors of the computations. The dependence matrix corresponds to a matrix where each column is a dependence vector. An independent partition of the index set is such that there are no dependencies between computations that belong to different blocks of the partition. This report considers uniform dependence algorithms with any arbitrary kind of index set and proposes two very simple methods to find independent partitions of the index set. Each method has advantages over the other for certain kinds of applications, and they both outperform previously proposed approaches in terms of computational complexity and/or optimality. Also, lower bounds and upper bounds of the cardinality of the maximal independent partitions are given. For some algorithms it is shown that the cardinality of the maximal partition is equal to the greatest common divisor of some subdeterminants of the dependence matrix. In an MIMD/multiple systolic array computation environment, if different blocks of an independent partition are assigned to different processors/arrays, the communications between processors/arrays will be minimized to zero. This is significant because the communications usually dominate the overhead in MIMD machines. Some issues of mapping partitioned algorithms into MIMD/systolic systems are addressed. Based on the theory of partitioning, a new method is proposed to test if a system of linear Diophantine equations has integer solutions.
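To make the notion of an independent partition concrete, the sketch below groups the points of a small finite index set into blocks by brute force, using union-find over the dependence vectors. This is not either of the two methods the report proposes; the function name `independent_partition` and the 6x6 example are assumptions chosen for illustration.

```python
from itertools import product

def independent_partition(index_set, dependence_vectors):
    """Group index points so that no dependence crosses between blocks.
    Brute-force union-find over a finite index set (illustrative only)."""
    points = set(index_set)
    parent = {p: p for p in points}

    def find(p):
        # Path-halving lookup of the block representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in points:
        for d in dependence_vectors:
            # Computation p depends on computation p - d, if it exists.
            q = tuple(a - b for a, b in zip(p, d))
            if q in points:
                parent[find(p)] = find(q)

    blocks = {}
    for p in points:
        blocks.setdefault(find(p), []).append(p)
    return [sorted(b) for b in blocks.values()]

# Hypothetical example: a 6x6 index set with dependence vectors (2,0) and (0,2).
index_set = list(product(range(6), repeat=2))
blocks = independent_partition(index_set, [(2, 0), (0, 2)])
```

For this example the index set splits into four blocks of nine points each, one per parity class of the two coordinates, so four processors could run without any communication. The count happens to equal the absolute determinant of the 2x2 dependence matrix, in line with the determinant-based result the abstract mentions for some algorithms.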

    Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations

    The mesoscopic organization of complex systems, from financial markets to the brain, is an intermediate between the microscopic dynamics of individual units (stocks or neurons, in the mentioned cases), and the macroscopic dynamics of the system as a whole. The organization is determined by "communities" of units whose dynamics, represented by time series of activity, are more strongly correlated internally than with the rest of the system. Recent studies have shown that the binary projections of various financial and neural time series exhibit nontrivial dynamical features that resemble those of the original data. This implies that a significant piece of information is encoded into the binary projection (i.e. the sign) of such increments. Here, we explore whether the binary signatures of multiple time series can replicate the same complex community organization of the financial market as the original weighted time series. We adopt a method that has been specifically designed to detect communities from cross-correlation matrices of time series data. Our analysis shows that the simpler binary representation leads to a community structure that is almost identical to that obtained using the full weighted representation. These results confirm that binary projections of financial time series contain significant structural information. Comment: 15 pages, 7 figures.
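The comparison described above starts from two cross-correlation matrices: one built from the weighted increments and one from their signs. The sketch below computes both for a toy data set. The community-detection method itself is not specified in the abstract and is not reproduced; the series values are made up, and the names `sign`, `pearson`, and `correlation_matrix` are illustrative.

```python
import math

def sign(x):
    """Binary projection of an increment: +1, -1, or 0."""
    return 1 if x > 0 else (-1 if x < 0 else 0)

def pearson(xs, ys):
    """Pearson correlation of two equal-length series with nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_matrix(series):
    """Full cross-correlation matrix of a list of series."""
    return [[pearson(a, b) for b in series] for a in series]

# Made-up increments for three hypothetical stocks.
returns = [
    [0.5, -1.0, 2.0, -0.3],
    [1.0, -0.2, 0.4, -2.0],
    [-0.4, 0.9, -1.5, 0.2],
]
weighted = correlation_matrix(returns)
binary = correlation_matrix([[sign(x) for x in r] for r in returns])
```

In this toy case the first two series always move in the same direction and the third always moves against them, so the sign-based matrix already separates them into two groups even though the magnitudes differ.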

    Dynamic programming for graphs on surfaces

    We provide a framework for the design and analysis of dynamic programming algorithms for surface-embedded graphs on n vertices and branchwidth at most k. Our technique applies to general families of problems where standard dynamic programming runs in 2^{O(k·log k)}. Our approach combines tools from topological graph theory and analytic combinatorics. Postprint (updated version)

    Efficient Algorithms for Searching the Minimum Information Partition in Integrated Information Theory

    The ability to integrate information in the brain is considered to be an essential property for cognition and consciousness. Integrated Information Theory (IIT) hypothesizes that the amount of integrated information (Φ) in the brain is related to the level of consciousness. IIT proposes that to quantify information integration in a system as a whole, integrated information should be measured across the partition of the system at which information loss caused by partitioning is minimized, called the Minimum Information Partition (MIP). The computational cost for exhaustively searching for the MIP grows exponentially with system size, making it difficult to apply IIT to real neural data. It has been previously shown that if a measure of Φ satisfies a mathematical property, submodularity, the MIP can be found in a polynomial order by an optimization algorithm. However, although the first version of Φ is submodular, the later versions are not. In this study, we empirically explore to what extent the algorithm can be applied to the non-submodular measures of Φ by evaluating the accuracy of the algorithm in simulated data and real neural data. We find that the algorithm identifies the MIP in a nearly perfect manner even for the non-submodular measures. Our results show that the algorithm allows us to measure Φ in large systems within a practical amount of time.