Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score
Cluster analysis requires many decisions: the clustering method and the
implied reference model, the number of clusters and, often, several
hyper-parameters and algorithms' tunings. In practice, one produces several
partitions, and a final one is chosen based on validation or selection
criteria. There exists an abundance of validation methods that, implicitly or
explicitly, assume a certain clustering notion. Moreover, they are often
restricted to operate on partitions obtained from a specific method. In this
paper, we focus on groups that can be well separated by quadratic or linear
boundaries. The reference cluster concept is defined through the quadratic
discriminant score function and parameters describing clusters' size, center
and scatter. We develop two cluster-quality criteria called quadratic scores.
We show that these criteria are consistent with groups generated from a general
class of elliptically-symmetric distributions. The quest for this type of
groups is common in applications. The connection with likelihood theory for
mixture models and model-based clustering is investigated. Based on bootstrap
resampling of the quadratic scores, we propose a selection rule that allows
choosing among many clustering solutions. The proposed method has the
distinctive advantage that it can compare partitions that cannot be compared
with other state-of-the-art methods. Extensive numerical experiments and the
analysis of real data show that, even if some competing methods turn out to be
superior in some setups, the proposed methodology achieves a better overall
performance. Comment: Supplemental materials are included at the end of the paper
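For orientation, the quadratic discriminant score named in the abstract is commonly written as below. This is the standard form from quadratic discriminant analysis and is stated here as an assumption; the abstract does not give the paper's exact definition or its two criteria. The symbols \pi_k (cluster size), \mu_k (center) and \Sigma_k (scatter) are labels chosen to match the parameters the abstract lists.

```latex
% Standard quadratic discriminant score of a point x for cluster k
% (assumed form; not quoted from the paper).
\[
  \mathrm{qs}(x;\, \pi_k, \mu_k, \Sigma_k)
  = \log \pi_k
  - \tfrac{1}{2} \log \det \Sigma_k
  - \tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)
\]
```

Under this form, a point is assigned to the cluster with the largest score, and the boundaries between clusters are quadratic, or linear when the scatter matrices coincide.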
Murnaghan-Nakayama Rule: The Explanation and Usage of the Algorithm
Character values are not easy to calculate, so algorithms that ease these calculations are important. In the 20th century, the mathematicians Murnaghan and Nakayama developed a rule that calculates character values for partitions. The rule was later named the Murnaghan-Nakayama rule after these two authors.
The Murnaghan-Nakayama rule is a combinatorial method for computing character values of irreducible representations of symmetric groups, which makes it an important part of representation theory. One version is the recursive Murnaghan-Nakayama rule, in which border strips and diagrams are used to calculate the character values of representations on a given composition. This algorithm is quite fast in these calculations.
The Murnaghan-Nakayama rule can also be considered a central algorithm in the representation theory of symmetric groups. It is a fascinating and powerful algorithm with strong connections to both combinatorics and representation theory.
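The recursive rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's own code: the function names `mn_character` and `_mn` are hypothetical, and the sketch encodes a partition by its beta-numbers (first-column hook lengths), a standard equivalent of removing border strips from the diagram. Removing a border strip of length k corresponds to lowering one beta-number by k, and the strip's height is the number of beta-numbers jumped over.

```python
from functools import lru_cache


def mn_character(shape, cycle_type):
    """Character of the irreducible S_n representation indexed by `shape`,
    evaluated on permutations with the given cycle type."""
    shape = tuple(p for p in shape if p > 0)
    if sum(shape) != sum(cycle_type):
        raise ValueError("shape and cycle type must have the same size")
    rows = len(shape)
    # Beta-numbers: part i shifted by the number of rows below it.
    beta = frozenset(part + (rows - 1 - i) for i, part in enumerate(shape))
    return _mn(beta, tuple(cycle_type))


@lru_cache(maxsize=None)
def _mn(beta, cycles):
    if not cycles:
        return 1  # nothing left to remove: the empty diagram contributes 1
    k, rest = cycles[0], cycles[1:]
    total = 0
    for b in beta:
        target = b - k
        if target < 0 or target in beta:
            continue  # no border strip of length k ends in this row
        # Height of the strip = number of beta-numbers it jumps over.
        height = sum(1 for c in beta if target < c < b)
        total += (-1) ** height * _mn((beta - {b}) | {target}, rest)
    return total
```

For example, `mn_character((2, 1), (1, 1, 1))` gives the dimension of the two-dimensional representation of S_3, and `mn_character((2, 1), (3,))` gives its value on a 3-cycle.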
Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/Systolic Systems
An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in the index set indexes a computation of the algorithm. If the execution of a computation depends on the execution of another computation, then this dependency is represented as the difference between the index vectors of the computations. The dependence matrix is the matrix whose columns are the dependence vectors. An independent partition of the index set is one in which there are no dependencies between computations that belong to different blocks of the partition. This report considers uniform dependence algorithms with an arbitrary kind of index set and proposes two very simple methods to find independent partitions of the index set. Each method has advantages over the other for certain kinds of applications, and both outperform previously proposed approaches in terms of computational complexity and/or optimality. Also, lower and upper bounds on the cardinality of the maximal independent partitions are given. For some algorithms it is shown that the cardinality of the maximal partition is equal to the greatest common divisor of some subdeterminants of the dependence matrix. In an MIMD/multiple systolic array computation environment, if different blocks of an independent partition are assigned to different processors/arrays, the communications between processors/arrays will be reduced to zero. This is significant because communications usually dominate the overhead in MIMD machines. Some issues of mapping partitioned algorithms onto MIMD/systolic systems are addressed. Based on the theory of partitioning, a new method is proposed to test whether a system of linear Diophantine equations has integer solutions.
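The definition of an independent partition can be made concrete with a brute-force sketch. This is not either of the two methods the report proposes (the abstract does not describe them); it simply groups a finite index set into connected components of the dependence graph with a union-find structure. The function name `independent_partition` and the tuple representation of index vectors are assumptions made for illustration.

```python
from itertools import product


def independent_partition(index_set, dependence_vectors):
    """Split a finite index set into blocks with no dependencies across blocks.

    Index points and dependence vectors are tuples of integers of equal length.
    """
    points = set(index_set)
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the block representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in points:
        for d in dependence_vectors:
            q = tuple(a + b for a, b in zip(p, d))
            if q in points:
                # p and q are linked by a dependence, so they share a block.
                root_p, root_q = find(p), find(q)
                if root_p != root_q:
                    parent[root_p] = root_q

    blocks = {}
    for p in points:
        blocks.setdefault(find(p), []).append(p)
    return [sorted(block) for block in blocks.values()]


# A 6 x 6 index set with dependence vectors (2, 0) and (0, 2).
grid = list(product(range(6), repeat=2))
blocks = independent_partition(grid, [(2, 0), (0, 2)])
```

In this small example the index set splits into four blocks, one per parity class of the two coordinates, which agrees with the absolute determinant of the 2 x 2 dependence matrix.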
Mesoscopic Community Structure of Financial Markets Revealed by Price and Sign Fluctuations
The mesoscopic organization of complex systems, from financial markets to the
brain, is an intermediate between the microscopic dynamics of individual units
(stocks or neurons, in the mentioned cases), and the macroscopic dynamics of
the system as a whole. The organization is determined by "communities" of units
whose dynamics, represented by time series of activity, is more strongly
correlated internally than with the rest of the system. Recent studies have
shown that the binary projections of various financial and neural time series
exhibit nontrivial dynamical features that resemble those of the original data.
This implies that a significant piece of information is encoded into the binary
projection (i.e. the sign) of such increments. Here, we explore whether the
binary signatures of multiple time series can replicate the same complex
community organization of the financial market, as the original weighted time
series. We adopt a method that has been specifically designed to detect
communities from cross-correlation matrices of time series data. Our analysis
shows that the simpler binary representation leads to a community structure
that is almost identical with that obtained using the full weighted
representation. These results confirm that binary projections of financial time
series contain significant structural information. Comment: 15 pages, 7 figures
Dynamic programming for graphs on surfaces
We provide a framework for the design and analysis of dynamic
programming algorithms for surface-embedded graphs on n vertices
and branchwidth at most k. Our technique applies to general families
of problems where standard dynamic programming runs in 2^{O(k·log k)}.
Our approach combines tools from topological graph theory and
analytic combinatorics. Postprint (updated version)
Efficient Algorithms for Searching the Minimum Information Partition in Integrated Information Theory
The ability to integrate information in the brain is considered to be an
essential property for cognition and consciousness. Integrated Information
Theory (IIT) hypothesizes that the amount of integrated information (Φ) in
the brain is related to the level of consciousness. IIT proposes that to
quantify information integration in a system as a whole, integrated information
should be measured across the partition of the system at which information loss
caused by partitioning is minimized, called the Minimum Information Partition
(MIP). The computational cost for exhaustively searching for the MIP grows
exponentially with system size, making it difficult to apply IIT to real neural
data. It has been previously shown that if a measure of Φ satisfies a
mathematical property, submodularity, the MIP can be found in polynomial
time by an optimization algorithm. However, although the first version of
Φ is submodular, the later versions are not. In this study, we empirically
explore to what extent the algorithm can be applied to the non-submodular
measures of Φ by evaluating the accuracy of the algorithm in simulated
data and real neural data. We find that the algorithm identifies the MIP in a
nearly perfect manner even for the non-submodular measures. Our results show
that the algorithm allows us to measure Φ in large systems within a
practical amount of time.
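The exponential cost of the exhaustive search mentioned above can be seen in a small sketch. It enumerates every bipartition of a system and keeps the one with the smallest loss. This is only the brute-force baseline, not the polynomial-time algorithm studied in the paper, and it assumes bipartitions only. The names `bipartitions` and `exhaustive_mip` are hypothetical, and the `loss` argument stands in for a Φ measure, which the abstract does not specify; the usage example substitutes a toy cut weight.

```python
from itertools import combinations


def bipartitions(elements):
    """Yield every split of `elements` into two non-empty parts, once each."""
    elements = list(elements)
    first, rest = elements[0], elements[1:]
    # Keeping `first` in the left part avoids yielding (A, B) and (B, A).
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first,) + extra
            right = tuple(e for e in rest if e not in extra)
            yield left, right


def exhaustive_mip(elements, loss):
    """Return the bipartition minimising loss(left, right) by brute force."""
    return min(bipartitions(elements), key=lambda parts: loss(*parts))


# Toy stand-in for information loss: total weight of the links that are cut.
weights = {(0, 1): 5.0, (1, 2): 1.0, (2, 3): 5.0}


def cut_weight(left, right):
    return sum(w for (a, b), w in weights.items() if (a in left) != (b in left))


mip = exhaustive_mip(range(4), cut_weight)
```

A system of n elements has 2^(n-1) - 1 bipartitions, so each added element roughly doubles the number of loss evaluations, which is why the exhaustive search becomes impractical for large systems.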