Local Guarantees in Graph Cuts and Clustering
Correlation Clustering is an elegant model that captures fundamental graph
cut problems such as Min Cut, Multiway Cut, and Multicut, extensively
studied in combinatorial optimization. Here, we are given a graph with edges
labeled + or -, and the goal is to produce a clustering that agrees with the
labels as much as possible: + edges within clusters and - edges across
clusters. The classical approach towards Correlation Clustering (and other
graph cut problems) is to optimize a global objective. We depart from this and
study local objectives: minimizing the maximum number of disagreements for
edges incident on a single node, and the analogous max min agreements
objective. This naturally gives rise to a family of basic min-max graph cut
problems. A prototypical representative is Min Max Cut: find an s-t cut
minimizing the largest number of cut edges incident on any node. We present the
following results: an approximation for the problem of
minimizing the maximum total weight of disagreement edges incident on any node
(thus providing the first known approximation for the above family of min-max
graph cut problems), a remarkably simple approximation for minimizing
local disagreements in complete graphs (improving upon the previous best known
approximation), and an approximation for
maximizing the minimum total weight of agreement edges incident on any node,
hence improving upon the approximation that follows from
the study of approximate pure Nash equilibria in cut and party affiliation
games.
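To make the local objective concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only evaluates a given clustering by counting, for each node, the incident edges that disagree with their label, and reports the maximum over nodes next to the usual global count. It assumes edges are labeled '+' or '-', as is standard in Correlation Clustering; the function names and the toy graph are illustrative.

```python
from collections import defaultdict

def is_disagreement(label, same_cluster):
    # A '+' edge disagrees when its endpoints are separated;
    # a '-' edge disagrees when its endpoints share a cluster.
    return (label == "+") != same_cluster

def local_disagreements(edges, cluster_of):
    """Per-node count of incident edges violated by the clustering.

    edges: iterable of (u, v, label) with label '+' or '-'.
    cluster_of: dict mapping each node to a cluster id.
    """
    bad = defaultdict(int)
    for u, v, label in edges:
        if is_disagreement(label, cluster_of[u] == cluster_of[v]):
            bad[u] += 1
            bad[v] += 1
    return dict(bad)

def max_local_disagreement(edges, cluster_of):
    # Local (min-max) objective: the worst-off node.
    return max(local_disagreements(edges, cluster_of).values(), default=0)

def total_disagreements(edges, cluster_of):
    # Classical global objective: total number of violated edges.
    return sum(
        1 for u, v, label in edges
        if is_disagreement(label, cluster_of[u] == cluster_of[v])
    )

# Toy instance (hypothetical data).
edges = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-"),
         ("c", "d", "+"), ("a", "d", "-")]
one_cluster = {n: 0 for n in "abcd"}
split = {"a": 0, "b": 0, "c": 1, "d": 1}

print(max_local_disagreement(edges, one_cluster), total_disagreements(edges, one_cluster))
print(max_local_disagreement(edges, split), total_disagreements(edges, split))
```

On this toy graph, putting every node in one cluster leaves node a with two violated edges, while the two-cluster split leaves every node with at most one.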
Recommender Systems
The ongoing rapid expansion of the Internet greatly increases the necessity
of effective recommender systems for filtering the abundant information.
Extensive research on recommender systems is conducted by a broad range of
communities including social and computer scientists, physicists, and
interdisciplinary researchers. Despite substantial theoretical and practical
achievements, unification and comparison of different approaches are lacking,
which impedes further advances. In this article, we review recent developments
in recommender systems and discuss the major challenges. We compare and
evaluate available algorithms and examine their roles in future
developments. In addition to algorithms, physical aspects are described to
illustrate the macroscopic behavior of recommender systems. Potential impacts and
future directions are discussed. We emphasize that recommendation has great
scientific depth and combines diverse research fields, which makes it of
interest to physicists as well as interdisciplinary researchers.
Comment: 97 pages, 20 figures (to appear in Physics Reports)
Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets
Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that co-elute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related-peak information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The resulting similarity score matrix is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result.
Results: We demonstrate that related-peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools.
Availability: The proposed alignment method has been implemented
as a stand-alone application in Python, available for download at
http://github.com/joewandy/peak-grouping-alignment.
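The pipeline described in this abstract (a pairwise similarity score that includes peak-grouping information, followed by approximate weighted matching) can be sketched roughly as follows. This is not the code from the repository above. The `Peak` fields, the tolerance values, the form of the score, the `alpha` mixing weight and the greedy matcher are all assumptions made for illustration. Greedy selection by decreasing weight is one well-known 1/2-approximation for maximum weight matching; the abstract does not say which approximation algorithm the authors use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peak:
    mz: float    # mass-to-charge ratio
    rt: float    # retention time in seconds
    group: int   # id of the related-peak group within its own run

def base_similarity(p, q, mz_tol=0.01, rt_tol=30.0):
    """Similarity in [0, 1]; 0 when either tolerance is exceeded (assumed form)."""
    d_mz = abs(p.mz - q.mz) / mz_tol
    d_rt = abs(p.rt - q.rt) / rt_tol
    if d_mz > 1 or d_rt > 1:
        return 0.0
    return (1 - d_mz) * (1 - d_rt)

def group_support(p, q, run_a, run_b):
    """Fraction of p's group mates that have a candidate match among q's group mates."""
    mates_p = [x for x in run_a if x.group == p.group and x != p]
    mates_q = [y for y in run_b if y.group == q.group and y != q]
    if not mates_p:
        return 0.0
    hits = sum(1 for x in mates_p
               if any(base_similarity(x, y) > 0 for y in mates_q))
    return hits / len(mates_p)

def align(run_a, run_b, alpha=0.3):
    """Return sorted (index_in_a, index_in_b) pairs chosen by greedy matching."""
    scored = []
    for i, p in enumerate(run_a):
        for j, q in enumerate(run_b):
            s = base_similarity(p, q)
            if s > 0:
                # Blend the peak-level score with support from related peaks.
                s = (1 - alpha) * s + alpha * group_support(p, q, run_a, run_b)
                scored.append((s, i, j))
    scored.sort(reverse=True)          # highest score first
    used_a, used_b, matches = set(), set(), []
    for s, i, j in scored:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return sorted(matches)

# Hypothetical runs: a0 is closer in retention time to b2, but its group mate
# a1 only has a counterpart in the group of b0.
run_a = [Peak(100.0, 50.0, 0), Peak(101.003, 50.0, 0)]
run_b = [Peak(100.0, 60.0, 0), Peak(101.003, 60.0, 0), Peak(100.0, 45.0, 1)]

print(align(run_a, run_b, alpha=0.0))  # grouping ignored
print(align(run_a, run_b, alpha=0.3))  # grouping included
```

In this toy case the grouping term changes the outcome: without it, the first peak of run A is matched to the isolated peak in run B that is nearest in retention time, and with it the match moves to the peak whose group also contains a counterpart for the related peak.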
Soft clustering analysis of galaxy morphologies: A worked example with SDSS
Context: The huge and still rapidly growing number of galaxies in modern sky
surveys raises the need for an automated and objective classification method.
Unsupervised learning algorithms are of particular interest, since they
discover classes automatically. Aims: We briefly discuss the pitfalls of
oversimplified classification methods and outline an alternative approach
called "clustering analysis". Methods: We categorise different classification
methods according to their capabilities. Based on this categorisation, we
present a probabilistic classification algorithm that automatically detects the
optimal classes preferred by the data. We explore the reliability of this
algorithm in systematic tests. Using a small sample of bright galaxies from the
SDSS, we demonstrate the performance of this algorithm in practice. We are able
to disentangle the problems of classification and parametrisation of galaxy
morphologies in this case. Results: We give physical arguments that a
probabilistic classification scheme is necessary. The algorithm we present
produces reasonable morphological classes and object-to-class assignments
without any prior assumptions. Conclusions: There are sophisticated automated
classification algorithms that meet all necessary requirements, but a lot of
work is still needed on the interpretation of the results.
Comment: 18 pages, 19 figures, 2 tables, submitted to A
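As an illustration of what a probabilistic (soft) object-to-class assignment looks like, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization in plain Python. This is a generic stand-in technique, not the algorithm presented in the paper. The single feature, the starting means and the data values are assumptions. Each object receives a vector of class probabilities rather than a single hard label.

```python
import math

def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def soft_cluster(xs, init_means, n_iter=20, min_var=1e-6):
    """Return (responsibilities, means) for a 1-D Gaussian mixture fitted by EM."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each class given each object.
        resp = []
        for x in xs:
            dens = [w * gaussian(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate class parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)   # guard against collapse
            weights[j] = nj / len(xs)
    return resp, means

# Hypothetical morphology feature with two groups and one object in between.
xs = [0.0, 0.5, 1.0, 5.0, 9.0, 9.5, 10.0]
resp, means = soft_cluster(xs, init_means=(0.0, 10.0))

for x, r in zip(xs, resp):
    print(x, [round(p, 3) for p in r])
```

Objects at the extremes are assigned almost entirely to one class, while the probabilities for the object in between depend on the fitted class widths, which is the kind of information a hard classification discards.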
Network analysis of online bidding activity
With the advent of digital media, people are increasingly resorting to online
channels for commercial transactions. Online auctions are a prototypical example.
In such online transactions, the pattern of bidding activity is more complex
than in traditional offline transactions; this is because the number of bidders
participating in a given transaction is not bounded and the bidders can also
easily respond to bids instantaneously. Using recently developed
network theory, we study the interaction patterns between bidders, who are
connected when they bid for the same item, and between items, which are
connected when the same bidder bids for them. The resulting network is analyzed
using a hierarchical clustering algorithm of the kind used for cluster analysis
of expression data from DNA microarrays. A dendrogram is constructed for the item subcategories;
this dendrogram is compared with a traditional classification scheme. The
implication of the difference between the two is discussed.
Comment: 8 pages and 11 figures
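The network construction described in this abstract can be sketched as a projection of (bidder, item) records. In the minimal Python example below, two items are linked when the same bidder bid for both, and two bidders are linked when they bid for the same item, with the link weight counting shared partners. The merge routine is single-linkage agglomeration, chosen here only as a simple example of hierarchical clustering; the abstract does not state which linkage the authors use, and the bid records and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project(bids, onto="item"):
    """bids: iterable of (bidder, item). Returns {(a, b): weight} with a < b."""
    members = defaultdict(set)
    for bidder, item in bids:
        if onto == "item":
            members[bidder].add(item)    # items sharing a bidder get linked
        else:
            members[item].add(bidder)    # bidders sharing an item get linked
    weights = defaultdict(int)
    for nodes in members.values():
        for a, b in combinations(sorted(nodes), 2):
            weights[(a, b)] += 1
    return dict(weights)

def single_linkage_merges(weights):
    """Merge order of a single-linkage dendrogram, strongest links first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    merges = []
    for (a, b), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb:                        # link joins two separate clusters
            parent[rb] = ra
            merges.append((a, b, w))
    return merges

# Hypothetical bid records.
bids = [("u1", "book"), ("u1", "cd"), ("u2", "book"), ("u2", "cd"),
        ("u2", "lamp"), ("u3", "lamp"), ("u3", "sofa")]

item_net = project(bids, onto="item")
bidder_net = project(bids, onto="bidder")
print(item_net)
print(bidder_net)
print(single_linkage_merges(item_net))
```

Reading the merge list from first to last gives the dendrogram from its leaves upward: the most strongly linked items join first, and weaker links attach later.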