Network analysis of online bidding activity
With the advent of digital media, people are increasingly resorting to online
channels for commercial transactions. Online auction is a prototypical example.
In such online transactions, the pattern of bidding activity is more complex
than in traditional offline transactions; this is because the number of bidders
participating in a given transaction is not bounded and the bidders can also
easily respond to the bidding instantaneously. Using recently developed
network theory, we study the interaction patterns between bidders (items), who
(which) are connected when they bid for the same item (when they are bid on by
the same bidder). The resulting network is analyzed with a hierarchical
clustering algorithm of the kind used for cluster analysis of expression data
from DNA microarrays. A dendrogram is constructed for the item subcategories;
this dendrogram is compared with a traditional classification scheme. The
implication of the difference between the two is discussed.
Comment: 8 pages and 11 figures
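The network construction described in this abstract can be sketched as a projection of bid records onto bidders. This is a minimal illustration, not the authors' code: the function name `bidder_network`, the `(bidder, item)` record format, and the toy data are all assumptions made for the example. Edge weights count the items two bidders have in common.

```python
from collections import defaultdict
from itertools import combinations


def bidder_network(bids):
    """Project (bidder, item) records onto a weighted bidder-bidder network.

    Two bidders are linked when they bid for the same item; the weight is
    the number of items they share. (Hypothetical helper for illustration.)
    """
    bidders_by_item = defaultdict(set)
    for bidder, item in bids:
        bidders_by_item[item].add(bidder)

    weights = defaultdict(int)
    for bidders in bidders_by_item.values():
        # Every pair of bidders on the same item gets one unit of weight.
        for a, b in combinations(sorted(bidders), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy bid records, invented for this example.
bids = [
    ("ann", "camera"),
    ("bob", "camera"),
    ("bob", "lens"),
    ("cy", "lens"),
    ("ann", "lens"),
]

bidder_edges = bidder_network(bids)

# The item network follows by swapping the roles of bidder and item.
item_edges = bidder_network([(item, bidder) for bidder, item in bids])
```

Either edge list could then be turned into a distance matrix and passed to a hierarchical clustering routine to obtain a dendrogram.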
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measure corresponds to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and a
distance function over it. We therefore extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features in the spirit of multi-layered
architectures to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
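The correspondence between Minimax distances and the single-linkage dendrogram can be checked on a small example. The sketch below is an illustration under stated assumptions, not the paper's implementation: it takes the level of a cluster to be its merge height and the dendrogram distance of two objects to be the level at which they first share a cluster. The function names and the toy points are invented for the example.

```python
def minimax_distances(d):
    """All-pairs minimax path distances (Floyd-Warshall-style recursion).

    The minimax distance between i and j is the smallest achievable value
    of the largest edge along any path from i to j.
    """
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m


def single_linkage_levels(d):
    """Merge height at which each pair first shares a single-linkage cluster."""
    n = len(d)
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    level = [[0.0] * n for _ in range(n)]
    # Single linkage merges clusters in order of increasing pairwise distance.
    edges = sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        for a in members[ri]:
            for b in members[rj]:
                level[a][b] = level[b][a] = w
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return level


# Toy data: four points on a line, pairwise absolute differences.
points = [0.0, 1.0, 3.0, 7.0]
d = [[abs(p - q) for q in points] for p in points]

minimax = minimax_distances(d)
levels = single_linkage_levels(d)
```

On this input the two matrices agree entry by entry, which is the single-linkage special case that the abstract generalizes to other dendrograms, level functions and distance functions.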
Link Clustering with Extended Link Similarity and EQ Evaluation Division.
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. The basic principle of LC is to derive a transform matrix whose elements are the similarities of neighboring links, computed with the Jaccard measure; it then applies hierarchical clustering to the transform matrix and uses partition density on the resulting dendrogram to determine the cut level that gives the best community detection. However, the original link clustering method does not consider the similarity of non-neighbor links, and the partition density tends to divide the network into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. The improved method employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended measure of modularity quality) to choose the level at which to cut the dendrogram, giving a better partitioning of the original network space. Since ELS uses more link information, the resulting transform matrix provides a superior basis for clustering and analysis. Further, by using the EQ value to find the best level at which to divide the hierarchical clustering dendrogram, we obtain communities that are more sensible and reasonable than those obtained with the partition density evaluation. Experiments on five real-world networks and on artificially generated networks show that the ELC method achieves higher EQ and In-group Proportion (IGP) values. Additionally, the communities are more realistic than those generated by either the original LC method or the classical CPM method.
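The neighbor-link similarity used by the original LC method can be sketched as follows. The abstract does not spell out the formula, so this example assumes the commonly used definition: for two links that share one node, take the Jaccard index of the inclusive neighborhoods (a node plus its neighbors) of the two unshared endpoints. The helper names and the toy graph are hypothetical, and ELS itself is not defined in the abstract, so it is not reproduced here.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def link_similarity(adj, link_a, link_b):
    """Jaccard similarity of two neighboring links (assumed LC definition).

    Returns 0.0 for links that do not share exactly one node, which is the
    behavior the abstract criticizes for non-neighbor links.
    """
    shared = set(link_a) & set(link_b)
    if len(shared) != 1:
        return 0.0
    (i,) = set(link_a) - shared
    (j,) = set(link_b) - shared
    # Inclusive neighborhoods: each endpoint together with its neighbors.
    ni = adj[i] | {i}
    nj = adj[j] | {j}
    return len(ni & nj) / len(ni | nj)


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency(edges)

inside_triangle = link_similarity(adj, (0, 2), (1, 2))
across_pendant = link_similarity(adj, (0, 2), (2, 3))
non_neighbors = link_similarity(adj, (0, 1), (2, 3))
```

Filling a links-by-links matrix with these values gives the transform matrix that hierarchical clustering operates on; the zero entries for non-neighbor links are what make it sparse.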
Hierarchically nested factor model from multivariate data
We show how to achieve a statistical description of the hierarchical
structure of a multivariate data set. Specifically we show that the similarity
matrix resulting from a hierarchical clustering procedure is the correlation
matrix of a factor model, the hierarchically nested factor model. In this
model, factors are mutually independent and hierarchically organized. Finally,
we use a bootstrap based procedure to reduce the number of factors in the model
with the aim of retaining only those factors significantly robust with respect
to the statistical uncertainty due to the finite length of data records.
Comment: 7 pages, 5 figures; accepted for publication in Europhys. Lett.; the
Appendix corresponds to the additional material of the accepted letter