Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. This chapter identifies the main graph
summarization methods, with particular emphasis on the most recent approaches
and novel research trends not yet covered by previous surveys.
Comment: To appear in the Encyclopedia of Big Data Technologies
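As a rough illustration of what "a more compact representation that preserves structure" can mean, the sketch below merges nodes with identical neighbor sets into supernodes. This is only one simple grouping-style summarization, chosen here as an assumption; it is not taken from the chapter, and the function name `summarize_by_neighborhood` is hypothetical.

```python
from collections import defaultdict


def summarize_by_neighborhood(edges):
    """Merge nodes of an undirected graph that share the same neighbor set.

    Illustrative assumption: nodes with identical neighborhoods are treated
    as interchangeable and collapsed into one supernode.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Group nodes by their exact neighbor set.
    groups = defaultdict(list)
    for node, nbrs in neighbors.items():
        groups[frozenset(nbrs)].append(node)

    # Each group becomes a supernode, represented as a sorted tuple of members.
    supernode_of = {}
    supernodes = []
    for members in groups.values():
        members = tuple(sorted(members))
        supernodes.append(members)
        for node in members:
            supernode_of[node] = members

    # A superedge connects two supernodes whenever any original edge does.
    superedges = set()
    for u, v in edges:
        a, b = supernode_of[u], supernode_of[v]
        if a != b:
            superedges.add(tuple(sorted((a, b))))
    return sorted(supernodes), sorted(superedges)


# Usage: a complete bipartite graph between {a, b, c} and {x, y}.
edges = [("a", "x"), ("a", "y"), ("b", "x"),
         ("b", "y"), ("c", "x"), ("c", "y")]
supernodes, superedges = summarize_by_neighborhood(edges)
print(supernodes)
print(superedges)
```

In this example six edges collapse into a single superedge between two supernodes. Real summarization methods typically relax the exact-match condition and trade compactness against reconstruction error.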
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases, graphs are directed, in the sense that the edges carry
a direction, which makes the semantics of the edges non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.
Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)
EXPLOITING N-GRAM IMPORTANCE AND ADDITIONAL KNOWLEDGE BASED ON WIKIPEDIA FOR IMPROVEMENTS IN GAAC BASED DOCUMENT CLUSTERING
This paper provides a solution to the issue: “How can we use Wikipedia based concepts in document
clustering with less human involvement, accompanied by effective improvements in results?” In the
devised system, we propose a method to exploit the importance of N-grams in a document and use
Wikipedia based additional knowledge for GAAC (group-average agglomerative clustering) based document clustering. The importance of N-grams
in a document depends on several features including, but not limited to: frequency, the position of their
occurrence in a sentence, and the position in the document of the sentence in which they occur. First, we
introduce a new similarity measure that takes the weighted N-gram importance into account
when performing document clustering. As a result, the chances of topical similarity in clustering are improved. Second, we use Wikipedia as an additional knowledge base, both to remove noisy entries from the extracted N-grams and to reduce the information gap between conceptually related N-grams that do not match owing to differences in writing scheme or strategy. Our experimental results on a publicly available text dataset show that the devised system significantly improves performance over bag-of-words based state-of-the-art systems in this area.
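The combination described above, a weighted N-gram similarity feeding GAAC (group-average agglomerative clustering), can be sketched as follows. The weighting scheme (frequency plus a bonus for earlier positions) and all function names are assumptions made for illustration; the paper's actual similarity measure and its Wikipedia-based filtering are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def ngram_weights(text, n=2):
    """Weight word n-grams by frequency, boosting earlier ones (assumed scheme)."""
    tokens = text.lower().split()
    count = len(tokens) - n + 1
    weights = Counter()
    for i in range(count):
        gram = " ".join(tokens[i:i + n])
        weights[gram] += 1.0 + (1.0 - i / count)  # earlier n-grams count more
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gaac(docs, num_clusters, n=2):
    """Repeatedly merge the cluster pair with the highest group-average similarity."""
    vectors = [ngram_weights(d, n) for d in docs]
    sim = {(i, j): cosine(vectors[i], vectors[j])
           for i, j in combinations(range(len(docs)), 2)}
    clusters = [[i] for i in range(len(docs))]

    def group_average(members):
        # Mean similarity over all distinct document pairs in the merged cluster.
        pairs = list(combinations(sorted(members), 2))
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > max(num_clusters, 1):
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: group_average(clusters[ab[0]] + clusters[ab[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


docs = [
    "graph clustering methods for networks",
    "graph clustering methods for directed networks",
    "haze correction for crop inventories",
    "haze correction for crop classification",
]
print(gaac(docs, 2))
```

This naive version recomputes group averages on every merge, which is fine for a handful of documents but would need an incremental update for realistic collections.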
An evaluation of the signature extension approach to large area crop inventories utilizing space image data
The author has identified the following significant results. Two examples of haze correction algorithms were tested: CROP-A and XSTAR. CROP-A was tested in a unitemporal mode on data collected in 1973-74 over ten sample segments in Kansas. Because of the uniformly low level of haze present in these segments, no conclusion could be reached about CROP-A's ability to compensate for haze. It was noted, however, that in some cases CROP-A made serious errors which actually degraded classification performance. The haze correction algorithm XSTAR was tested in a multitemporal mode on 1975-76 LACIE sample segment data over 23 blind sites in Kansas and 18 sample segments in North Dakota, providing a wide range of haze levels and other conditions for algorithm evaluation. It was found that this algorithm substantially improved signature extension classification accuracy when a sum-of-likelihoods classifier was used with an alien rejection threshold.
On Geometric Alignment in Low Doubling Dimension
In the real world, many problems can be formulated as the alignment of two
geometric patterns. Previously, a great deal of research focused on the
alignment of 2D or 3D patterns, especially in the field of computer vision.
Recently, the alignment of geometric patterns in high dimensions has found several
novel applications and has attracted increasing attention. However, the
research is still rather limited in terms of algorithms. To the best of our
knowledge, most existing approaches for high dimensional alignment are just
simple extensions of their counterparts for the 2D and 3D cases, and often suffer
from issues such as high complexity. In this paper, we propose an
effective framework to compress high dimensional geometric patterns while
approximately preserving the alignment quality. As a consequence, existing
alignment approaches can be applied to the compressed geometric patterns and thus
the time complexity is significantly reduced. Our idea is inspired by the
observation that high dimensional data often has a low intrinsic dimension. We
adopt the widely used notion of "doubling dimension" to measure the extent of our
compression and the resulting approximation. Finally, we test our method on
both random and real datasets. The experimental results reveal that running the
alignment algorithm on compressed patterns achieves quality similar to that
obtained on the original patterns, while the running times
(including the time spent on compression) are substantially lower.
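The abstract does not spell out the compression step, so the following is only a stand-in under stated assumptions: it uses greedy farthest-point selection to replace a point pattern with a few weighted representatives. The function name `farthest_point_compress` and the choice of this particular technique are illustrative and should not be read as the paper's framework.

```python
import math


def farthest_point_compress(points, k):
    """Pick up to k representatives by greedy farthest-point selection.

    Returns (centers, weights, radius): each weight is the number of input
    points assigned to that center, and radius is the largest distance from
    any point to its assigned center.
    """
    centers = [0]                                   # start from the first point
    dist = [math.dist(p, points[0]) for p in points]
    assign = [0] * len(points)

    while len(centers) < min(k, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)
        if dist[nxt] == 0:                          # every point already covered
            break
        centers.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:                         # new center is closer
                dist[i] = d
                assign[i] = len(centers) - 1

    weights = [assign.count(c) for c in range(len(centers))]
    return [points[c] for c in centers], weights, max(dist)


# Usage: two tight groups in 4D collapse to two weighted representatives.
pattern = [(0, 0, 0, 0), (0.1, 0, 0, 0), (10, 0, 0, 0), (10, 0.1, 0, 0)]
centers, weights, radius = farthest_point_compress(pattern, 2)
print(centers, weights, radius)
```

An alignment routine could then run on the weighted centers instead of the full pattern; how much quality is lost depends on the covering radius, which is where a low doubling dimension would help.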