Survey of Document Clustering Approach for Real World Objects (Documents)

Author: Sandeep Kumar, Associate Prof. Sanjay Pandey
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/08/2015

Since the amount of text data stored in computer repositories is growing every day, we need more than ever a reliable way to group or classify text documents. Clustering can introduce some form of organization to the data and can also highlight significant patterns and trends. Document clustering is used in many fields, such as data mining and information retrieval. This thesis presents the results of an experimental study of some common document clustering techniques. In particular, we compare two main approaches to document clustering: the agglomerative hierarchical clustering algorithm BIRCH and the partitional clustering algorithm K-means. By comparing both algorithms, we attempt to establish an appropriate clustering technique for generating high-quality clusterings of real-world documents. DOI: 10.17762/ijritcc2321-8169.15080
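As a rough illustration of the partitional side of that comparison, the sketch below runs a plain K-means loop over normalised term-frequency vectors. It is not the authors' experimental setup: the toy documents, the helper names `tf_vector` and `kmeans`, and the choice to seed centroids with the first k documents are all assumptions made for this example, and BIRCH is not shown.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Term-frequency vector, L2-normalised so that all documents have unit length."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def kmeans(vectors, k, iterations=20):
    """Lloyd-style K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "graph signal processing on directed graph",
    "tax fraud anomaly detection",
    "fourier transform for graph signal",
    "anomaly detection in tax networks",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
labels = kmeans([tf_vector(d, vocab) for d in docs], k=2)
```

On this toy corpus the two graph-related documents share one label and the two tax-related documents share the other. K-means needs the number of clusters up front, which is one of the practical differences from a hierarchical method.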

International Journal on Recent and Innovation Trends in Computing and Communication

On the Role of Social Identity and Cohesion in Characterizing Online Social Communities

Author: Fuhry David, Parthasarathy Srinivasan, Purohit Hemant, Ruan Yiye, Sheth Amit
Publication date: 01/01/2012

Two prevailing theories for explaining social group or community structure are cohesion and identity. The social cohesion approach posits that social groups arise out of an aggregation of individuals that have mutual interpersonal attraction as they share common characteristics. These characteristics can range from common interests to kinship ties and from social values to ethnic backgrounds. In contrast, the social identity approach posits that an individual is likely to join a group based on an intrinsic self-evaluation at a cognitive or perceptual level. In other words, group members typically share an awareness of a common category membership. In this work we seek to understand the role of these two contrasting theories in explaining the behavior and stability of social communities on Twitter. A specific focal point of our work is to understand the role of these theories in disparate contexts ranging from disaster response to socio-political activism. We extract social identity and social cohesion features of interest from large-scale datasets of five real-world events and examine the effectiveness of such features in capturing behavioral characteristics and the stability of groups. We also propose a novel measure of social group sustainability based on the divergence in group discussion. Our main findings are: 1) Sharing of social identities (especially physical location) among group members has a positive impact on group sustainability, 2) Structural cohesion (represented by high group density and low average shortest path length) is a strong indicator of group sustainability, and 3) Event characteristics play a role in shaping group sustainability, as social groups in transient events behave differently from groups in events that last longer.
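The two structural cohesion indicators named in finding 2, group density and average shortest path length, can be computed on a small undirected graph as in the sketch below. The four-member group and the function names are hypothetical, and the abstract does not say how the authors handle disconnected groups; here unreachable pairs are simply skipped.

```python
from collections import deque


def density(adj):
    """Fraction of possible undirected edges that are actually present."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


def average_shortest_path_length(adj):
    """Mean BFS hop distance over ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical group: a triangle a-b-c plus a pendant member d attached to c.
group = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
group_density = density(group)                     # 4 of 6 possible edges
group_aspl = average_shortest_path_length(group)   # 8 hops over 6 unordered pairs
```

A denser group with shorter paths scores higher on the first measure and lower on the second, which is the combination the abstract associates with sustainability.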

arXiv.org e-Print Archive

Scholar Commons - Institutional Repository of the University of South Carolina

CORE

Lossless digraph signal processing via polar decomposition

Author: Ji Feng
Publication date: 29/12/2023

In this paper, we present a signal processing framework for directed graphs. Unlike for undirected graphs, a graph shift operator such as the adjacency matrix associated with a directed graph usually does not admit an orthogonal eigenbasis. This makes it challenging to define the Fourier transform. Our methodology leverages the polar decomposition to define two distinct eigendecompositions, each associated with a different matrix derived from this decomposition. We propose to extend the frequency domain and introduce a Fourier transform that jointly encodes the spectral response of a signal for the two eigenbases from the polar decomposition. This allows us to define convolution following a standard routine. Our approach has two features. First, it is lossless, as the shift operator can be fully recovered from the factors of the polar decomposition. Second, it subsumes traditional graph signal processing if the graph is undirected. We present numerical results to show how the framework can be applied.
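For orientation, the standard polar decomposition of a real square shift operator can be written as below. The symbols A, U, P, V and Λ are notation chosen for this note, not taken from the paper, and the abstract does not state which derived matrices are eigendecomposed, so this is only a sketch of the textbook facts involved.

```latex
A = U P, \qquad P = (A^{\top} A)^{1/2} \succeq 0, \qquad U^{\top} U = I .

% Symmetric (undirected) invertible case, with A = V \Lambda V^{\top}:
P = V \, |\Lambda| \, V^{\top}, \qquad U = V \, \operatorname{sign}(\Lambda) \, V^{\top} .
```

Since A is the product of the two factors, nothing is lost by working with U and P. U is orthogonal and P is symmetric, so each has an orthonormal eigenbasis (complex in general for U). In the symmetric invertible case both factors share the eigenvectors V of A, which is consistent with the framework reducing to the traditional one on undirected graphs.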

arXiv.org e-Print Archive

Detecting anomalies in heterogeneous population-scale VAT networks

Author: Alexopoulos Angelos, Dellaportas Petros, Gyoshev Stanley, Kotsogiannis Christos, Olhede Sofia C., Pavkov Trifon
Publication date: 26/06/2021

Anomaly detection in network science is the task of identifying aberrant edges, nodes, subgraphs or other network events. Heterogeneous networks typically contain information going beyond the observed network itself. Value Added Tax (VAT, a tax on goods and services) networks, defined from pairwise interactions of VAT-registered taxpayers, are analysed at population scale, which requires scalable algorithms. By adopting a quantitative understanding of the nature of VAT anomalies, we define a method that identifies them using information from micro-scale, meso-scale and global-scale patterns that can be interpreted, and efficiently implemented, as population-scale network analysis. The proposed method is automatable and implementable in real time, enabling revenue authorities to prevent large losses of tax revenue through early identification of fraud within the VAT system. Comment: 14 pages, 5 figures, 3 tables

arXiv.org e-Print Archive

Towards Specificationless Monitoring of Provenance-Emitting Systems

Author: Stoffers Martin, Weinert Alexander
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Monitoring often requires insight into the monitored system as well as concrete specifications of expected behavior. More and more systems, however, provide information about their inner procedures by emitting provenance information in a W3C-standardized graph format. In this work, we present an approach to monitor such provenance data for anomalous behavior by performing spectral graph analysis on slices of the constructed provenance graph and by comparing the characteristics of each slice with those of a sliding window over recently seen slices. We argue that this approach not only simplifies the monitoring of heterogeneous distributed systems, but also enables applying a host of well-studied techniques to monitor such systems.

Institute of Transport Research:Publications