
    Multi-resolution community structure of text and citation networks

    The recent availability of large datasets from online social media and other sources presents many new research avenues to diverse scientific disciplines. We focus on analyzing the text and citation (or relational) datasets that arise naturally from human online interactions and many other collections of texts. A useful first step in any analysis of such datasets is to discover subsets whose items share common characteristics. We take a network science viewpoint and propose an extension of the modularity-maximization community detection methods that have been widely used in analyzing citations. We view the text as another complex network and combine it with the citations or other available data. Since the resulting networks have very different structural characteristics, multi-resolution modularity is used to uncover partitions where the networks share overlapping community structure. We also consider practical problems of performance and efficiency in constructing the text and citation networks from data, and compare various methods from Information Retrieval and Citation Analysis against a text and citations dataset with a known ground truth. We use the same dataset and simulations to demonstrate the effectiveness of our methodology. Finally, we analyze and demonstrate the method on two datasets: one from the academic papers of the arXiv pre-prints HEP-Th section and one from a large set of posts on twitter.com related to the terrorist attack on the Charlie Hebdo offices in Paris. These datasets highlight the potential similarities or differences that emerge in the community structure of the text versus the citation networks.
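
    The idea of scoring one partition at several resolutions can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the text and citation networks are combined, so the weighted sum in `combine_networks` and its `text_weight` parameter are assumptions, as are all function names. The score is the standard modularity with a resolution parameter `gamma`, Q(gamma) = sum over communities of [W_in / m - gamma * (S / 2m)^2], where W_in is the edge weight inside a community, S is the total strength of its nodes, and m is the total edge weight.

```python
from collections import defaultdict


def combine_networks(citation_edges, text_edges, text_weight=1.0):
    """Merge two undirected weighted edge lists into one weight map.

    Assumption: a plain weighted sum of the two networks. The abstract
    does not specify the combination rule, so this is illustrative only.
    Each edge is a (node_u, node_v, weight) tuple without self-loops.
    """
    weights = defaultdict(float)
    for u, v, w in citation_edges:
        weights[tuple(sorted((u, v)))] += w
    for u, v, w in text_edges:
        weights[tuple(sorted((u, v)))] += text_weight * w
    return dict(weights)


def modularity(weights, membership, gamma=1.0):
    """Modularity of a partition with resolution parameter gamma.

    weights: {(u, v): weight} for undirected edges.
    membership: {node: community label}.
    """
    total = sum(weights.values())      # m, total edge weight
    strength = defaultdict(float)      # S_c, summed node strength per community
    internal = defaultdict(float)      # W_in, edge weight inside each community
    for (u, v), w in weights.items():
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    return sum(
        internal[c] / total - gamma * (strength[c] / (2.0 * total)) ** 2
        for c in strength
    )


# Toy data (hypothetical): two citation triangles joined by one edge,
# plus a few text-similarity edges inside each triangle.
citations = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
             ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
             ("c", "d", 1.0)]
text = [("a", "b", 0.8), ("e", "f", 0.6)]
combined = combine_networks(citations, text, text_weight=0.5)

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

# Compare the two candidate partitions across several resolutions.
for gamma in (0.1, 0.5, 1.0):
    print(gamma, modularity(combined, split, gamma),
          modularity(combined, merged, gamma))
```

    Scanning `gamma` shows which partition is preferred at each scale: small values favor few large communities and larger values favor finer ones. A full method would also search over partitions at each resolution instead of comparing two fixed ones.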