4,430 research outputs found

    Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches

    We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures and overlaps produced by the three hierarchical clustering algorithms were compared to a content-based categorisation, which we based on the interpretation of titles and keywords. We defined sets of papers dealing with three topics located on different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three cluster algorithms and compared the overlapping topics they detected with one another and with the three pre-defined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper networks in research fields. Comment: 18 pages, 9 figures
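    Bibliographic coupling links two papers when they cite at least one common source, and the number of shared cited sources can serve as the edge weight. The sketch below builds such a weighted network from a hypothetical mapping of paper IDs to cited-source IDs. The IDs are placeholders, the function name is illustrative, and the paper's own weighting or normalisation may differ from this raw count.

```python
from itertools import combinations


def coupling_network(references):
    """Map each paper pair to the number of cited sources they share.

    references: {paper_id: set_of_cited_source_ids} (hypothetical input format).
    Pairs with no shared source get no edge.
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared > 0:
            edges[(a, b)] = shared
    return edges


# Placeholder data: three papers and the sources each one cites.
refs = {
    "P1": {"S1", "S2", "S3"},
    "P2": {"S1", "S2"},
    "P3": {"S3", "S4"},
}
print(coupling_network(refs))
```

The resulting weighted edge list is the kind of graph on which hierarchical or overlapping clustering algorithms could then be run.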

    Self-supervised automated wrapper generation for weblog data extraction

    Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds for deriving a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model is conducted on a dataset of 2,393 posts, and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives.
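    The abstract describes matching feed values against HTML elements across several posts and deriving extraction rules probabilistically, but gives no details of the model. The sketch below is a deliberately simplified, hypothetical stand-in: it finds the elements whose text equals a feed field (here a post title) and keeps the (tag, class) locator that matches in the most posts, reporting the fraction of posts as a rough support score. The names `TextCollector`, `locate` and `derive_rule` and the sample HTML are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TextCollector(HTMLParser):
    """Record the (tag, class) locator of each element that directly holds text."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, class) pairs
        self.found = []  # (tag, class, text) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop up to the matching open tag; ignore stray end tags.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.found.append((tag, cls, text))


def locate(html, value):
    """Return locators of elements whose text equals the feed value."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return [(tag, cls) for tag, cls, text in parser.found if text == value.strip()]


def derive_rule(posts):
    """posts: (html, feed_value) pairs. Return the majority locator and its support.

    Hypothetical simplification: a majority vote, not the paper's probabilistic model.
    """
    votes = Counter()
    for html, value in posts:
        for locator in set(locate(html, value)):
            votes[locator] += 1
    if not votes:
        raise ValueError("no feed value was found in any post")
    locator, count = votes.most_common(1)[0]
    return locator, count / len(posts)


posts = [
    ('<div><h2 class="entry-title">Hello world</h2><p>Body text</p></div>', "Hello world"),
    ('<div><h2 class="entry-title">Second post</h2><span class="nav">Second post</span></div>', "Second post"),
    ('<div><h1>Third post</h1></div>', "Third post"),
]
rule, support = derive_rule(posts)
print(rule, round(support, 2))
```

Voting across several posts, rather than comparing posts pairwise, lets a locator that matches in most posts win even when one post uses a different template or repeats the title elsewhere on the page.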

    Analysis of group evolution prediction in complex networks

    In a world in which acceptance by and identification with social communities are highly desired, the ability to predict the evolution of groups over time is a vital but very complex research problem. We therefore propose a new, adaptable, generic and multi-stage method for Group Evolution Prediction (GEP) in complex networks that facilitates reasoning about the future states of recently discovered groups. The modular design of GEP enabled us to carry out extensive and versatile empirical studies on many real-world complex and social networks to analyze the impact of numerous setups and parameters, such as time window type and size, group detection method, evolution chain length, and prediction models. Additionally, many new predictive features reflecting the state of a group at a given time were identified and tested. Other research problems, such as enriching the evolution chains used for learning with external data, were analyzed as well.
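    The abstract refers to evolution chains of groups across time windows but does not say, in this listing, how those chains are built. As an illustration only, the sketch below assumes a simple tracking rule: a group in one window continues as the group in the next window with the highest Jaccard overlap, provided the overlap reaches a hypothetical threshold of 0.3. Splits, merges and newly appearing groups, which a full group evolution method would handle, are ignored here, and the function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two member sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def evolution_chains(windows, threshold=0.3):
    """Follow each group from window 0 to its most similar group in each later window.

    windows: list of {group_id: set_of_members}, one dict per time window.
    A chain stops when no group in the next window reaches the threshold.
    Ties are resolved in favour of the later group in iteration order.
    """
    chains = [[(0, gid)] for gid in windows[0]]
    for t in range(1, len(windows)):
        for chain in chains:
            prev_t, prev_id = chain[-1]
            if prev_t != t - 1:
                continue  # chain ended earlier (group dissolved)
            prev_members = windows[prev_t][prev_id]
            best_id, best_sim = None, threshold
            for gid, members in windows[t].items():
                sim = jaccard(prev_members, members)
                if sim >= best_sim:
                    best_id, best_sim = gid, sim
            if best_id is not None:
                chain.append((t, best_id))
    return chains


windows = [
    {"A": {1, 2, 3, 4}, "B": {5, 6, 7}},
    {"C": {1, 2, 3}, "D": {8, 9}},
    {"E": {1, 2, 3, 10}},
]
for chain in evolution_chains(windows):
    print(chain)
```

Per-window features of each group along such a chain (size, density and similar) could then serve as inputs to a classifier that predicts the next evolution event, which is the general setting the abstract describes.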

    Software tools for conducting bibliometric analysis in science: An up-to-date review

    Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between universities, the effect of state-owned science funding on national research and development performance, and educational efficiency, among other applications. Professionals and scientists therefore need a range of theoretical and practical tools to measure experimental data. This paper provides an up-to-date review of the various tools available for conducting bibliometric and scientometric analyses, including the sources of data acquisition, performance analysis and visualization tools. The included tools were divided into three categories: general bibliometric and performance analysis, science mapping analysis, and libraries; each of them is described. A comparative analysis of database source support, pre-processing capabilities, and analysis and visualization options is also provided to make the comparison easier to follow. Although there are numerous bibliometric databases from which to obtain data for bibliometric and scientometric analysis, they were developed for different purposes. The number of exportable records ranges from 500 to 50,000, and the coverage of the different scientific fields is uneven in each database. Among the analyzed tools, Bibliometrix offers the most extensive set of techniques and is accessible to practitioners through Biblioshiny. VOSviewer offers excellent visualization and can load and export information from many sources. SciMAT has powerful pre-processing and export capabilities. Given this variability of features, users need to decide on the desired analysis output and choose the option that best fits their aims.