4,430 research outputs found
Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.Comment: 18 pages, 9 figure
Self-supervised automated wrapper generation for weblog data extraction
Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds for deriving a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. An evaluation of the model is conducted on a dataset of 2,393 posts and the results (92% accuracy) show that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives
Analysis of group evolution prediction in complex networks
In the world, in which acceptance and the identification with social
communities are highly desired, the ability to predict evolution of groups over
time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and mutli-stage method for Group Evolution
Prediction (GEP) in complex networks, that facilitates reasoning about the
future states of the recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Some other research problems like enriching learning
evolution chains with external data have been analyzed as well
Software tools for conducting bibliometric analysis in science: An up-to-date review
Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between
universities, the effect of state-owned science funding on national research and development performance and educational
efficiency, among other applications. Therefore, professionals and scientists need a range of theoretical and practical
tools to measure experimental data. This review aims to provide an up-to-date review of the various tools available
for conducting bibliometric and scientometric analyses, including the sources of data acquisition, performance analysis
and visualization tools. The included tools were divided into three categories: general bibliometric and performance
analysis, science mapping analysis, and libraries; a description of all of them is provided. A comparative analysis of the
database sources support, pre-processing capabilities, analysis and visualization options were also provided in order to
facilitate its understanding. Although there are numerous bibliometric databases to obtain data for bibliometric and
scientometric analysis, they have been developed for a different purpose. The number of exportable records is between
500 and 50,000 and the coverage of the different science fields is unequal in each database. Concerning the analyzed
tools, Bibliometrix contains the more extensive set of techniques and suitable for practitioners through Biblioshiny.
VOSviewer has a fantastic visualization and is capable of loading and exporting information from many sources. SciMAT
is the tool with a powerful pre-processing and export capability. In views of the variability of features, the users need to
decide the desired analysis output and chose the option that better fits into their aims
- …