git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figures
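The network construction described in this abstract can be illustrated with a minimal sketch. This is not git2net's API or implementation: the edit records, the function name `build_coediting_network`, the edge direction (editor to original author), the choice of edited lines as the edge weight, and the skipping of self-edits are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical edit records: (editor, original_author, unix_timestamp, lines_edited).
# A real tool would derive these from the commit history; here they are made up.
edits = [
    ("alice", "bob", 1546300800, 3),
    ("alice", "bob", 1546387200, 2),
    ("carol", "alice", 1546473600, 5),
    ("bob", "bob", 1546560000, 1),  # self-edit, skipped below (assumption)
]


def build_coediting_network(edits):
    """Aggregate edit records into a directed, weighted, time-stamped edge map.

    Keys are (editor, original_author) pairs; each value holds the summed
    number of edited lines and the list of edit timestamps (UTC).
    """
    network = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for editor, original_author, ts, n_lines in edits:
        if editor == original_author:
            continue  # assumption: editing one's own code adds no link
        edge = network[(editor, original_author)]
        edge["weight"] += n_lines
        edge["timestamps"].append(datetime.fromtimestamp(ts, tz=timezone.utc))
    return dict(network)


network = build_coediting_network(edits)
for (editor, author), edge in network.items():
    print(editor, "->", author, edge["weight"], len(edge["timestamps"]))
```

Keeping the timestamps on each edge, rather than only the aggregated weight, is what would allow the network to be sliced into time windows later.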
StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices
Given a large-scale graph with millions of nodes and edges, how can we reveal
macro patterns of interest, like cliques, bipartite cores, stars, and chains?
Furthermore, how can we visualize such patterns together, gaining insights from
the graph to support sound decision-making? Although there are many algorithmic
and visual techniques to analyze graphs, none of the existing approaches is
able to present the structural information of graphs at large scale. Hence,
this paper describes StructMatrix, a methodology aimed at highly scalable visual
inspection of graph structures with the goal of revealing macro patterns of
interest. StructMatrix combines algorithmic structure detection and adjacency
matrix visualization to present cardinality, distribution, and relationship
features of the structures found in a given graph. We performed experiments on
real, large-scale graphs with up to one million nodes and millions of edges.
StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and
DBLP) have characterizations that reflect the nature of their corresponding
domains; our findings have not been seen in the literature so far. We expect
that our technique will bring deeper insights into large graph mining,
leveraging their use for decision making.
Comment: To appear: 8 pages, paper to be published at the Fifth IEEE ICDM
Workshop on Data Mining in Networks, 2015, as Hugo Gualdron, Robson Cordeiro,
Jose Rodrigues (2015). StructMatrix: Large-scale visualization of graphs by
means of structure detection and dense matrices. In: The Fifth IEEE ICDM
Workshop on Data Mining in Networks, 1--8, IEEE
Intelligent Management and Efficient Operation of Big Data
This chapter details how Big Data can be used and implemented in networking
and computing infrastructures. Specifically, it addresses three main aspects:
the timely extraction of relevant knowledge from heterogeneous and very often
unstructured large data sources; the enhancement of the performance of
processing and networking (cloud) infrastructures that are the most important
foundational pillars of Big Data applications or services; and novel ways to
efficiently manage network infrastructures with high-level composed policies
for supporting the transmission of large amounts of data with distinct
requisites (video vs. non-video). A case study involving an intelligent
management solution to route data traffic with diverse requirements in a wide
area Internet Exchange Point is presented, discussed in the context of Big
Data, and evaluated.
Comment: In book Handbook of Research on Trends and Future Directions in Big
Data and Web Intelligence, IGI Global, 201
Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs
Identifying global events from social media has been the focus of much research in recent years. However, the identification of personal life events poses new requirements and challenges that have received relatively little research attention. In this paper we explore a new approach for life event identification, where we expand social media posts into both semantic and syntactic networks of content. Frequent graph patterns are mined from these networks and used as features to enrich life-event classifiers. Results show that our approach significantly outperforms the best-performing baseline in accuracy (by 4.48 percentage points) and F-measure (by 4.54 percentage points) when used to identify five major life events drawn from the psychology literature: Getting Married, Having Children, Death of a Parent, Starting School, and Falling in Love. In addition, our results show that, while semantic graphs are effective at discriminating the theme of the post (e.g. the topic of marriage), syntactic graphs help identify whether the post describes a personal event (e.g. someone getting married).