Recommended from our members
A new approach to ontology-based semantic modelling for opinion mining
With the fast growth of World Wide Web 2.0, a great number of opinions about a variety of products have been published in blogs, forums, and social networks. Opinion mining tools are needed to enable users to efficiently process the large number of reviews found online and determine the underlying opinions. This paper presents a new methodology for semantic modelling of the domain knowledge for opinion mining. In particular, the new methodology focuses on modelling the domain knowledge in such a way that it can be translated to a formal ontology, which can then be automatically enriched with ground facts obtained from public Linked Open Data resources. The methodology also considers procedures to link the formal ontology with Natural Language Processing. Our approach successfully enriches the ontology with the relevant ground facts. This ontology can then be used to perform a variety of data mining tasks, including sentiment analysis and information retrieval.
Node similarity as a basic principle behind connectivity in complex networks
How are people linked in a highly connected society? Since in many networks a
power-law (scale-free) node-degree distribution can be observed, a power law
might be seen as a universal characteristic of networks. But this study of
communication in the Flickr social online network reveals that power-law
node-degree distributions are restricted to only sparsely connected networks.
More densely connected networks, by contrast, show an increasing divergence
from power-law. This work shows that this observation is consistent with the
classic idea from social sciences that similarity is the driving factor behind
communication in social networks. The strong relation between communication
strength and node similarity could be confirmed by analyzing the Flickr
network. It is also shown that node similarity as a network formation model can
reproduce the characteristics of different network densities and hence can be
used as a model for describing the topological transition from weakly to
strongly connected societies.
Comment: 6 pages in Journal of Data Mining & Digital Humanities (2015) jdmdh:3
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figures
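The kind of network this abstract describes can be illustrated with a minimal sketch. It is not git2net's API or algorithm: the `Edit` record, the `co_editing_network` function, the sample developers, and the choice to weight links by lines touched and to skip self-edits are all assumptions made for illustration. A real tool would first have to recover, from the commit history, who originally wrote each edited block.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical record of one textual modification (not a git2net type)."""
    editor: str           # developer who changed the lines
    original_author: str  # developer who originally wrote those lines
    timestamp: int        # commit time, seconds since the epoch
    lines: int            # number of lines touched


def co_editing_network(edits):
    """Aggregate edits into directed, weighted, time-stamped links.

    A link (editor, original_author) means the editor changed code that
    the original author wrote.
    """
    links = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for e in edits:
        if e.editor == e.original_author:
            continue  # assumption: editing one's own code creates no link
        link = links[(e.editor, e.original_author)]
        link["weight"] += e.lines          # assumption: weight = lines edited
        link["timestamps"].append(e.timestamp)
    return dict(links)


# Made-up sample data, for illustration only.
edits = [
    Edit("alice", "bob", 1000, 3),
    Edit("alice", "bob", 2000, 2),
    Edit("bob", "alice", 1500, 1),
    Edit("carol", "carol", 1700, 4),
]
network = co_editing_network(edits)
```

Keeping the timestamps on each link, rather than only a total weight, is what makes it possible to study how collaboration changes over time.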
Transnational Activism in Support of National Protest: Questions of Identity and Organization
This article considers the question of whether transnational activism supporting national protest attains a cohesive collective identity on social media whilst organizationally remaining localized. It examines a corpus of social media data collected in the course of two months of rolling protests in 2013 against the largest proposed open-cast gold mine at Roşia Montană, Romania, which echoed among Romanian expatriates. A network text analysis of the data, supplemented with interview findings, revealed concerns with protest logistics as common across the transnational networks of protest localities on both Facebook and Twitter, a finding that testified to the coordinated character of the protests. On the other hand, collective identity emerged as the fruit of attempts to surmount localized protest experiences of geographically disparate but civically minded social media users.
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
Mining Threat Intelligence about Open-Source Projects and Libraries from Code Repository Issues and Bug Reports
Open-source projects and libraries are widely used in software development,
yet they carry multiple security vulnerabilities. This use of a third-party
ecosystem creates a new kind of attack surface for a product in development. An
intelligent attacker can attack a product by exploiting one of the
vulnerabilities present in linked projects and libraries.
In this paper, we mine threat intelligence about open source projects and
libraries from bugs and issues reported on public code repositories. We also
track library and project dependencies for installed software on a client
machine. We represent and store this threat intelligence, along with the
software dependencies in a security knowledge graph. Security analysts and
developers can then query and receive alerts from the knowledge graph if any
threat intelligence is found about linked libraries and projects used in
their products.
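The querying step described in this abstract can be sketched as a graph traversal. This is a minimal illustration, not the paper's system: the `affected_dependencies` function, the dictionary-based graph, the library names, and the decision to follow dependencies transitively are all assumptions, and the sample threat reports are made up.

```python
from collections import deque


def affected_dependencies(depends_on, threat_reports, product):
    """Return {library: [reports]} for libraries reachable from `product`.

    depends_on:     maps a project to the libraries it directly uses
    threat_reports: maps a library to threat intelligence mined about it
    Assumption: indirect (transitive) dependencies also trigger alerts.
    """
    alerts = {}
    seen = {product}
    queue = deque([product])
    while queue:  # breadth-first walk over the dependency edges
        node = queue.popleft()
        for dep in depends_on.get(node, []):
            if dep in seen:
                continue
            seen.add(dep)
            queue.append(dep)
            if dep in threat_reports:
                alerts[dep] = threat_reports[dep]
    return alerts


# Hypothetical dependency edges and reports, for illustration only.
depends_on = {
    "my-app": ["web-lib", "json-lib"],
    "web-lib": ["tls-lib"],
}
threat_reports = {
    "tls-lib": ["issue: certificate check can be bypassed"],
    "unused-lib": ["issue: buffer overflow"],
}
alerts = affected_dependencies(depends_on, threat_reports, "my-app")
```

Here `tls-lib` is flagged even though `my-app` only uses it indirectly, while `unused-lib` is ignored because nothing in the product depends on it.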