A Measure for Data Set Editing by Ordered Projections
In this paper we study a measure, named weakness of an
example, which allows us to establish the importance of an example in
finding representative patterns for the data set editing problem. Our approach
consists in reducing the database size without losing information,
using the algorithm of patterns by ordered projections. The idea is to relax the
reduction factor with a new parameter, λ, removing all examples of the
database whose weakness satisfies a condition over this λ. We study how
to set this new parameter. Our experiments, carried out
on all databases from the UCI Repository, show that a size reduction
is possible in complex databases without a notable increase in the
error rate.
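As a rough illustration of the λ-thresholded editing step described above, the following sketch removes every example whose precomputed weakness reaches a λ-dependent threshold. The weakness values and the exact threshold rule are illustrative assumptions, not the paper's definition:

```python
# Sketch of lambda-thresholded editing: keep only examples whose
# precomputed "weakness" stays below a fraction lam of the attribute
# count. The weakness values below are illustrative placeholders,
# not the paper's exact computation.

def edit_by_weakness(examples, weakness, lam, n_attributes):
    """Return the examples whose weakness is below lam * n_attributes."""
    threshold = lam * n_attributes
    return [x for x, w in zip(examples, weakness) if w < threshold]

data = [[1.0, 2.0], [2.0, 1.0], [5.0, 5.0], [0.5, 0.9]]
weak = [0, 2, 1, 2]          # hypothetical weakness per example
reduced = edit_by_weakness(data, weak, lam=0.75, n_attributes=2)
# threshold = 1.5, so only the examples with weakness 0 and 1 survive
```

Raising λ relaxes the threshold and removes more examples, which is the trade-off between reduction factor and retained information that the abstract describes.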
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.

Comment: MSR 2019, 12 pages, 10 figures.
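The directed, weighted, time-stamped co-editing networks described above can be pictured with a toy construction. This is not git2net's actual API; the record format and field names are assumptions, and a real analysis would extract the edit records from the git history:

```python
from collections import defaultdict

# Toy construction of a directed, weighted, time-stamped co-editing
# network: each record says that `editor` changed a block of code
# originally written by `author` at `timestamp`. Field names are
# illustrative, not git2net's actual data model.

def build_coedit_network(edits):
    """Aggregate edit records into weighted (editor -> author) links."""
    links = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for editor, author, timestamp in edits:
        if editor == author:      # self-edits carry no social signal here
            continue
        link = links[(editor, author)]
        link["weight"] += 1
        link["timestamps"].append(timestamp)
    return dict(links)

edits = [("alice", "bob", 1), ("alice", "bob", 5),
         ("bob", "carol", 3), ("carol", "carol", 4)]
net = build_coedit_network(edits)
# net[("alice", "bob")]["weight"] == 2; the self-edit is dropped
```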
Databases Reduction Simultaneously by Ordered Projection
In this paper, a new algorithm, database Reduction Simultaneously
by Ordered Projections (RESOP), is introduced. This algorithm
reduces databases in two directions: editing examples and selecting features
simultaneously. Ordered projection techniques have been used
to design RESOP, taking advantage of symmetrical ideas for two different
tasks. Experiments have been carried out with UCI Repository
databases, and the performance in the subsequent application of classification
techniques has been satisfactory.
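The two-directional reduction can be sketched as a single pass that keeps selected examples and selected features at once; the keep/drop decisions below stand in for RESOP's ordered-projection criteria, which are not reproduced here:

```python
# Toy two-way reduction: drop examples flagged as redundant and drop
# features flagged as irrelevant in one pass. The keep_rows/keep_cols
# flags are placeholders for RESOP's ordered-projection criteria.

def reduce_both_ways(table, keep_rows, keep_cols):
    """Return the sub-table restricted to the kept rows and columns."""
    return [[row[j] for j in keep_cols]
            for i, row in enumerate(table) if i in keep_rows]

table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
reduced = reduce_both_ways(table, keep_rows={0, 2}, keep_cols=[0, 2])
# reduced == [[1, 3], [7, 9]]
```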
Mapping bilateral information interests using the activity of Wikipedia editors
We live in a global village where electronic communication has eliminated the
geographical barriers of information exchange. The road is now open to
worldwide convergence of information interests, shared values, and
understanding. Nevertheless, interests still vary between countries around the
world. This raises important questions about what today's world map of
information interests actually looks like and what factors cause the barriers of
information exchange between countries. To quantitatively construct a world map
of information interests, we devise a scalable statistical model that
identifies countries with similar information interests and measures the
countries' bilateral similarities. From the similarities we connect countries
in a global network and find that countries can be mapped into 18 clusters with
similar information interests. Through regression we find that language and
religion best explain the strength of the bilateral ties and formation of
clusters. Our findings provide a quantitative basis for further studies to
better understand the complex interplay between shared interests and conflict
on a global scale. The methodology can also be extended to track changes over
time and capture important trends in global information exchange.

Comment: 11 pages, 3 figures; in Palgrave Communications 1 (2015).
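The bilateral-similarity step can be illustrated with a minimal sketch, assuming each country is summarized by a vector of editing activity per topic; the paper's statistical model and clustering are not reproduced here:

```python
import math

# Toy bilateral-similarity computation: each country is represented by
# a vector of editing activity per topic, and cosine similarity scores
# how strongly two countries share information interests.

def cosine(u, v):
    """Cosine similarity of two equal-length activity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

activity = {
    "A": [10, 0, 5],   # hypothetical edits per topic
    "B": [8, 1, 4],
    "C": [0, 9, 1],
}
similar = {(c1, c2): cosine(activity[c1], activity[c2])
           for c1 in activity for c2 in activity if c1 < c2}
# "A" and "B" edit similar topics, so similar[("A", "B")] is near 1,
# while similar[("A", "C")] is much lower
```

Thresholding such a similarity matrix yields the kind of global network on which community detection can then identify clusters of countries with shared interests.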
On Recursive Edit Distance Kernels with Application to Time Series Classification
This paper proposes some extensions to the work on kernels dedicated to
string or time series global alignment based on the aggregation of scores
obtained by local alignments. The extensions we propose allow one to construct,
from classical recursive definition of elastic distances, recursive edit
distance (or time-warp) kernels that are positive definite if some sufficient
conditions are satisfied. The sufficient conditions we end up with are original
and weaker than those proposed in earlier works, although a recursive
regularizing term is required to get the proof of the positive definiteness as
a direct consequence of Haussler's convolution theorem. The classification
experiment we conducted on three classical time warp distances (two of which
are metrics), using a Support Vector Machine classifier, leads us to conclude
that, when the pairwise distance matrix obtained from the training data is
\textit{far} from definiteness, the positive definite recursive elastic kernels
outperform in general the distance substituting kernels for the classical
elastic distances we have tested.

Comment: 14 pages.
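A minimal member of this kernel family sums the scores of all alignments instead of keeping only the best one, which is the ingredient that can yield positive definiteness under suitable conditions. The following is a generic global-alignment-style sketch, not the paper's exact recursion:

```python
import math

# Minimal recursive time-warp kernel: the dynamic program aggregates
# (sums) the exponentiated local similarities over all alignment paths,
# rather than taking the single best path as an elastic distance does.

def time_warp_kernel(x, y, sigma=1.0):
    """Sum-over-alignments kernel between two numeric sequences."""
    n, m = len(x), len(y)
    K = [[0.0] * (m + 1) for _ in range(n + 1)]
    K[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.exp(-((x[i - 1] - y[j - 1]) ** 2) / sigma)
            K[i][j] = local * (K[i - 1][j] + K[i][j - 1] + K[i - 1][j - 1])
    return K[n][m]

k_same = time_warp_kernel([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
k_diff = time_warp_kernel([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
# k_same is far larger than k_diff: similar sequences score highly
```

In an SVM experiment such a kernel would be evaluated pairwise to build a Gram matrix and passed to the classifier as a precomputed kernel.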
Fast Hadamard transforms for compressive sensing of joint systems: measurement of a 3.2 million-dimensional bi-photon probability distribution
We demonstrate how to efficiently implement extremely high-dimensional
compressive imaging of a bi-photon probability distribution. Our method uses
fast-Hadamard-transform Kronecker-based compressive sensing to acquire the
joint space distribution. We list, in detail, the operations necessary to
enable fast-transform-based matrix-vector operations in the joint space to
reconstruct a 16.8 million-dimensional image in less than 10 minutes. Within a
subspace of that image exists a 3.2 million-dimensional bi-photon probability
distribution. In addition, we demonstrate how the marginal distributions can
aid in the accuracy of joint space distribution reconstructions.
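The core primitive behind such fast-transform-based matrix-vector operations, a fast Walsh-Hadamard transform, can be sketched in a few lines; this is the textbook O(n log n) butterfly, not the paper's Kronecker-structured measurement code:

```python
# In-place fast Walsh-Hadamard transform in O(n log n): applies the
# Hadamard matrix (natural ordering) to a vector without ever forming
# the matrix. Input length must be a power of two; no normalization
# is applied here.

def fwht(a):
    """Return the Walsh-Hadamard transform of a power-of-two-length list."""
    a = list(a)
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y   # butterfly step
        h *= 2
    return a

print(fwht([1, 0, 1, 0]))  # [2, 2, 0, 0]
```

Because the transform touches each element O(log n) times, it makes the measurement operator of a multi-million-dimensional compressive reconstruction tractable where an explicit sensing matrix would not fit in memory.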
Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines), automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.

Comment: 25 pages, submitted version; to appear in International Journal on
Document Analysis and Recognition. Online version available at
http://www.springerlink.com/content/k2813176280456k3
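One of the oldest methods in the surveyed family, horizontal projection profiles, can be sketched as follows; real historical documents need far more robust techniques, as the survey discusses:

```python
# Baseline text line segmentation by horizontal projection profile:
# rows containing ink pixels form text lines, and runs of such rows
# are reported as (start_row, end_row) line boundaries.

def segment_lines(binary_image, min_ink=1):
    """Return (start_row, end_row) pairs for runs of rows containing ink.

    binary_image: 2D list with 1 for ink pixels, 0 for background.
    """
    profile = [sum(row) for row in binary_image]
    lines, start = [], None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r                       # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, r - 1))    # the line ended at r - 1
            start = None
    if start is not None:                   # line running to the bottom
        lines.append((start, len(profile) - 1))
    return lines

image = [[0, 0, 0],
         [1, 1, 0],
         [1, 0, 1],
         [0, 0, 0],
         [0, 1, 0]]
print(segment_lines(image))  # [(1, 2), (4, 4)]
```

Background noise and interfering lines defeat this simple profile, which is precisely why segmentation of historical documents remains an open research field.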
Ontology mapping: the state of the art
Ontology mapping is seen as a solution provider in today's landscape of ontology research. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Mapping could provide a common layer from which several ontologies could be accessed and hence could exchange information in a semantically sound manner. Developing such mappings has been the focus of a variety of works originating from diverse communities over a number of years. In this article we comprehensively review and present these works. We also provide insights on the pragmatics of ontology mapping and elaborate on a theoretical approach for defining ontology mapping.