29,531 research outputs found
Improving Textual Merge Result
International audienceIn asynchronous collaborative systems, merging is an essential component. It allows to reconcile modifications made concurrently as well as managing software change through branching. The collaborative system is in charge to propose a merge result that includes user's modifications. The users now have to check and adapt this result. The adaptation should be as effort-less as possible, otherwise, the users may get frustrated and will quit the collaboration. The objective of this paper is to improve the result quality of the textual merge tool that constitutes the default merge tool of distributed version control systems. The basic idea is to study the behavior of the concurrent modifications during merge procedure. We identified when the existing merge techniques under-perform, and we propose solutions to improve the quality of the merge. We finally compare with the traditional merge tool through a large corpus of collaborative editing
Incremental Entity Resolution from Linked Documents
In many government applications we often find that information about
entities, such as persons, are available in disparate data sources such as
passports, driving licences, bank accounts, and income tax records. Similar
scenarios are commonplace in large enterprises having multiple customer,
supplier, or partner databases. Each data source maintains different aspects of
an entity, and resolving entities based on these attributes is a well-studied
problem. However, in many cases documents in one source reference those in
others; e.g., a person may provide his driving-licence number while applying
for a passport, or vice-versa. These links define relationships between
documents of the same entity (as opposed to inter-entity relationships, which
are also often used for resolution). In this paper we describe an algorithm to
cluster documents that are highly likely to belong to the same entity by
exploiting inter-document references in addition to attribute similarity. Our
technique uses a combination of iterative graph-traversal, locality-sensitive
hashing, iterative match-merge, and graph-clustering to discover unique
entities based on a document corpus. A unique feature of our technique is that
new sets of documents can be added incrementally while having to re-resolve
only a small subset of a previously resolved entity-document collection. We
present performance and quality results on two data-sets: a real-world database
of companies and a large synthetically generated `population' database. We also
demonstrate benefit of using inter-document references for clustering in the
form of enhanced recall of documents for resolution.Comment: 15 pages, 8 figures, patented wor
Text Extraction from Web Images Based on A Split-and-Merge Segmentation Method Using Color Perception
This paper describes a complete approach to the segmentation and extraction of text from Web images for subsequent recognition, to ultimately achieve both effective indexing and presentation by non-visual means (e.g., audio). The method described here (the first in the authors’ systematic approach to exploit human colour perception) enables the extraction of text in complex situations such as in the presence of varying colour (characters and background). More precisely, in addition to using structural features, the segmentation follows a split-and-merge strategy based on the Hue-Lightness- Saturation (HLS) representation of colour as a first approximation of an anthropocentric expression of the differences in chromaticity and lightness. Character-like components are then extracted as forming textlines in a number of orientations and along curves
- …