19,738 research outputs found

    A Measure for Data Set Editing by Ordered Projections

    In this paper we study a measure, called the weakness of an example, which lets us establish how important an example is for finding representative patterns in the data set editing problem. Our approach reduces the database size without losing information, using the algorithm patterns by ordered projections. The idea is to relax the reduction factor with a new parameter, λ, removing every example in the database whose weakness satisfies a condition on λ. We study how to set this parameter. Our experiments were carried out on all databases from the UCI Repository, and they show that a size reduction is possible in complex databases without a notable increase in the error rate.

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership that is contained in the commit log of software projects, e.g. which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns. Comment: MSR 2019, 12 pages, 10 figures
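The network described in this abstract (directed, weighted, time-stamped links from the editing developer to the original author) can be illustrated with a minimal sketch. It does not use or reproduce the git2net API; the function name `build_coediting_network` and the `(editor, original_author, timestamp)` tuple format are assumptions made for illustration only, and skipping self-edits is likewise an assumption.

```python
from collections import defaultdict


def build_coediting_network(edits):
    """Aggregate edit events into a directed, weighted, time-stamped network.

    `edits` is an iterable of (editor, original_author, timestamp) tuples.
    This input format is hypothetical and is not the format git2net uses.
    """
    weights = defaultdict(int)      # (editor, original_author) -> number of edits
    timestamps = defaultdict(list)  # (editor, original_author) -> edit times

    for editor, original_author, ts in edits:
        if editor == original_author:
            # Assumption: a developer editing their own code adds no link.
            continue
        edge = (editor, original_author)
        weights[edge] += 1
        timestamps[edge].append(ts)

    return dict(weights), {edge: sorted(ts) for edge, ts in timestamps.items()}


# Toy data: alice edits bob's code twice, bob edits alice's code once.
example_edits = [
    ("alice", "bob", 3),
    ("alice", "bob", 1),
    ("bob", "alice", 2),
    ("carol", "carol", 5),
]
edge_weights, edge_times = build_coediting_network(example_edits)
```

Keeping the time stamps per edge, rather than only the counts, is what would allow such a network to be sliced into time windows later.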

    Databases Reduction Simultaneously by Ordered Projection

    In this paper, a new algorithm, Database Reduction Simultaneously by Ordered Projections (RESOP), is introduced. This algorithm reduces databases in two directions simultaneously: editing examples and feature selection. Ordered projection techniques have been used to design RESOP, taking advantage of symmetrical ideas for two different tasks. Experiments were carried out with UCI Repository databases, and the performance of classification techniques applied afterwards was satisfactory.

    Mapping bilateral information interests using the activity of Wikipedia editors

    We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values, and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today's world map of information interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries' bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange. Comment: 11 pages, 3 figures, in Palgrave Communications 1 (2015)

    On Recursive Edit Distance Kernels with Application to Time Series Classification

    This paper proposes some extensions to the work on kernels dedicated to string or time series global alignment based on the aggregation of scores obtained by local alignments. The extensions we propose allow us to construct, from the classical recursive definition of elastic distances, recursive edit distance (or time-warp) kernels that are positive definite if some sufficient conditions are satisfied. The sufficient conditions we end up with are original and weaker than those proposed in earlier works, although a recursive regularizing term is required to obtain the proof of positive definiteness as a direct consequence of Haussler's convolution theorem. The classification experiment we conducted on three classical time warp distances (two of which are metrics), using a Support Vector Machine classifier, leads us to conclude that, when the pairwise distance matrix obtained from the training data is far from definiteness, the positive definite recursive elastic kernels in general outperform the distance substituting kernels for the classical elastic distances we have tested. Comment: 14 pages
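To make the terms "recursive definition of an elastic distance" and "distance substituting kernel" concrete, here is a minimal sketch. It uses dynamic time warping as a familiar example of an elastic distance; the abstract does not name the three distances it tests, so that choice is an assumption. The names `dtw`, `distance_substituting_kernel`, and `gamma` are illustrative, and the exponential form of the kernel is one common choice rather than the paper's construction. Nothing here implements the regularized positive definite kernels the paper proposes.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences.

    D[i][j] = |a_i - b_j| + min(D[i-1][j], D[i][j-1], D[i-1][j-1]),
    which is the classical recursive definition, filled in bottom-up.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def distance_substituting_kernel(a, b, gamma=1.0):
    """Plug the elastic distance into an exponential kernel form.

    Assumption: exp(-gamma * d) is used for illustration. A kernel built
    this way is not guaranteed to be positive definite.
    """
    return math.exp(-gamma * dtw(a, b))
```

Because the substituted distance is not guaranteed to yield a positive definite Gram matrix, a kernel of this kind can violate the assumptions of an SVM solver, which is the situation the abstract contrasts with its positive definite kernels.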

    Fast Hadamard transforms for compressive sensing of joint systems: measurement of a 3.2 million-dimensional bi-photon probability distribution

    We demonstrate how to efficiently implement extremely high-dimensional compressive imaging of a bi-photon probability distribution. Our method uses fast-Hadamard-transform Kronecker-based compressive sensing to acquire the joint space distribution. We list, in detail, the operations necessary to enable fast-transform-based matrix-vector operations in the joint space to reconstruct a 16.8 million-dimensional image in less than 10 minutes. Within a subspace of that image exists a 3.2 million-dimensional bi-photon probability distribution. In addition, we demonstrate how the marginal distributions can aid in the accuracy of joint space distribution reconstructions
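The reason fast transforms matter at this scale can be sketched in a few lines: a Kronecker product of Hadamard matrices can be applied to a vector without ever forming the matrix, by running a fast Walsh-Hadamard transform along each axis of the reshaped vector. The code below is a generic illustration under stated assumptions (unnormalized, Sylvester-ordered Hadamard matrices, row-major reshaping, hypothetical function names `fwht` and `kron_hadamard_apply`); it is not the authors' measurement or reconstruction procedure.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along axis 0.

    Equivalent to H @ x for the Sylvester-ordered Hadamard matrix H,
    but costs O(n log n) instead of O(n^2). Length must be a power of 2.
    """
    x = np.array(x, dtype=float)  # work on a copy
    n = x.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length along axis 0 must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b           # butterfly: sum
            x[i + h:i + 2 * h] = a - b   # butterfly: difference
        h *= 2
    return x


def kron_hadamard_apply(X):
    """Compute (H_n kron H_m) @ X.ravel() for X of shape (n, m), reshaped back.

    Uses the row-major identity (A kron B) vec(X) = vec(A X B^T), so the
    Kronecker matrix of size (n*m) x (n*m) is never built.
    """
    return fwht(fwht(X).T).T
```

For a joint space with millions of dimensions, the explicit Kronecker matrix would be far too large to store, whereas the per-axis transform only needs memory for the signal itself.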

    Text Line Segmentation of Historical Documents: a Survey

    Libraries and National Archives hold a huge number of historical documents that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, to appear in International Journal on Document Analysis and Recognition, online version available at http://www.springerlink.com/content/k2813176280456k3

    Ontology mapping: the state of the art

    Ontology mapping is seen as a solution provider in today's landscape of ontology research. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Mapping could provide a common layer from which several ontologies could be accessed and hence could exchange information in a semantically sound manner. Developing such mappings has been the focus of a variety of works originating from diverse communities over a number of years. In this article we comprehensively review and present these works. We also provide insights on the pragmatics of ontology mapping and elaborate on a theoretical approach for defining ontology mapping.