39,793 research outputs found

    A comparative evaluation of name-matching algorithms

    Full text link
    Name matching—recognizing when two different strings are likely to denote the same entity—is an important task in many legal information systems, such as case-management systems. The naming conventions peculiar to legal cases limit the effectiveness of generic approximate string-matching algorithms in this task. This paper proposes a three-stage framework for name matching, identifies how each stage in the framework addresses the naming variations that typically arise in legal cases, describes several alternative approaches to each stage, and evaluates the performance of various combinations of the alternatives on a representative collection of names drawn from a United States District Court case management system. The best tradeoff between accuracy and efficiency in this collection was achieved by algorithms that standardize capitalization, spacing, and punctuation; filter redundant terms; index using an abstraction function that is both order-insensitive and tolerant of small numbers of omissions or additions; and compare names in a symmetrical, word-by-word fashion. 1

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Distributed human computation framework for linked data co-reference resolution

    No full text
    Distributed Human Computation (DHC) is a technique used to solve computational problems by incorporating the collaborative effort of a large number of humans. It is also a solution to AI-complete problems such as natural language processing. The Semantic Web with its root in AI is envisioned to be a decentralised world-wide information space for sharing machine-readable data with minimal integration costs. There are many research problems in the Semantic Web that are considered as AI-complete problems. An example is co-reference resolution, which involves determining whether different URIs refer to the same entity. This is considered to be a significant hurdle to overcome in the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we are focusing on handling the co-reference resolution in the Semantic Web when integrating distributed datasets. The traditional way to solve this problem is to design machine-learning algorithms. However, they are often computationally expensive, error-prone and do not scale. We designed a DHC system named iamResearcher, which solves the scientific publication author identity co-reference problem when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic data from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are published to the Linked Data Cloud

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Using Lessons from Health Care to Protect the Privacy of Library Users: Guidelines for the De-Identification of Library Data based on HIPAA

    Get PDF
    While libraries have employed policies to protect the data about use of their services, these policies are rarely specific or standardized. Since 1996 the U.S. healthcare system has been grappling with the Health Insurance Portability and Accountability Act (HIPAA), which is designed to provide those handling personal health information with standardized, definitive instructions as to the protection of data. In this work, the authors briefly discuss the present situation of privacy policies about library use data, outline the HIPAA guidelines to understand parallels between the two, and finally propose methods to create a de-identified library data warehouse based on HIPAA for the protection of user privacy
    • …
    corecore