258,870 research outputs found
A comparison of personal name matching: Techniques and practical issues
Finding and matching personal names is at the core of an increasing number of applications: from text and Web mining, information retrieval and extraction, search engines, to deduplication and data linkage systems. Variations and errors in names make exact string matching problematic, and approximate matching techniques based on phonetic encoding or pattern matching have to be applied. When compared to general text, however, personal names have different characteristics that need to be considered.
¶
In this paper we discuss the characteristics of personal names and present potential sources of variations and errors. We overview a comprehensive number of commonly used, as well as some recently developed name matching techniques. Experimental comparisons on four large name data sets indicate that there is no clear best technique. We provide a series of recommendations that will help researchers and practitioners to select a name matching technique suitable for a given data set
Searching by approximate personal-name matching
We discuss the design, building and evaluation of a method to access theinformation of a person, using his name as a search key, even if it has deformations. We present a similarity function, the DEA function, based
on the probabilities of the edit operations accordingly to the involved
letters and their position, and using a variable threshold. The efficacy
of DEA is quantitatively evaluated, without human relevance judgments,
very superior to the efficacy of known methods. A very efficient
approximate search technique for the DEA function is also presented
based on a compacted trie-tree structure.Postprint (published version
Recommended from our members
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general- purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general- purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
First Author Advantage: Citation Labeling in Research
Citations among research papers, and the networks they form, are the primary
object of study in scientometrics. The act of making a citation reflects the
citer's knowledge of the related literature, and of the work being cited. We
aim to gain insight into this process by studying citation keys: user-chosen
labels to identify a cited work. Our main observation is that the first listed
author is disproportionately represented in such labels, implying a strong
mental bias towards the first author.Comment: Computational Scientometrics: Theory and Applications at The 22nd
CIKM 201
Personalised correction, feedback, and guidance in an automated tutoring system for skills training
In addition to knowledge, in various domains skills are equally important. Active learning and training are effective forms of education. We present an automated skills training system for a database programming environment that promotes procedural knowledge acquisition
and skills training. The system provides support features such as correction of solutions, feedback and personalised guidance, similar to interactions with a human tutor. Specifically, we address synchronous feedback and guidance based on personalised assessment. Each of these features is automated and includes a level of personalisation and adaptation. At the core of the system is a pattern-based error classification and correction component that analyses
student input
- âŠ