Enhanced Integrated Scoring for Cleaning Dirty Texts
An increasing number of approaches to ontology engineering from text are
turning to online sources such as company intranets and the World Wide Web.
Despite this rise, little work has addressed the preprocessing and cleaning of
dirty texts from online sources. This paper presents an enhancement of ISSAC,
an Integrated Scoring for Spelling error correction, Abbreviation expansion and
Case restoration. ISSAC is implemented as part of the text preprocessing phase
of an ontology engineering system. New evaluations of the enhanced ISSAC on 700
chat records show an accuracy of 98%, compared with 96.5% for the basic ISSAC
and 71% for Aspell.
Comment: More information is available at
http://explorer.csse.uwa.edu.au/reference
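The abstract does not give ISSAC's scoring formula, so the following is only a minimal sketch of the general idea of integrated scoring, under loudly stated assumptions: each candidate replacement for one noisy chat token (here the abbreviation "tmrw") receives several normalized evidence scores, a weighted sum combines them, and the top-scoring candidate wins. The evidence sources (string similarity from Python's `difflib` plus a frequency count), the hypothetical `DOMAIN_FREQ` table, the weights and the candidate list are all illustrative assumptions, not ISSAC's actual components; case restoration is not modelled.

```python
from difflib import SequenceMatcher

# HYPOTHETICAL domain frequency counts; ISSAC's real evidence sources are not shown here.
DOMAIN_FREQ = {"tomorrow": 120, "tomato": 15, "to": 150}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def integrated_score(token: str, candidate: str, w_sim: float = 0.7, w_freq: float = 0.3) -> float:
    """Weighted sum of two normalized evidence scores (weights are assumptions)."""
    max_freq = max(DOMAIN_FREQ.values())
    freq = DOMAIN_FREQ.get(candidate, 0) / max_freq  # normalize counts to [0, 1]
    return w_sim * similarity(token, candidate) + w_freq * freq


def best_correction(token: str, candidates: list[str]) -> str:
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda c: integrated_score(token, c))


print(best_correction("tmrw", ["tomorrow", "tomato", "to"]))  # prints: tomorrow
```

Here "to" has the highest frequency but a low similarity to "tmrw", while "tomorrow" scores well on both, so the combined score prefers it. Changing the weights shifts that balance, which is the main design choice any integrated score has to make.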
On palimpsests in neural memory: an information theory viewpoint
The finite capacity of neural memory and the reconsolidation phenomenon suggest
it is important to be able to update stored information as in a palimpsest,
where new information overwrites old information. Moreover, changing
information in memory is metabolically costly. In this paper, we suggest that
information-theoretic approaches may inform the fundamental limits of
constructing such a memory system. In particular, we define malleable coding,
which considers not only representation length but also ease of representation
update, thereby encouraging some form of recycling to convert an old codeword
into a new one. Malleability cost is the difficulty of synchronizing compressed
versions, and malleable codes are of particular interest when representing
information and modifying the representation are both expensive. We examine the
tradeoff between compression efficiency and malleability cost, under a
malleability metric defined with respect to a string edit distance. This
introduces a metric topology to the compressed domain. We characterize the exact
set of achievable rates and malleability as the solution of a subgraph
isomorphism problem. All of this is done within the optimization approach to
biology framework.
Accepted manuscript
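To make the malleability metric concrete: if update cost is measured by a string edit distance, the cost of turning an old codeword into a new one can be computed with the standard Levenshtein dynamic program. This sketch assumes unit-cost insertions, deletions and substitutions (the abstract does not specify which edit distance is used), and the bit strings are hypothetical codewords, not outputs of any code from the paper.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum unit-cost insertions, deletions and
    substitutions needed to turn s into t (two-row dynamic program)."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # delete cs
                            curr[j - 1] + 1,             # insert ct
                            prev[j - 1] + (cs != ct)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


# HYPOTHETICAL codewords: two equally long encodings of the same updated message.
old = "01101100"
candidate_a = "01101110"  # recycles almost all of the old codeword
candidate_b = "11111111"  # same length, but far from the old codeword

print(edit_distance(old, candidate_a), edit_distance(old, candidate_b))  # prints: 1 4
```

Both candidates have the same length, so a pure compression criterion cannot tell them apart; the edit distance to the old codeword is what separates a cheap update from an expensive one.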
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting good metrics for specific problems is generally difficult.
This has led to the emergence of metric learning, which aims at automatically
learning a metric from data and has attracted a lot of interest in machine
learning and related fields over the past ten years. This survey paper presents
a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
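The Mahalanobis framework emphasized above learns a positive semidefinite matrix M and measures d_M(x, x') = sqrt((x - x')^T M (x - x')); writing M = L^T L turns this into an ordinary Euclidean distance after the linear map L. The following minimal pure-Python sketch shows only the distance computation; the matrix L is a hypothetical hand-picked value standing in for one a metric-learning algorithm would learn, and no learning step is shown.

```python
import math


def psd_from_factor(L):
    """Build M = L^T L (always positive semidefinite) from a k x n factor L."""
    k, n = len(L), len(L[0])
    return [[sum(L[r][i] * L[r][j] for r in range(k)) for j in range(n)]
            for i in range(n)]


def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) for a PSD matrix M given as a list of rows."""
    d = [xi - yi for xi, yi in zip(x, y)]
    quad = sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(quad)


# HYPOTHETICAL "learned" factor: stretches feature 0, shrinks feature 1.
L = [[2, 0], [0, 0.5]]
M = psd_from_factor(L)

# Both pairs are at Euclidean distance 1, but the learned metric separates them.
print(mahalanobis((1, 0), (0, 0), M), mahalanobis((0, 1), (0, 0), M))  # prints: 2.0 0.5
```

With L equal to the identity matrix, M is also the identity and d_M reduces to the Euclidean distance, which is why Mahalanobis metric learning is often described as learning a linear projection of the data.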