An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
Although Minimum Distance Parsing (MDP) offers a theoretically attractive
solution to the problem of extragrammaticality, it is often computationally
infeasible in large scale practical applications. In this paper we present an
alternative approach where the labor is distributed between a more restrictive
partial parser and a repair module. Though two stage approaches have grown in
popularity in recent years because of their efficiency, they have done so at
the cost of requiring hand coded repair heuristics. In contrast, our two stage
approach does not require any hand coded knowledge sources dedicated to repair,
thus making it possible to achieve a similar run time advantage over MDP
without losing the quality of domain independence.
Comment: 9 pages, 1 Postscript figure, uses aclap.sty and psfig.tex, In
Proceedings of EMNLP 199
WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recent
years. While state-of-the-art methods provide efficient computation of word
similarities via a low-dimensional matrix embedding, their motivation is often
left unclear. In this paper, we argue that word embedding can be naturally
viewed as a ranking problem due to the ranking nature of the evaluation
metrics. Then, based on this insight, we propose a novel framework WordRank
that efficiently estimates word representations via robust ranking, in which
the attention mechanism and robustness to noise are readily achieved via the
DCG-like ranking losses. The performance of WordRank is measured in word
similarity and word analogy benchmarks, and the results are compared to the
state-of-the-art word embedding techniques. Our algorithm is very competitive
with the state of the art on large corpora, while outperforming it by a
significant margin when the training set is limited (i.e., sparse and noisy).
With 17 million tokens, WordRank performs almost as well as existing methods
using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node
distributed implementation of WordRank is publicly available for general usage.
Comment: Conference on Empirical Methods in Natural Language Processing
(EMNLP), November 1-5, 2016, Austin, Texas, US
A robust high-sensitivity algorithm for automated detection of proteins in two-dimensional electrophoresis gels
The automated interpretation of two-dimensional gel electrophoresis images used in protein separation and analysis presents a formidable problem in the detection and characterization of ill-defined spatial objects. We describe in this paper a hierarchical algorithm that provides a robust, high-sensitivity solution to this problem, which can be easily adapted to a variety of experimental situations. The software implementation of this algorithm functions as part of a complete package designed for general protein gel analysis applications.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
published or submitted for publicatio