535,097 research outputs found
Topic Map Generation Using Text Mining
Starting from text corpus analysis with linguistic and statistical analysis algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken form large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources and it is shown how text mining can be used for automatic topic map generation
Using text search for personal photo collections with the MediAssist system
The MediAssist system enables organisation and searching of personal digital photo collections based on contextual information, content-based analysis and semi-automatic annotation. One mode of user interaction uses automatically extracted features to create text surrogates for photos, which enables text search of photo collections without manual annotation. Our evaluation shows that this text search facility is effective for known-item search
Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints
Text generation from a knowledge base aims to translate knowledge triples to
natural language descriptions. Most existing methods ignore the faithfulness
between a generated text description and the original table, leading to
generated information that goes beyond the content of the table. In this paper,
for the first time, we propose a novel Transformer-based generation framework
to achieve the goal. The core techniques in our method to enforce faithfulness
include a new table-text optimal-transport matching loss and a table-text
embedding similarity loss based on the Transformer model. Furthermore, to
evaluate faithfulness, we propose a new automatic metric specialized to the
table-to-text generation problem. We also provide detailed analysis on each
component of our model in our experiments. Automatic and human evaluations show
that our framework can significantly outperform state-of-the-art by a large
margin.Comment: Accepted at ACL202
Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines),automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.Comment: 25 pages, submitted version, To appear in International Journal on
Document Analysis and Recognition, On line version available at
http://www.springerlink.com/content/k2813176280456k3
- …