15 research outputs found
Post-OCR Paragraph Recognition by Graph Convolutional Networks
Paragraphs are an important class of document entities. We propose a new
approach for paragraph identification by spatial graph convolutional neural
networks (GCN) applied on OCR text boxes. Two steps, namely line splitting and
line clustering, are performed to extract paragraphs from the lines in OCR
results. Each step uses a beta-skeleton graph constructed from bounding boxes,
where the graph edges provide efficient support for graph convolution
operations. With only pure layout input features, the GCN model size is 3~4
orders of magnitude smaller compared to R-CNN based models, while achieving
comparable or better accuracies on PubLayNet and other datasets. Furthermore,
the GCN models show good generalization from synthetic training data to
real-world images, and good adaptivity for variable document styles.
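The graph construction mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes boxes are given as `(x0, y0, x1, y1)` tuples, uses only box centers as graph nodes, and fixes beta = 1, in which case the beta-skeleton reduces to the Gabriel graph. The function names `box_center` and `gabriel_edges` are hypothetical.

```python
from itertools import combinations


def box_center(box):
    """Center of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def gabriel_edges(boxes):
    """Edges of the Gabriel graph (beta-skeleton with beta = 1) on box centers.

    Assumption: using centers only is a simplification for illustration.
    Two nodes are connected when no third node lies strictly inside the
    circle whose diameter is the segment between them.
    """
    centers = [box_center(b) for b in boxes]
    edges = []
    for i, j in combinations(range(len(centers)), 2):
        (px, py), (qx, qy) = centers[i], centers[j]
        mx, my = (px + qx) / 2.0, (py + qy) / 2.0      # circle center
        r2 = ((px - qx) ** 2 + (py - qy) ** 2) / 4.0   # squared radius
        blocked = any(
            (cx - mx) ** 2 + (cy - my) ** 2 < r2
            for k, (cx, cy) in enumerate(centers)
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Three text boxes in a row: only neighbouring boxes end up connected.
boxes = [(0, 0, 2, 2), (10, 0, 12, 2), (20, 0, 22, 2)]
edges = gabriel_edges(boxes)
```

The resulting sparse edge list is the kind of neighbourhood structure over which a graph convolution could aggregate features. This brute-force version checks every pair against every other node, so it is only suitable for small inputs.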
Large language models in machine translation
This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens, resulting in language models having up to 300 billion n-grams. It is capable of providing smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases.
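A minimal single-machine sketch of Stupid Backoff scoring follows. It illustrates the general idea only and is not the distributed system described in the abstract. The helper names `count_ngrams` and `stupid_backoff` are hypothetical, and the backoff factor of 0.4 is the commonly cited value, treated here as an assumption. The scores are relative frequencies with a fixed penalty per backoff step, so they are not normalized probabilities.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor (commonly cited value)


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order, keyed by token tuples."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=ALPHA):
    """Return the unnormalized score S(word | context).

    Use the relative frequency of the longest n-gram that was seen;
    each time the context is shortened, multiply the score by alpha.
    """
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[context]
        context = context[1:]   # drop the oldest context word
        penalty *= alpha
    # Base case: unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
seen = stupid_backoff("cat", ["the"], counts, len(tokens))      # bigram was seen
backed = stupid_backoff("sat", ["the"], counts, len(tokens))    # falls back to unigram
```

Because no discounting or normalization is computed, training reduces to counting n-grams, which is what makes the method cheap on very large corpora.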
Advanced Television Research Program
Contains reports on ten research projects. National Science Foundation Grant MIP 87-14969; National Science Foundation Fellowship; Advanced Television Research Program; AT&T Bell Laboratories Doctoral Support Program; Kodak Fellowship; U.S. Air Force - Electronic Systems Division Contract F1 9628-89-K-004
Advanced Television Research Program
Contains an introduction and reports on twelve research projects. Advanced Television Research Program; National Science Foundation Grant MIP 87-14969; National Science Foundation Fellowship; Kodak Fellowship
Search and Retrieval—search processes General Terms
An advanced visual interface system is presented for fluid interaction in a personal digital library system. The system employs a zoomable planar representation of a collection using hybrid continuous/quantum treemap visualizations to facilitate navigation while minimizing cognitive load. By providing both fluidity and a means of reading documents within the same visualization, the system obliterates the traditional boundary separating the acquisition of materials from their use. In addition, the system provides a means of streamlining and largely automating the addition of new documents into a collection. The system is particularly well suited to user tasks which, in the physical world, are normally carried out by laying out a set of related documents on a physical desk, namely those tasks that require frequent and rapid transfer of attention from one document in the collection to another. Discussed are the design and implementation of the system as well as its relationship to previous work.
Making UpLib Useful: Personal Document Engineering
Any new system must provide significant advantages to users for them to adopt it over their existing practices. In this paper, we discuss changes made over the last two years of use of the UpLib personal digital library system, to provide those advantages in the realm of document management. These changes are concentrated in the document acquisition phase, where document analysis is performed and databases of document information are prepared. However, some changes have been made in the areas of document management and document usage, primarily to allow users better interaction with the improved document projections.
Fluid Interface for Personal Digital Libraries
An advanced interface is presented for fluid interaction in a personal digital library system. The system employs a zoomable planar representation of a collection using hybrid continuous/quantum treemap visualizations to facilitate navigation while minimizing cognitive load. The system is particularly well suited to user tasks which, in the physical world, are normally carried out by laying out a set of related documents on a physical desk, namely those tasks that require frequent and rapid transfer of attention from one document in the collection to another. Discussed are the design and implementation of the system as well as its relationship to previous work.