Basic research planning in mathematical pattern recognition and image analysis
Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene interference; (4) parallel processing and image data structures; and (5) continuing studies in polarization, computer architectures and parallel processing, and the applicability of "expert systems" to interactive analysis.
Table-to-Text: Generating Descriptive Text for Scientific Tables from Randomized Controlled Trials
Unprecedented amounts of data have been generated in the biomedical domain, and the bottleneck for biomedical research has shifted from data generation to data management, interpretation, and communication. Therefore, it is highly desirable to develop systems to assist in text generation from biomedical data, which will greatly improve the dissemination of scientific findings. However, very few studies have investigated issues of data-to-text generation in the biomedical domain. Here I present a systematic study for generating descriptive text from tables in randomized clinical trials (RCT) articles, which includes: (1) an information model for representing RCT tables; (2) annotated corpora containing pairs of RCT tables and descriptive text, and labeled structural and semantic information of RCT tables; (3) methods for recognizing structural and semantic information of RCT tables; and (4) methods for generating text from RCT tables, evaluated by a user study on three aspects: relevance, grammatical quality, and matching. The proposed hybrid text generation method achieved a low bilingual evaluation understudy (BLEU) score of 5.69, but human review achieved scores of 9.3, 9.9 and 9.3 for relevance, grammatical quality and matching, respectively, which are comparable to review of original human-written text. To the best of our knowledge, this is the first study to generate text from scientific tables in the biomedical domain. The proposed information model, labeled corpora and developed methods for recognizing tables and generating descriptive text could also facilitate other biomedical and informatics research and applications.
Distributed Representations for Compositional Semantics
The mathematical representation of semantics is a key issue for Natural
Language Processing (NLP). A lot of research has been devoted to finding ways
of representing the semantics of individual words in vector spaces.
Distributional approaches --- meaning distributed representations that exploit
co-occurrence statistics of large corpora --- have proved popular and
successful across a number of tasks. However, natural language usually comes in
structures beyond the word level, with meaning arising not only from the
individual words but also the structure they are contained in at the phrasal or
sentential level. Modelling the compositional process by which the meaning of
an utterance arises from the meaning of its parts is an equally fundamental
task of NLP.
This dissertation explores methods for learning distributed semantic
representations and models for composing these into representations for larger
linguistic units. Our underlying hypothesis is that neural models are a
suitable vehicle for learning semantically rich representations and that such
representations in turn are suitable vehicles for solving important tasks in
natural language processing. The contribution of this thesis is a thorough
evaluation of our hypothesis, as part of which we introduce several new
approaches to representation learning and compositional semantics, as well as
multiple state-of-the-art models which apply distributed semantic
representations to various tasks in NLP.
Comment: DPhil Thesis, University of Oxford, Submitted and accepted in 201
Eyes Wide Open: an interactive learning method for the design of rule-based systems
We present in this paper a new general method, the Eyes Wide Open (EWO) method, for the design of rule-based document recognition systems. Our contribution is to introduce a learning procedure, based on machine learning techniques and carried out in interaction with the user, into the design of the recognition system. Therefore, unlike many approaches that are manually designed, ours can easily adapt to a new type of document while taking advantage of the expressiveness of rule-based systems and their ability to convey the hierarchical structure of a document. The EWO method is independent of any existing recognition system. An automatic analysis of an annotated corpus, guided by the user, helps adapt the recognition system to a new kind of document. The user then gives meaning to the automatically extracted information. In this paper, we validate EWO by producing two rule-based systems: one for the Maurdor international competition, on a heterogeneous corpus of handwritten and printed documents written in different languages, and another for the RIMES competition corpus, a homogeneous corpus of French handwritten business letters. On the RIMES corpus, our method allows an assisted design of a grammatical description that gives better results than all the previously proposed statistical systems.