7,014 research outputs found
Generating Weather Forecast Texts with Case Based Reasoning
Several techniques have been used to generate weather forecast texts. In this
paper, case based reasoning (CBR) is proposed for weather forecast text
generation because similar weather conditions occur over time and should have
similar forecast texts. CBR-METEO, a system for generating weather forecast
texts, was developed using a generic framework (jCOLIBRI) which provides modules
for the standard components of the CBR architecture. The advantage of a CBR
approach is that systems can be built in minimal time with far less human
effort after initial consultation with experts. The approach depends heavily on
the quality of the retrieval and revision components of the CBR process. We
evaluated CBR-METEO with NIST, an automated metric which has been shown to
correlate well with human judgements for this domain. The system shows
comparable performance with other NLG systems that perform the same task.
Comment: 6 page
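The retrieve-and-reuse idea behind a CBR forecast generator can be illustrated with a minimal Python sketch. It is not the CBR-METEO implementation and does not use jCOLIBRI; the case base, the two wind features, the distance weighting, and the forecast phrases are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical case base: (wind speed in knots, wind direction in degrees)
# paired with a previously written forecast phrase.
CASE_BASE = [
    ((10.0, 180.0), "S 8-12"),
    ((25.0, 270.0), "W 22-28"),
    ((5.0, 90.0), "E 5 OR LESS"),
]


def distance(a, b):
    """Distance between two weather states (assumed weighting, not from the paper)."""
    speed_diff = a[0] - b[0]
    # Wind direction is circular, so 350 and 10 degrees are 20 degrees apart.
    raw = abs(a[1] - b[1]) % 360.0
    dir_diff = min(raw, 360.0 - raw)
    # Scale direction so that 10 degrees counts about as much as 1 knot.
    return math.hypot(speed_diff, dir_diff / 10.0)


def retrieve(query, case_base=CASE_BASE):
    """Retrieve step: return the stored case closest to the query."""
    return min(case_base, key=lambda case: distance(query, case[0]))


def reuse(query, case_base=CASE_BASE):
    """Reuse step: adopt the text of the retrieved case unchanged."""
    return retrieve(query, case_base)[1]


if __name__ == "__main__":
    print(reuse((11.0, 190.0)))
```

A real system would add a revision step that adapts the retrieved text to the new data, which is the component the abstract identifies as critical alongside retrieval.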
Language technologies for a multilingual Europe
This volume of the series "Translation and Multilingual Natural Language Processing" includes most of the papers presented at the Workshop "Language Technology for a Multilingual Europe", held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic "Multilingual Resources and Multilingual Applications", along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as, among others, industry, administration and funding agencies. The Workshop "Language Technology for a Multilingual Europe" was co-organised by the two GSCL working groups "Text Technology" and "Machine Translation" (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).
Change of biomedical domain terminology over time
Biomedical text processing relies heavily on terminological resources. Regardless of how a terminology is created, whether automatically extracted from a domain corpus or crafted by hand, one aspect is rarely considered: terms evolve over time. Terms in the domain literature change due to many factors: new factual evidence, the proposal of new hypotheses or the rejection of old ones, a shift towards increasing specificity, variation in expression, different people working independently on the same novel phenomenon, etc. This paper reports an experimental investigation carried out on biomedical domain literature, capturing how specific domain terminology changes over time.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Meeting Medical Terminology Needs: The Ontology-enhanced Medical Concept Mapper
This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer-generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.
Finding answers to questions, in text collections or web, in open domain or specialty domains
This chapter is dedicated to factual question answering, i.e. extracting precise and exact answers to questions given in natural language from texts. A question in natural language gives more information than a bag-of-words query (i.e. a query made of a list of words), and provides clues for finding precise answers. We will first focus on the underlying problems, which are mainly due to linguistic variation between questions and the text passages that answer them, and which affect both the selection of relevant passages and the extraction of reliable answers. We will then present how to answer factual questions in open domain. We will also present question answering in specialty domains, as it requires dealing with semi-structured knowledge and specialized terminologies, and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy or collaborative usage. Besides, the Web is also multilingual, and a challenging problem consists in searching for answers in target-language documents written in a language other than the source language of the question. For all these topics, we present the main approaches and the remaining problems.
Mixed-Language Arabic-English Information Retrieval
This thesis attempts to address the problem of mixed querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents, regardless of their languages. To achieve this goal, however, it is essential first to suppress the impact of the problems that are caused by the mixed-language feature in both queries and documents and which bias the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, the term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of language, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in the non-English language) would likely overweight and skew the impact of the technical terms (mostly those in English) due to the high document frequencies (and thus low weights) of the latter terms in their corresponding collection (mostly the English collection). This phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes a reasonable re-weighted Inverse Document Frequency (IDF) so as to moderate the effect of overweighted terms in mixed queries.
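The effect of the collection on a term's weight can be seen with the textbook formula idf(t) = log(N / df(t)). The sketch below is a minimal Python illustration of that standard formula only; it is not the re-weighted IDF proposed in the thesis, and the tiny collections and the placeholder tokens (`ar_term_1`, `ar_term_2`) are hypothetical.

```python
import math


def idf(term, docs):
    """Textbook IDF: log(N / df). Returns 0.0 for terms absent from docs."""
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0
    return math.log(len(docs) / df)


# Hypothetical English sub-collection: the technical term "deadlock" is frequent.
english_docs = [
    {"deadlock", "kernel"},
    {"deadlock", "thread"},
    {"deadlock", "mutex"},
    {"kernel", "thread"},
]

# Hypothetical non-English sub-collection (placeholder tokens), where the
# English technical term appears only occasionally.
other_docs = [
    {"ar_term_1", "deadlock"},
    {"ar_term_2"},
]

merged_docs = english_docs + other_docs

if __name__ == "__main__":
    print("English only:", idf("deadlock", english_docs))
    print("Merged:      ", idf("deadlock", merged_docs))
    print("Placeholder: ", idf("ar_term_1", merged_docs))
```

In this toy data the technical term receives a low weight inside the English sub-collection because it is common there, while the rare placeholder term receives a much higher weight, which is the kind of imbalance the abstract describes.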