9 research outputs found
GEMINI: A Natural Language System for Spoken-Language Understanding
Gemini is a natural language understanding system developed for spoken
language applications. The paper describes the architecture of Gemini, paying
particular attention to resolving the tension between robustness and
overgeneration. Gemini features a broad-coverage unification-based grammar of
English, fully interleaved syntactic and semantic processing in an all-paths,
bottom-up parser, and an utterance-level parser to find interpretations of
sentences that might not be analyzable as complete sentences. Gemini also
includes novel components for recognizing and correcting grammatical
disfluencies, and for doing parse preferences. This paper presents a
component-by-component view of Gemini, providing detailed relevant measurements
of size, efficiency, and performance.Comment: 8 pages, postscrip
Recommended from our members
Maintaining computer-based information systems using text-based intelligent systems techniques
In order to incorporate up-to-date quantitative and qualitative information, Computer- Based Information Systems (CBIS) must be able to extract data from unstructured, textual formats such as newspapers and magazines. The process of updating information in a CBIS currently requires large amounts of human effort analyzing and converting data from such sources into formats which information systems can work with. This paper suggests some methods by which the data needs of a CBIS can be handled semi-automatically (Employing both computers and humans) using text-based intelligent systems (TBIS) techniques
Text skimming as a part in paper document understanding
In our document understanding project ALV we analyse incoming paper mail in the domain of single-sided German business letters. These letters are scanned and after several analysis steps the text is recognized. The result may contain gaps, word alternatives, and even illegal words. The subject of this paper is the subsequent phase which concerns the extraction of important information predefined in our "message type model". An expectation driven partial text skimming analysis is proposed focussing on the kernel module, the so-called "predictor". In contrast to traditional text skimming the following aspects are important in our approach. Basically, the input data are fragmentary texts. Rather than having one text analysis module ("substantiator") only, our predictor controls a set of different and partially alternative substantiators. With respect to the usually proposed three working phases of a predictor - start, discrimination, and instantiation - the following differences are remarkable. The starting problem of text skimming is solved by applying specialized substantiators for classifying a business letter into message types. In order to select appropriate expectations within the message type hypotheses a twofold discrimination is performed. A coarse discrimination reduces the number of message type alternatives, and a fine discrimination chooses one expectation within one or a few previously selected message types. According to the expectation selected substantiators are activated. Several rules are applied both for the verification of the substantiator results and for error recovery if the results are insufficient
Text skimming as a part in paper document understanding
In our document understanding project ALV we analyse incoming paper mail in the domain of single-sided German business letters. These letters are scanned and after several analysis steps the text is recognized. The result may contain gaps, word alternatives, and even illegal words. The subject of this paper is the subsequent phase which concerns the extraction of important information predefined in our "message type model". An expectation driven partial text skimming analysis is proposed focussing on the kernel module, the so-called "predictor". In contrast to traditional text skimming the following aspects are important in our approach. Basically, the input data are fragmentary texts. Rather than having one text analysis module ("substantiator") only, our predictor controls a set of different and partially alternative substantiators. With respect to the usually proposed three working phases of a predictor - start, discrimination, and instantiation - the following differences are remarkable. The starting problem of text skimming is solved by applying specialized substantiators for classifying a business letter into message types. In order to select appropriate expectations within the message type hypotheses a twofold discrimination is performed. A coarse discrimination reduces the number of message type alternatives, and a fine discrimination chooses one expectation within one or a few previously selected message types. According to the expectation selected substantiators are activated. Several rules are applied both for the verification of the substantiator results and for error recovery if the results are insufficient
Financial information extraction using pre-defined and user-definable templates in the Lolita system
Financial operators have today access to an extremely large amount of data, both quantitative and qualitative, real-time or historical and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analysis, such as historical price analysis or technical analysis of price behaviour. Differently, little progress has been made in the processing of qualitative data, which mainly consists of financial news articles from financial newspapers or on-line news providers. As a result the financial market players are overloaded with qualitative information which is potentially extremely useful but, due to the lack of time, is often ignored. The goal of this work is to reduce the qualitative data-overload of the financial operators. The research involves the identification of the information in the source financial articles which is relevant for the financial operators' investment decision making process and to implement the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in the source articles. The project also involves the design and implementation in LOLITA of a user- definable template interface for allowing the users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts. This differs from most of existing information extraction systems which require the developers to code the templates directly in the system. The results of the research have shown that the system performed well in the extraction of financial templates from source articles which would allow the financial operator to reduce his qualitative data-overload. The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand- coded templates
From text mining to knowledge mining: An integrated framework of concept extraction and categorization for domain ontology
Organizations are struggling with the challenges coming from the regulatory, social and economic environment which are complex and changing continuously. They cause increase demand for the management of organizational knowledge, like how to provide employees, the necessary job-specific knowledge in right time and in right format. Employees have to update their knowledge, improve their competencies continuously. Knowledge repositories have key roles from knowledge management aspects, because they contain primarily the organizations’ intellectual assets (it is explicit knowledge) while employees have tacit knowledge, which is difficult to extract and codify. Business processes are also important from the management of organizational knowledge aspects, they have explicit and tacit knowledge elements as well. One of the key questions is how to handle this hidden knowledge in order to improve the organizational knowledge especially employees' knowledge by providing the most appropriate learning and/or training materials and how can we ensure that the knowledge in business processes are the same as in knowledge repositories and employees' head. These are the major themes in this thesis
Robust Processing of Real-World Natural-Language Texts
It. is often assumed that when natural language processing meets the real world, tile ideal of a. iming for complete and correct interpretations ha.s to be abandoned. However, our experience with TACITUS; especially in the MUC-3 evaluation, has shown that principled techniques for syntactic and pragmatic analysis can be bolstered with methods for achieving robustness. We describ