    On virtual partitioning of large dictionaries for contextual post-processing to improve character recognition

    This paper presents a new approach to partitioning large dictionaries by means of virtual views. The basic idea is that additional knowledge sources from text recognition and text analysis are employed for fast dictionary look-up, pruning the search space through static or dynamic views. The heart of the system is a redundant hashing technique that uses a set of hash functions to deal efficiently with noisy input. Currently, the system is composed of two main components: the dictionary generator and the dictionary controller. While the dictionary generator initially builds the system using profiles and source dictionaries, the controller allows the flexible integration of different search heuristics. Results show that our system achieves a considerable speed-up in dictionary access time.
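
    As an illustration of the redundant-hashing idea, the following Python sketch builds one hash table per hash function over the same dictionary and unions the matching buckets at look-up time. The two hash functions (a consonant skeleton and a minimum-bigram key) are invented for illustration; the paper's actual functions and profiles are not reproduced here.

        from collections import defaultdict

        def skeleton(word):
            # Drop vowels so common OCR vowel confusions share a bucket.
            return "".join(c for c in word.lower() if c not in "aeiou")

        def min_bigram(word):
            w = word.lower()
            return min((w[i:i + 2] for i in range(len(w) - 1)), default=w)

        HASHERS = (skeleton, min_bigram)  # illustrative choices, not the paper's

        class RedundantHashDict:
            def __init__(self, words):
                self.tables = [defaultdict(set) for _ in HASHERS]
                for w in words:
                    for table, h in zip(self.tables, HASHERS):
                        table[h(w)].add(w)

            def lookup(self, noisy_word):
                # Union of matching buckets: a candidate survives if any
                # hash agrees, pruning the dictionary to a small set.
                candidates = set()
                for table, h in zip(self.tables, HASHERS):
                    candidates |= table.get(h(noisy_word), set())
                return candidates

        d = RedundantHashDict(["letter", "litter", "better"])
        print(d.lookup("lettor"))  # vowel error still yields {'letter', 'litter'}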

    A step towards understanding paper documents

    This report focuses on the analysis steps necessary for processing paper documents. It is divided into three major parts: document image preprocessing, knowledge-based geometric classification of the image, and expectation-driven text recognition. It first illustrates the low-level image processing procedures that provide the physical structure of a scanned document image. It then describes a knowledge-based approach developed for identifying logical objects (e.g., the sender or the footnote of a letter) in a document image. These logical identifiers permit a context-restricted treatment of the text they contain: using specific logical dictionaries, expectation-driven text recognition can identify text parts of specific interest. The system has been implemented for the analysis of single-sided business letters in Common Lisp on a SUN 3/60 workstation and runs on a large population of different letters. The report also illustrates and discusses examples of typical results obtained by the system.
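
    As a toy illustration of the knowledge-based geometric classification step, the Python sketch below assigns logical labels to layout blocks using simple position rules; the block format, thresholds, and rules are invented stand-ins for the report's knowledge base.

        def classify_block(block, page_height):
            # Invented position rules; a real system would consult a
            # knowledge base of layout conventions for business letters.
            x, y, w, h = block["bbox"]  # pixels, origin at the top-left
            if y < 0.15 * page_height:
                return "sender"    # letterhead region at the top
            if y > 0.90 * page_height:
                return "footnote"  # bottom margin
            return "body"

        blocks = [{"bbox": (80, 50, 400, 60)}, {"bbox": (90, 3200, 1500, 80)}]
        print([classify_block(b, page_height=3300) for b in blocks])
        # -> ['sender', 'footnote']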

    The Big Picture: Using Desktop Imagery for Detection of Insider Threats

    The insider threat is one of the most difficult problems in information security. Prior research addresses its detection by using machine learning techniques to profile user behavior, where behavior is represented as low-level system events; these do not provide sufficient contextual information about the user's intentions and lead to high error rates. Our system instead uses video of a user's sessions as the representation of their behavior and detects moments during which the user performs sensitive tasks. Analysis of the video is accomplished using OCR, scene detection algorithms, and basic text classification. The system outputs its results to a web interface, and our evaluation shows that using desktop imagery is a viable alternative to using system calls for insider threat detection.
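
    A minimal Python sketch of such a pipeline is given below: frames are read from a session recording, kept when a crude scene-change test fires, and passed to OCR. The library choices (OpenCV, pytesseract) and the difference threshold are assumptions for illustration, not the authors' implementation.

        import cv2
        import pytesseract

        def ocr_changed_frames(video_path, diff_threshold=12.0):
            # diff_threshold is an assumed tuning knob, not a published value.
            cap = cv2.VideoCapture(video_path)
            prev_gray, texts = None, []
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # Crude scene detection: mean absolute pixel difference.
                if prev_gray is None or cv2.absdiff(gray, prev_gray).mean() > diff_threshold:
                    texts.append(pytesseract.image_to_string(gray))
                prev_gray = gray
            cap.release()
            return texts  # feed these strings to a text classifier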

    Verbmobil: translation of face-to-face dialogs

    Verbmobil is a long-term project on the translation of spontaneous language in negotiation dialogs. We describe the goals of the project, the chosen discourse domains, and the initial project schedule. We discuss some of the distinguishing features of Verbmobil and introduce the notions of translation on demand and variable depth of processing in speech translation. Finally, we describe the role of anytime modules for efficient dialog translation in close to real time.
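
    The following Python sketch illustrates the generic anytime idea behind variable depth of processing: analysis deepens in stages, and whatever result is ready when the deadline expires is returned. The stage functions are invented placeholders, not Verbmobil components.

        import time

        def shallow_translate(u): return f"[rough] {u}"      # placeholder stages
        def add_syntax(u, r):     return r + " +syntax"
        def add_semantics(u, r):  return r + " +semantics"
        def add_discourse(u, r):  return r + " +discourse"

        def anytime_translate(utterance, deadline_s=0.05):
            start = time.monotonic()
            result = shallow_translate(utterance)  # fast, rough first pass
            for deepen in (add_syntax, add_semantics, add_discourse):
                if time.monotonic() - start > deadline_s:
                    break                          # deadline hit: stop deepening
                result = deepen(utterance, result)
            return result                          # best result available so far

        print(anytime_translate("Treffen wir uns am Montag?"))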

    Text skimming as a part in paper document understanding

    In our document understanding project ALV, we analyse incoming paper mail in the domain of single-sided German business letters. These letters are scanned, and after several analysis steps the text is recognized. The result may contain gaps, word alternatives, and even illegal words. The subject of this paper is the subsequent phase, which concerns the extraction of important information predefined in our "message type model". An expectation-driven partial text skimming analysis is proposed, focussing on the kernel module, the so-called "predictor". In contrast to traditional text skimming, the following aspects are important in our approach. Basically, the input data are fragmentary texts. Rather than having only one text analysis module ("substantiator"), our predictor controls a set of different and partially alternative substantiators. With respect to the three working phases usually proposed for a predictor - start, discrimination, and instantiation - the following differences are notable. The starting problem of text skimming is solved by applying specialized substantiators that classify a business letter into message types. In order to select appropriate expectations within the message type hypotheses, a twofold discrimination is performed: a coarse discrimination reduces the number of message type alternatives, and a fine discrimination chooses one expectation within one or a few previously selected message types. According to the selected expectation, substantiators are activated. Several rules are applied, both for the verification of the substantiator results and for error recovery if the results are insufficient.
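
    The Python sketch below renders the predictor's start/discrimination/instantiation cycle schematically; the substantiator registry, scores, and threshold are invented placeholders rather than the ALV implementation.

        def predict(text, message_types, substantiators, threshold=0.2):
            # Start: specialized substantiators classify the letter.
            scores = {mt: substantiators["classify"](text, mt) for mt in message_types}
            # Coarse discrimination: prune unlikely message types.
            viable = [mt for mt, s in scores.items() if s > threshold]
            # Fine discrimination: select one expectation among the survivors.
            expectation = max(viable, key=lambda mt: scores[mt])
            # Instantiation: activate the substantiators for that expectation.
            result = substantiators[expectation](text)
            if not result.get("verified"):  # verification / error recovery
                result = substantiators["recover"](text, expectation)
            return result

        subs = {
            "classify": lambda text, mt: 1.0 if mt in text else 0.1,
            "invoice": lambda text: {"verified": True, "amount": "42,00 DM"},
            "recover": lambda text, mt: {"verified": False},
        }
        print(predict("invoice no. 17 for delivered goods", ["invoice", "order"], subs))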

    Using integrated knowledge acquisition to prepare sophisticated expert plans for their re-use in novel situations

    Plans which were constructed by human experts and have been repeatedly executed to the complete satisfaction of some customer in a complex real-world domain contain very valuable planning knowledge. In order to make this compiled knowledge re-usable for novel situations, a specific integrated knowledge acquisition method has been developed. First, a domain theory is established from documentation materials or texts, which is then used as the foundation for explaining how the plan achieves the planning goal. Secondly, hierarchically structured problem class definitions are obtained from the practitioners' high-level problem conceptualizations. The descriptions of these problem classes also provide operationality criteria for the various levels in the hierarchy. A skeletal plan is then constructed for each problem class with an explanation-based learning procedure. These skeletal plans consist of a sequence of general plan elements, so that each plan element can be refined independently. The skeletal plan thus accounts for the interactions between the various concrete operations of the plan at a general level. The complexity of the planning problem is thereby factored in a domain-specific way, and the compiled knowledge of sophisticated expert plans can be re-used in novel situations.
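
    As a data-structure sketch, the Python fragment below represents a skeletal plan as a sequence of general plan elements, each refinable independently for a concrete problem class; the domain content is invented, and the original work derives such elements by explanation-based generalization rather than by hand.

        from dataclasses import dataclass, field

        @dataclass
        class PlanElement:
            goal: str
            refinements: dict = field(default_factory=dict)  # problem class -> operation

            def refine(self, problem_class):
                # Elements are refined independently; their interactions are
                # already fixed at the general level of the skeletal plan.
                return self.refinements.get(problem_class,
                                            f"plan '{self.goal}' from scratch")

        skeletal_plan = [
            PlanElement("prepare_materials", {"standard_case": "reuse parts list A"}),
            PlanElement("assemble", {"standard_case": "follow assembly sheet 3"}),
        ]
        print([e.refine("standard_case") for e in skeletal_plan])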

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has urged the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which denote an important category of concepts in medical text. Two methodologies were investigated in this regard: dictionary-based and machine-learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (TRECMED and TRECCHEM, respectively), wherein the system was rated best in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.
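
    The Python sketch below illustrates the dictionary-based flavor of disorder recognition with a greedy longest-match over a toy term list; the dictionary and matching policy are illustrative assumptions, not the thesis system.

        DISORDERS = {"renal failure", "nausea", "rash"}  # toy dictionary
        MAX_LEN = max(len(t.split()) for t in DISORDERS)

        def tag_disorders(text):
            tokens = text.lower().split()
            hits, i = [], 0
            while i < len(tokens):
                # Greedy longest match against the dictionary.
                for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
                    span = " ".join(tokens[i:i + n])
                    if span in DISORDERS:
                        hits.append((i, span))
                        i += n
                        break
                else:
                    i += 1
            return hits

        print(tag_disorders("Patient developed nausea and acute renal failure"))
        # -> [(2, 'nausea'), (5, 'renal failure')]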

    Corporate agents

    The logic of belief and intention in situations with multiple agents is increasingly well understood, but current formal approaches appear to face problems in applications where the number of agents greatly exceeds two. We provide an informal development of Corporate Agents, an intensional approximation of individual and group states which treats groups symmetrically with autonomous agents. Corporate Charters, constraints derived from typical patterns of information flow, replace detailed reasoning about the propagation of attitudes in most contexts. The approximation to an ideal logical formulation is not tight, but the model appears to function well in information-poor environments and fails in ways related to characteristic human errors. It may therefore be particularly appropriate for application in the area of natural language discourse.