
    Automatic office document classification and information extraction

    TEXPROS (TEXt PROcessing System) is a document processing system (DPS) that supports office workers in their daily information and document management tasks. This thesis investigates document classification and information extraction, two of the major functional capabilities of TEXPROS. Based on the nature of its content, a document is divided into structured and unstructured (i.e., free-text) parts. Conceptual and content structures are introduced to capture the semantics of the structured and unstructured parts of the document, respectively. The document is classified and information is extracted based on analyses of the conceptual and content structures. In our approach, the layout structure of a document is used to assist these analyses. By nested segmentation of a document, its layout structure is represented as an ordered labeled tree, called the Layout Structure Tree (L-S-Tree). A sample-based classification mechanism is adopted for classifying documents. A set of pre-classified documents is stored in a document sample base in the form of sample trees. In the layout analysis, approximate tree matching is used to match the L-S-Tree of a document to be classified against the sample trees. The layout similarity between the document and each sample document is evaluated based on the edit distance between the L-S-Tree of the document and the corresponding sample tree. The document samples whose layout structure is similar to that of the document are chosen for the conceptual analysis. In the conceptual analysis, using the mapping between the document and the document samples found during the layout analysis, the conceptual similarity between the document and each sample document is evaluated based on the degree of conceptual closeness.
The document sample whose conceptual structure is similar to that of the document is chosen for information extraction. Information is extracted from the structured part of the document based on the layout locations of key terms appearing in the document and on string pattern matching. The type of the document is then identified from the information extracted from the structured part. In the content analysis, bottom-up and top-down analyses of the free text are combined to extract information from the unstructured part of the document. In the bottom-up analysis, the sentences of the free text are classified as relevant or irrelevant to the extraction. This sentence classification is based on the semantic relationship between the phrases in the sentences and the attribute names in the corresponding content structure, determined by consulting a thesaurus. The thematic roles of the phrases in each relevant sentence are then identified through syntactic analysis and heuristic thematic analysis. In the top-down analysis, the appropriate content structure is identified from the document type determined in the conceptual analysis. Information is then extracted from the unstructured part of the document by evaluating the restrictions specified in the corresponding content structure against the results of the bottom-up analysis. The information extracted from the structured and unstructured parts of the document is stored as a frame-like structure (frame instance) in the database for information retrieval in TEXPROS.
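
The layout-matching step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which approximate tree matching algorithm or cost model the thesis uses, so the code below assumes a generic unit-cost edit distance between ordered labeled trees (insert, delete, relabel each cost 1). The tree encoding, the function names (`forest_size`, `forest_distance`, `tree_distance`, `closest_sample`), and the toy sample base are all hypothetical and not taken from TEXPROS.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples are hashable, so results can be memoized.

def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)          # insert every node of g
    if not g:
        return forest_size(f)          # delete every node of f
    v_label, v_children = f[-1]        # rightmost root of f
    w_label, w_children = g[-1]        # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map v to w: match their subforests and the remaining forests separately.
    relabel = (forest_distance(v_children, w_children)
               + forest_distance(f[:-1], g[:-1])
               + (0 if v_label == w_label else 1))
    return min(delete, insert, relabel)

def tree_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))

def closest_sample(tree, samples):
    """Return the name of the sample tree with the smallest edit distance."""
    return min(samples, key=lambda name: tree_distance(tree, samples[name]))

# Hypothetical sample base of pre-classified layout trees.
para = ("para", ())
samples = {
    "letter": ("page", (("header", ()), ("body", (para, para)), ("signature", ()))),
    "memo": ("page", (("header", ()), ("body", (para,)))),
}
query = ("page", (("header", ()), ("body", (para, para)), ("footer", ())))
best = closest_sample(query, samples)
```

Here the query tree differs from the `letter` sample by one relabeling and from the `memo` sample by two deletions, so `letter` would be selected for the subsequent conceptual analysis. The memoized recursion is chosen for readability; it is not efficient for large trees.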

    TD-MPC2: Scalable, Robust World Models for Continuous Control

    TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2

    Some Weak Convergence Theorems for a Family of Asymptotically Nonexpansive Nonself Mappings

    A one-step iteration with errors is considered for a family of asymptotically nonexpansive nonself mappings. Weak convergence of the proposed iteration is obtained in a Banach space.

    Matrix Li-Yau-Hamilton Estimates under Kähler-Ricci Flow

    We prove matrix Li-Yau-Hamilton estimates for positive solutions to the heat equation and the backward conjugate heat equation, both coupled with the Kähler-Ricci flow. As an application, we obtain a monotonicity formula. Comment: 13 pages. Comments are welcome.

    Spin-dependent Andreev reflection tunneling through a quantum dot with intradot spin-flip scattering

    We study Andreev reflection (AR) tunneling through a quantum dot (QD) connected to a ferromagnet and a superconductor, in which the intradot spin-flip interaction is included. Using the nonequilibrium Green function method, the formula for the linear AR conductance is derived at zero temperature. It is found that competition between the intradot spin-flip scattering and the tunneling coupling to the leads dominates the resonant behaviour of the AR conductance versus the gate voltage. A weak spin-flip scattering leads to a single-peak resonance. However, as the spin-flip scattering strength increases, the AR conductance develops into a double-peak resonance, implying a novel structure in the tunneling spectrum of the AR conductance. In addition, the effects of the spin-dependent tunneling couplings, the matching of Fermi velocities, and the spin polarization of the ferromagnet on the AR conductance are examined in detail. Comment: 14 pages, 4 figures.