3,571 research outputs found

    Russian Lexicographic Landscape: a Tale of 12 Dictionaries

    Full text link
    The paper reports on quantitative analysis of 12 Russian dictionaries at three levels: 1) headwords: The size and overlap of word lists, coverage of large corpora, and presence of neologisms; 2) synonyms: Overlap of synsets in different dictionaries; 3) definitions: Distribution of definition lengths and numbers of senses, as well as textual similarity of same-headword definitions in different dictionaries. The total amount of data in the study is 805,900 dictionary entries, 892,900 definitions, and 84,500 synsets. The study reveals multiple connections and mutual influences between dictionaries, uncovers differences in modern electronic vs. traditional printed resources, as well as suggests directions for development of new and improvement of existing lexical semantic resources

    Towards a Semantic-based Approach for Modeling Regulatory Documents in Building Industry

    Get PDF
    Regulations in the Building Industry are becoming increasingly complex and involve more than one technical area. They cover products, components and project implementation. They also play an important role to ensure the quality of a building, and to minimize its environmental impact. In this paper, we are particularly interested in the modeling of the regulatory constraints derived from the Technical Guides issued by CSTB and used to validate Technical Assessments. We first describe our approach for modeling regulatory constraints in the SBVR language, and formalizing them in the SPARQL language. Second, we describe how we model the processes of compliance checking described in the CSTB Technical Guides. Third, we show how we implement these processes to assist industrials in drafting Technical Documents in order to acquire a Technical Assessment; a compliance report is automatically generated to explain the compliance or noncompliance of this Technical Documents

    Thesauri on the Web: current developments and trends

    Get PDF
    This article provides an overview of recent developments relating to the application of thesauri in information organisation and retrieval on the World Wide Web. It describes some recent thesaurus projects undertaken to facilitate resource description and discovery and access to wide-ranging information resources on the Internet. Types of thesauri available on the Web, thesauri integrated in databases and information retrieval systems, and multiple-thesaurus systems for cross-database searching are also discussed. Collective efforts and events in addressing the standardisation and novel applications of thesauri are briefly reviewed

    Machine Learning of User Profiles: Representational Issues

    Full text link
    As more information becomes available electronically, tools for finding information of interest to users becomes increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research described here focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors as well as subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem.Comment: 6 page

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field
    corecore