588 research outputs found
Ontology Population via NLP Techniques in Risk Management
In this paper we propose an NLP-based method for ontology population from texts and apply it to semi-automatically instantiate a generic knowledge base (a generic domain ontology) in the risk-management domain. The approach is semi-automatic and relies on domain-expert intervention for validation. It is built on a set of instance recognition rules based on syntactic structures and on the predicative power of verbs in the instantiation process. Because it relies chiefly on linguistic knowledge, it is not domain dependent. A description of an experiment performed on part of the ontology of the PRIMA project (supported by the European Community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment conclude the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves performance.
Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic Analysis
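The verb-centred instantiation the abstract describes can be pictured with a toy sketch. This is our own illustration, not the authors' implementation: the verb-to-concept table, the concept names, and the noun-before-verb rule are all invented for demonstration.

```python
# Hypothetical sketch of a verb-keyed instance recognition rule: given a
# POS-tagged sentence, a rule fires on a known predicate verb and maps the
# noun immediately preceding it to an instance of an ontology concept.

# Assumed mapping from predicate verbs to ontology concepts (illustrative only).
VERB_TO_CONCEPT = {
    "causes": "RiskFactor",
    "exposes": "Hazard",
}

def recognize_instances(tagged_sentence):
    """Return (instance, concept) pairs found in a POS-tagged sentence."""
    instances = []
    for i, (token, pos) in enumerate(tagged_sentence):
        if token in VERB_TO_CONCEPT and i > 0:
            prev_token, prev_pos = tagged_sentence[i - 1]
            if prev_pos.startswith("NN"):  # the verb's noun subject
                instances.append((prev_token, VERB_TO_CONCEPT[token]))
    return instances

sent = [("benzene", "NN"), ("causes", "VBZ"), ("leukemia", "NN")]
print(recognize_instances(sent))  # [('benzene', 'RiskFactor')]
```

A real system would of course draw the trigger verbs and target concepts from the ontology itself and match full syntactic structures rather than adjacency.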
Synonym Detection Using Syntactic Dependency And Neural Embeddings
Recent advances in the Vector Space Model have significantly improved some
NLP applications such as neural machine translation and natural language
generation. Although word co-occurrences in context have been widely used in
counting-/predicting-based distributional models, the role of syntactic
dependencies in deriving distributional semantics has not yet been thoroughly
investigated. By comparing various Vector Space Models in detecting synonyms in
TOEFL, we systematically study the salience of syntactic dependencies in
accounting for distributional similarity. We separate syntactic dependencies
into different groups according to their various grammatical roles and then use
context-counting to construct their corresponding raw and SVD-compressed
matrices. Moreover, using the same training hyperparameters and corpora, we
study typical neural embeddings in the evaluation. We further study the
effectiveness of injecting human-compiled semantic knowledge into neural
embeddings on computing distributional similarity. Our results show that the
syntactically conditioned contexts can interpret lexical semantics better than
the unconditioned ones, whereas retrofitting neural embeddings with semantic
knowledge can significantly improve synonym detection.
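The counting-based side of the comparison can be sketched in a few lines. This is a toy illustration, not the paper's experimental setup: the dependency triples and vocabulary are invented, and NumPy's truncated SVD stands in for the compression step.

```python
import numpy as np

# Count (word, dependency-typed context) co-occurrences, compress with a
# truncated SVD, and pick the TOEFL-style candidate closest in cosine terms.

# Toy (head, relation, dependent) triples; real work would parse a corpus.
triples = [
    ("car", "amod", "fast"), ("automobile", "amod", "fast"),
    ("car", "dobj_of", "drive"), ("automobile", "dobj_of", "drive"),
    ("banana", "amod", "ripe"), ("banana", "dobj_of", "eat"),
]

words = sorted({h for h, _, _ in triples})
contexts = sorted({(r, d) for _, r, d in triples})
M = np.zeros((len(words), len(contexts)))
for h, r, d in triples:
    M[words.index(h), contexts.index((r, d))] += 1.0

# Keep the top-k latent dimensions of the count matrix.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
E = U[:, :k] * S[:k]  # compressed word vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def best_synonym(target, candidates):
    t = E[words.index(target)]
    return max(candidates, key=lambda c: cosine(t, E[words.index(c)]))

print(best_synonym("car", ["automobile", "banana"]))  # automobile
```

Separating the triples by grammatical role (e.g. building one matrix per relation type) would reproduce the paper's grouping of syntactically conditioned contexts.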
Rječnik suvremenoga slovenskog jezika: od slovenske leksičke baze do digitalne rječničke baze (Dictionary of Modern Slovene: From the Slovene Lexical Database to the Digital Dictionary Database)
The ability to process language data has become fundamental to the development of technologies in various areas of human life in the digital world. The development of digitally readable linguistic resources, methods, and tools is, therefore, also a key challenge for the contemporary Slovene language. This challenge has been recognized in the Slovene language community both at the professional and state level and has been the subject of many activities over the past ten years, which will be presented in this paper.
The idea of a comprehensive dictionary database covering all levels of linguistic description in modern Slovene, from the morphological and lexical levels to the syntactic level, had already been formulated within the framework of the European Social Fund's Communication in Slovene (2008–2013) project; the Slovene Lexical Database was also created within the framework of this project. Two goals were pursued in designing the Slovene Lexical Database (SLD): creating linguistic descriptions of Slovene intended for human users that would also be useful for the machine processing of Slovene. Ever since the construction of the first Slovene corpus, it has become evident that there is a need for a description of modern Slovene based on real language data, and that it is necessary to understand the needs of language users to create useful language reference works. It also became apparent that only the digital medium enables the comprehensiveness of language description and that the design of the database must be adapted to it from the start. Also, the description must follow best practices as closely as possible in terms of formats and international standards, as this enables the inclusion of Slovene in a wider network of resources, such as Linked Open Data, BabelNet, and ELEXIS. Due to time pressures and trends in lexicography, procedures to automate the extraction of linguistic data from corpora and the inclusion of crowdsourcing in the lexicographic process were taken into consideration.
Following the essential idea of creating an all-inclusive digital dictionary database for Slovene, a few independent databases have been created over the past two years: the Collocations Dictionary of Modern Slovene, and the automatically generated Thesaurus of Modern Slovene, both of which also exist as independent online dictionary portals. One of the novelties that we put forward together with both dictionaries is the "responsive dictionary" concept, which includes crowdsourcing methods. Ultimately, the Digital Dictionary Database provides all (other) levels of linguistic description: the morphological level with the Sloleks database upgrade, the phraseological level with the construction of a multi-word expressions lexicon, and the syntactic level with the formalization of Slovene verb valency patterns. Each of these databases contains its specific language data that will ultimately be included in the comprehensive Slovene Digital Dictionary Database, which will represent basic linguistic descriptions of Slovene both for the human and machine user.
(Croatian summary, translated:) The idea of a comprehensive dictionary database covering all levels of linguistic description of modern Slovene, from the morphological and lexical to the syntactic level, was first formulated within the framework of the Communication in Slovene (2008–2013) project. To realize the idea of a comprehensive digital dictionary database, two independent databases were created: the Collocations Dictionary of Modern Slovene and the automatically generated Thesaurus of Modern Slovene. One of the novelties in both dictionaries is the concept of the responsive dictionary, which includes crowdsourcing. The Digital Dictionary Database contains all levels of linguistic description: the morphological level, upgraded with Sloleks; the phraseological level, with a description of multi-word constructions; and the syntactic level, with the formalization of verb valency models. Each of the existing databases contains specific language data that will be included in the comprehensive Slovene Digital Dictionary Database, which will contain a basic linguistic description of Slovene for both human and machine users.
The Role of Indexing in Subject Retrieval
On first reading the list of speakers proposed for this institute, I
became aware of being rather the "odd man out" for two reasons. Firstly, I
was asked to present a paper on PRECIS, which is very much a verbal
indexing system, at a conference dominated by contributions on classification
schemes with a natural bias, as the centenary year approaches, toward the
Dewey Decimal Classification (DDC). Secondly, I feared (quite wrongly, as it
happens) that I might be at variance with one or two of my fellow speakers,
who would possibly like to assure us, in an age when we can no longer ignore
the computer, that traditional library schemes such as DDC and Library of
Congress Classification (LCC) are capable of maintaining their original
function of organizing collections of documents, and at the same time are also
well suited to the retrieval of relevant citations from machine-held files. In
this context, I am reminded of a review of a general collection of essays on
classification schemes which appeared in the Journal of Documentation in
1972. Norman Roberts, reviewing the papers which dealt specifically with the
well established schemes, deduced that "all the writers project their particular
schemes into the future with an optimism that springs, perhaps, as much from
a sense of emotional involvement as from concrete evidence." Since I do not
believe that these general schemes can play any significant part in the retrieval
of items from mechanized files, it appeared that I had been cast in the role of
devil's advocate.
An approach to automated thesaurus construction using clusterization-based dictionary analysis
In this paper an automated approach to constructing a terminological thesaurus for a specific domain is proposed. It uses an explanatory dictionary as the initial text corpus and a controlled vocabulary related to the target lexicon to initiate extraction of terms for the thesaurus. The terms are subdivided into semantic clusters using the CLOPE clustering algorithm. The approach reduces the cost of thesaurus creation by involving the expert only once during the whole construction process, and only for the analysis of a small subset of the initial dictionary. To validate the performance of the proposed approach, the authors successfully constructed a thesaurus in the cardiology domain.
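For readers unfamiliar with CLOPE, the greedy, profit-driven assignment it performs can be sketched compactly. This is our own simplification of Yang et al.'s algorithm, not the authors' code; the cardiology term lists and the repulsion parameter `r` are illustrative.

```python
from collections import Counter

def _contrib(occ, n, r):
    # Cluster "profit": S(C) * |C| / W(C)^r, where S is the total item count,
    # W the number of distinct items, and |C| the number of transactions.
    s, w = sum(occ.values()), len(occ)
    return s * n / (w ** r) if w else 0.0

def clope(transactions, r=1.5):
    """One-pass greedy CLOPE: each transaction (here, a term's feature set)
    joins the cluster whose profit increases most, or starts a new cluster."""
    clusters, labels = [], []
    for txn in transactions:
        gains = [(_contrib(c["occ"] + Counter(txn), c["n"] + 1, r)
                  - _contrib(c["occ"], c["n"], r), i)
                 for i, c in enumerate(clusters)]
        new_gain = _contrib(Counter(txn), 1, r)
        if gains and max(gains)[0] > new_gain:
            _, i = max(gains)
            clusters[i]["occ"] += Counter(txn)
            clusters[i]["n"] += 1
            labels.append(i)
        else:
            clusters.append({"occ": Counter(txn), "n": 1})
            labels.append(len(clusters) - 1)
    return labels

terms = [["arrhythmia", "tachycardia"],
         ["arrhythmia", "bradycardia"],
         ["stenosis", "valve"]]
print(clope(terms))  # [0, 0, 1]
```

The full algorithm iterates additional passes, moving transactions between clusters until no move improves the total profit; the single pass above shows the core update.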
Computer-assisted text analysis methodology in the social sciences
"This report presents an account of methods of research in computer-assisted text analysis in
the social sciences. Rather than provide a comprehensive enumeration of all computer-assisted
text analysis investigations either directly or indirectly related to the social sciences using a
quantitative and computer-assisted methodology as their text analytical tool, the aim of this report is to describe the current methodological standpoint of computer-assisted text analysis in the social sciences. The report thus provides a description and a discussion of the operations carried out in computer-assisted text analysis investigations. It examines past and well-established approaches as well as some of the current approaches in the field and describes the techniques and procedures involved. By this means, a first attempt is made toward cataloguing the kinds of supplementary information and computational support that are further required to expand the suitability and applicability of the method for the variety of text analysis goals." (author's abstract)
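The operation most such investigations automate is dictionary-based category counting, which can be sketched as follows. The categories and vocabulary are invented for illustration and are not drawn from the report.

```python
import re
from collections import Counter

# Classic quantitative content analysis: count how many tokens of a text
# fall into each predefined content category (categories are illustrative).

CATEGORIES = {
    "economy": {"market", "trade", "price"},
    "conflict": {"war", "dispute", "strike"},
}

def categorize(text):
    """Return per-category token counts for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat, vocab in CATEGORIES.items():
            if tok in vocab:
                counts[cat] += 1
    return dict(counts)

print(categorize("The market dispute over trade led to a strike."))
# {'economy': 2, 'conflict': 2}
```

Real coding schemes add disambiguation rules and weighted categories, which is exactly the kind of supplementary information the report argues needs cataloguing.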
- …