22 research outputs found
An approach to automated thesaurus construction using clusterization-based dictionary analysis
This paper proposes an automated approach to constructing a terminological thesaurus for a specific domain. It uses an explanatory dictionary as the initial text corpus and a controlled vocabulary related to the target lexicon to seed the extraction of thesaurus terms. The terms are subdivided into semantic clusters with the CLOPE clustering algorithm. The approach reduces the cost of thesaurus creation by involving the expert only once during the whole construction process, and only to analyze a small subset of the initial dictionary. To validate the approach, the authors successfully constructed a thesaurus for the cardiology domain.
Ontology-based Competency Analyses in New Research Domains
Ontology-driven methods of competence management are proposed to support scientific research in new domains. Ontologies of the research domain are matched with personal information about researchers available on the Web (for example, on social networks), and the results of their work (publications, monographs, reports, etc.) are processed by logical methods and ontological analysis. Web services and the multi-agent programming paradigm are used for the software implementation.
Hierarchical text clustering applied to taxonomy evaluation
In computer science, the use of taxonomies is widely embraced in fields such as Artificial
Intelligence, Information Retrieval, Natural Language Processing and Machine Learning.
These concept classifications provide knowledge structures that guide algorithms in the
task of finding an acceptable-to-nearly-optimal solution to non-deterministic problems.
The main problem with taxonomies is the huge amount of effort required to build
one. Traditionally, this is done by hand and involves a team of experts to assure
the quality of the result. Although this is evidently the way to get the best taxonomy
possible (knowledge is an exclusive quality of humans), the manpower it requires means
it is neither the fastest nor the cheapest one.
This thesis presents an extensive review of the state of the art in taxonomy induction
techniques as well as ontology evaluation methods. It argues for the need for a fast, automatic
and arbitrary-domain taxonomy generation method and justifies the choice of the
Wikipedia encyclopedia as the dataset. A framework to deal with taxonomies is proposed
and implemented. In the experiments chapter, two statements are successfully
refuted: that the Wikipedia categorization system forms an acyclic directed graph, and that
the longest path between two nodes is equivalent to the taxonomic organization. Finally,
the framework is used to explore three arbitrary domains.
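To illustrate the first refuted statement, acyclicity of a category graph can be tested with a depth-first search. The following is a minimal sketch, not the thesis framework: it assumes the graph is given as a dictionary mapping each category to its subcategories, and the function name `find_cycle` and the toy categories are hypothetical.

```python
from typing import Dict, List, Optional

def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    state: Dict[str, int] = {}
    for start in graph:
        if state.get(start, WHITE) != WHITE:
            continue
        state[start] = GREY
        path = [start]  # nodes on the current DFS path, in order
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, children = stack[-1]
            advanced = False
            for child in children:
                colour = state.get(child, WHITE)
                if colour == GREY:
                    # Back edge: the child is already on the current path, so a cycle exists.
                    return path[path.index(child):] + [child]
                if colour == WHITE:
                    state[child] = GREY
                    path.append(child)
                    stack.append((child, iter(graph.get(child, []))))
                    advanced = True
                    break
            if not advanced:
                # All children explored: mark the node finished and backtrack.
                state[node] = BLACK
                stack.pop()
                path.pop()
    return None

# Toy category graphs (made up for illustration, not Wikipedia data).
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"A": ["B", "C"], "B": ["C"], "C": []}
print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

A single returned cycle is enough to refute the acyclicity claim for a given graph; the iterative form avoids recursion limits on deep category hierarchies.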
Automatic Terminology Coding for the Biomedical Domain
The biomedical sector, rich in unstructured data from sources like clinical notes and health records, presents a prime opportunity for Natural Language Processing (NLP) applications. Especially pivotal is the task of entity linking, in which textual mentions are mapped to medical concepts within a knowledge base, here the Unified Medical Language System (UMLS) Metathesaurus. In this setting, the Italian language faces resource constraints: only 4% of the 4 million UMLS concepts have an Italian label. Current systems like MAPS Group’s Clinika software rely on label matching to link the extracted facts to the corresponding UMLS concepts. This dissertation deals with the design of a new Clinika component aimed at enhancing entity linking for Italian terms against UMLS, even in the absence of direct Italian labels. Using transformer-based multilingual embeddings, a novel 'concept guesser' architecture was developed to tackle the linking challenge, making the most of the currently available knowledge. This innovation not only enhances Clinika’s effectiveness but also paves the way for advanced multilingual clinical decision support systems.
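The general idea of embedding-based entity linking can be sketched as a nearest-neighbour search: a mention is linked to the concepts whose embeddings are most similar to its own. The example below is a generic illustration under stated assumptions, not the dissertation's 'concept guesser' architecture. The vectors are toy values standing in for multilingual embeddings, and the concept identifiers and the names `cosine` and `rank_concepts` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_concepts(
    mention_vec: Sequence[float],
    concept_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (concept id, similarity) pairs, most similar first."""
    scored = [(cid, cosine(mention_vec, vec)) for cid, vec in concept_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 2-dimensional "embeddings" with made-up concept identifiers.
concepts = {"CONCEPT_A": [1.0, 0.0], "CONCEPT_B": [0.0, 1.0], "CONCEPT_C": [1.0, 1.0]}
mention = [0.9, 0.1]  # stands in for the embedding of an Italian mention
for concept_id, score in rank_concepts(mention, concepts):
    print(concept_id, round(score, 3))
```

Because the comparison happens in a shared embedding space, a mention can in principle be matched to a concept that has no label in the mention's language, which is the gap that label matching cannot cover.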
Bibliographic Control in the Digital Ecosystem
With contributions from international experts, the book explores the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. The main topics addressed include library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; the new book supply chain; “discoverability” in the IIIF digital ecosystem; the role of thesauri and ontologies in the digital ecosystem; and bibliographic control and search engines.
Social work with airports passengers
Social work at the airport consists in offering social services to passengers. The main
methodological premise is that passengers are under stress, which is characterized by a
particular set of features in appearance and behavior. In such circumstances
a passenger's actions attract some attention. Only a person whom the passenger trusts can
help with documents or provide psychological support.
A robust methodology for automated essay grading
None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. Its objective is to develop a robust methodology for automatically grading essays against the NAPLAN rubric, using heuristics and rules based on the English language together with neural network modelling.