10 research outputs found

    Sustainable long-term WordNet development and maintenance: Case study of the Czech WordNet

    Get PDF
    Sustainable long-term WordNet development and maintenance: Case study of the Czech WordNet Czech WordNet represents one of the first national wordnets created during the EuroWordNet and BalkaNet projects. However, the data contains various issues that affect the use of Czech WordNet in NLP applications. Since the publication of the first CzWN version, the semantic network was augmented in several phases, however, complex final editing and publishing process has not been finished. In 2017, we have started a project to evaluate and update the Czech WordNet, followed by a connection to the Collaborative Interlingual Index. In this paper, we provide an overview of Czech WordNet data updates and extensions, and present the roadmap to publish a revised version of the Czech WordNet under open license. Moreover, we introduce a developed concept for long-term updates and maintenance of the data based on crowdsourcing activities.   Zrównoważony i długafalowy proces rozwoju i utrzymania wordnetu na przykładzie wordnetu czeskiego Czeski WordNet jest jednym z pierwszych narodowych wordnetów powstałych podczas projektów EuroWordNet i BalkaNet. Jednakże dane zawierają błędy, które wpływają na używanie czeskiego wordnetu w aplikacjach NLP. Od momentu opublikowania pierwszej wersji czeskiego wordnetu sieć semantyczna została rozszerzona w kilku etapach, jednak złożony proces końcowej edycji i publikacji nie został jeszcze zakończony. W roku 2017 zaczęliśmy projekt mający na celu ocenę i aktualizację czeskiego wordnetu, a następnie połączenie go z Collaborative Interlingual Index. W danym artykule przedstawiamy ogólny zarys uaktualnień i rozszerzeń zawartości czeskiego wordnetu, a także prezentujemy plan działania, który doprowadzi do publikacji udoskonalonej wersji czeskiego wordnetu na otwartej licencji. Ponadto prezentujemy opracowaną koncepcję długoterminowych uaktualnień i utrzymania danych w oparciu o działania crowdsourcingowe

    On Singles, Couples and Extended Families. Measuring Overlapping between Latin Vallex and Latin WordNet

    Get PDF
    Different lexical resources may pursue different views on lexical meaning. However, all of them deal with lexical items as common basic components, which are described according to criteria that may vary from one resource to another. In this paper, we present a method for measuring the degree of similarity between a valency-based lexical resource and a WordNet. This is motivated by both theoretical and practical reasons. As for the former, we wonder if there are lexical classes that "impose" themselves regardless of the fact that they are explicitly recorded as such in source lexical resources. As for the latter, our work wants to contribute to the research task dealing with merging lexical resources. In order to apply and evaluate our method, we propose a normalized coefficient of overlapping that measures the overlapping rate between a valency lexicon and a WordNet. In particular, in the context of the exploitation of the linguistic resources for ancient languages built over the last decade, we compute and evaluate the overlapping between a selection of homogeneous lexical subsets extracted from two lexical resources for Latin

    An ontology for human-like interaction systems

    Get PDF
    This report proposes and describes the development of a Ph.D. Thesis aimed at building an ontological knowledge model supporting Human-Like Interaction systems. The main function of such knowledge model in a human-like interaction system is to unify the representation of each concept, relating it to the appropriate terms, as well as to other concepts with which it shares semantic relations. When developing human-like interactive systems, the inclusion of an ontological module can be valuable for both supporting interaction between participants and enabling accurate cooperation of the diverse components of such an interaction system. On one hand, during human communication, the relation between cognition and messages relies in formalization of concepts, linked to terms (or words) in a language that will enable its utterance (at the expressive layer). Moreover, each participant has a unique conceptualization (ontology), different from other individual’s. Through interaction, is the intersection of both part’s conceptualization what enables communication. Therefore, for human-like interaction is crucial to have a strong conceptualization, backed by a vast net of terms linked to its concepts, and the ability of mapping it with any interlocutor’s ontology to support denotation. On the other hand, the diverse knowledge models comprising a human-like interaction system (situation model, user model, dialogue model, etc.) and its interface components (natural language processor, voice recognizer, gesture processor, etc.) will be continuously exchanging information during their operation. It is also required for them to share a solid base of references to concepts, providing consistency, completeness and quality to their processing. Besides, humans usually handle a certain range of similar concepts they can use when building messages. The subject of similarity has been and continues to be widely studied in the fields and literature of computer science, psychology and sociolinguistics. Good similarity measures are necessary for several techniques from these fields such as information retrieval, clustering, data-mining, sense disambiguation, ontology translation and automatic schema matching. Furthermore, the ontological component should also be able to perform certain inferential processes, such as the calculation of semantic similarity between concepts. The principal benefit gained from this procedure is the ability to substitute one concept for another based on a calculation of the similarity of the two, given specific circumstances. From the human’s perspective, the procedure enables referring to a given concept in cases where the interlocutor either does not know the term(s) initially applied to refer that concept, or does not know the concept itself. In the first case, the use of synonyms can do, while in the second one it will be necessary to refer the concept from some other similar (semantically-related) concepts...Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaSecretario: Inés María Galván León.- Secretario: José María Cavero Barca.- Vocal: Yolanda García Rui

    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages

    Get PDF
    Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 Septembre 2008

    Důležitá slova. Podklady ke kolokačnímu švédsko-českému slovníku základních sloves

    Get PDF
    Basic verbs, i.e. very common verbs that typically denote physical movements, locations, states or actions, undergo various semantic shifts and acquire different secondary uses. In extreme cases, the distribution of secondary uses grows so general that they are regarded as auxiliary verbs (go and to be going to), phase verbs (turn, grow), etc. ese uses are usually well-documented by grammars and language textbooks, and so are idiomatic expressions (phraseologisms) in dictionaries. ere is, however, a grey area in between, which is extremely difficult to learn for non-native speakers. is consists of secondary uses with limited collocability, in particular light verb constructions, and secondary meanings that only get activated under particular morphosyntactic conditions. e basic-verb secondary uses and constructions are usually semantically transparent, such that they do not pose understanding problems, but they are generally unpredictable and language-specific, such that they easily become an issue in non-native text production. In this thesis, Swedish basic verbs are approached from the contrastive point of view of an advanced Czech learner of Swedish. A selection of Swedish constructions with basic verbs is explored. e observations result in a proposal for the structure of a machine-readable Swedish-Czech...Základní slovesa (basic verbs), tj. frekventovaná významová slovesa, jež zpravidla popisují fyzický pohyb, umístění, stav, nebo děj, procházejí řadou sémantických posunů, díky kterým se používají k vyjádření druhotných, přenesených významů. V krajních případech se dané sloveso stává pomocným, způsobovým, nebo fázovým slovesem a přestávají pro ně platit kolokační omezení, jež se vztahují na sloveso užité v jeho primárním (tj. doslovném) významu. Tato užití sloves bývají většinou dobře dokumentována v gramatikách i učebnicích, stejně jako kvalitní slovníky podávají podrobnou informaci o užití těchto sloves v ustálených frazeologických spojeních. Mezi plně gramatikalizovaným užitím na jedné straně a idiomatickým, frazeologickým užitím na druhé straně však existuje celá škála užití základních sloves v přenesených významech, jejíž zvládnutí je pro nerodilého mluvčího značně obtížné: užití v přeneseném významu, jež mají omezenou kolokabilitu. To jsou především verbonominální konstrukce někdy nazývané analytické predikáty (light verb constructions), ale také užití, která za určitých omezených morfosyntaktických podmínek (např. pouze v negaci) aktivují abstraktní sémantické rysy u jiných predikátů, např. zesilují význam, nebo implikují, že daný děj již trvá dlouho, a podobně. Tato druhotná užití významových sloves...Institute of Germanic StudiesÚstav germánských studiíFilozofická fakultaFaculty of Art

    24th International Conference on Information Modelling and Knowledge Bases

    Get PDF
    In the last three decades information modelling and knowledge bases have become essentially important subjects not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The series of European – Japanese Conference on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by professor Ohsuga in Japan and professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). Geographical scope has expanded to cover Europe and also other countries. Workshop characteristic - discussion, enough time for presentations and limited number of participants (50) / papers (30) - is typical for the conference. Suggested topics include, but are not limited to: 1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models. 2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling. 3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models. 4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction. 5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems. 6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Contentbased multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems. Overall we received 56 submissions. After careful evaluation, 16 papers have been selected as long paper, 17 papers as short papers, 5 papers as position papers, and 3 papers for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and the short papers presented in the conference are revised after the conference and published in the Series of “Frontiers in Artificial Intelligence” by IOS Press (Amsterdam). The books “Information Modelling and Knowledge Bases” are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in the advance of research and application of information modelling and knowledge bases. Bernhard Thalheim Hannu Jaakkola Yasushi Kiyok

    A computational approach to Latin verbs: new resources and methods

    Get PDF
    Questa tesi presenta l'applicazione di metodi computazionali allo studio dei verbi latini. In particolare, mostriamo la creazione di un lessico di sottocategorizzazione estratto automaticamente da corpora annotati; inoltre presentiamo un modello probabilistico per l'acquisizione di preferenze di selezione a partire da corpora annotati e da un'ontologia (Latin WordNet). Infine, descriviamo i risultati di uno studio diacronico e quantitativo sui preverbi spaziali latini
    corecore