148 research outputs found

    On the Utility of Word Embeddings for Enriching OpenWordNet-PT

    The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. In order to minimise this issue, we propose the exploitation of models of distributional semantics, namely word embeddings learned from corpora, in the automatic identification of relation instances missing in a wordnet. Analogy-solving methods are first used for learning a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-given answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances, as we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some are automatically invalidated

    Grouping Synonyms by Definitions

    We present a method for grouping the synonyms of a lemma according to its dictionary senses. The senses are defined by a large machine-readable dictionary for French, the TLFi (Trésor de la langue française informatisé), and the synonyms are given by 5 synonym dictionaries (also for French). To evaluate the proposed method, we manually constructed a gold standard where, for each (word, definition) pair and given the set of synonyms defined for that word by the 5 synonym dictionaries, 4 lexicographers specified the set of synonyms they judge adequate. While inter-annotator agreement on that task ranges from 67% to at best 88% depending on the annotator pair and on the synonym dictionary being considered, the automatic procedure we propose scores a precision of 67% and a recall of 71%. The proposed method is compared with related work, namely word sense disambiguation, synonym lexicon acquisition and WordNet construction

    Approaches towards a Lexical Web: the role of Interoperability

    After highlighting some of the major dimensions that are relevant for Language Resources (LR) and contribute to their infrastructural role, I underline some priority areas of concern today with respect to implementing an open Language Infrastructure, and specifically what we could call a “Lexical Web”. My objective is to show that it is imperative to define an underlying global strategy behind the set of initiatives which are or can be launched in Europe and world-wide, and that an all-embracing vision and cooperation among different communities are necessary to achieve more coherent and useful results. I end by mentioning two new European initiatives that go in this direction and promise to be influential in shaping the future of the LR area

    Universal Dictionary of Concepts

    A universal dictionary of concepts, developed as a part of the ongoing effort to create a semantic intermediary language for global information exchange, is presented. The article describes basic principles and contents of the dictionary and outlines the current state of the project. The dictionary can evolve into an open and freely available language-independent resource with many potential applications. For example, the extensible dictionary of concepts can serve as a pivot to uniformly record and link meanings of words of different languages and facilitate creation of bi- and multilingual dictionaries. Another possible use is word sense markup of corpora. It could bring rich extra benefits due to the fact that the same set of concepts is going to be linked with major world languages including Russian, English, Spanish etc. and supported by multiple text analysis tools. There is a possibility of cooperation and exchange between this dictionary project and other projects, which could enhance the output and eventually spare a lot of parallel effort

    The Lexical Grid: Lexical Resources in Language Infrastructures

    Language Resources are recognized as central and strategic for the development of any Human Language Technology system and application product. They play a critical role as a horizontal technology and have been recognized on many occasions as a priority also by national and supra-national funding a number of initiatives (such as EAGLES, ISLE, ELRA) to establish some sort of coordination of LR activities, and a number of large LR creation projects, both in the written and in the speech areas

    Recent developments for the linguistic linked open data infrastructure

    In this paper we describe the contributions made by the European H2020 project “Pret-a-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Pret-a-LLOD aims to develop a new methodology for building data value chains applicable to a wide range of sectors and applications and based around language resources and language technologies that can be integrated by means of semantic technologies. We describe the methods implemented for increasing the number of language data sets in the LLOD. We also present the approach for ensuring interoperability and for porting LLOD data sets and services to other infrastructures, as well as the contribution of the projects to existing standards

    On the evaluation and improvement of arabic wordnet coverage and usability

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-013-9237-0
    [EN] Built on the basis of the methods developed for Princeton WordNet and EuroWordNet, Arabic WordNet (AWN) has been an interesting project which combines WordNet structure compliance with Arabic particularities. In this paper, some AWN shortcomings related to coverage and usability are addressed. The use of AWN in question/answering (Q/A) helped us to deeply evaluate the resource from an experience-based perspective. Accordingly, an enrichment of AWN was built by semi-automatically extending its content. Indeed, existing approaches and/or resources developed for other languages were adapted and used for AWN. The experiments conducted in Arabic Q/A have shown an improvement of both AWN coverage as well as usability. Concerning coverage, a great amount of named entities extracted from YAGO were connected with corresponding AWN synsets. Also, a significant number of new verbs and nouns (including Broken Plural forms) were added. In terms of usability, thanks to the use of AWN, the performance for the AWN-based Q/A application registered an overall improvement with respect to the following three measures: accuracy (+9.27 % improvement), mean reciprocal rank (+3.6 improvement) and number of answered questions (+12.79 % improvement).
    The work presented in Sect. 2.2 was done in the framework of the bilateral Spain-Morocco AECID-PCI C/026728/09 research project. The research of the two first authors is done in the framework of the PROGRAMME D'URGENCE project (grant no. 03/2010). The research of the third author is done in the framework of WIQEI IRSES project (grant no. 269180) within the FP 7 Marie Curie People, DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research project and VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. We would like to thank Manuel Montes-y-Gomez (INAOE-Puebla, Mexico) and Sandra Garcia-Blasco (Bitsnbrain, Spain) for their feedback on the work presented in Sect. 2.4. We would like finally to thank Violetta Cavalli-Sforza (Al Akhawayn University in Ifrane, Morocco) for having reviewed the linguistic level of the entire document.
    Abouenour, L.; Bouzoubaa, K.; Rosso, P. (2013). On the evaluation and improvement of arabic wordnet coverage and usability. Language Resources and Evaluation. 47(3):891-917. https://doi.org/10.1007/s10579-013-9237-0

    Sharing Semantic Resources

    The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. To turn this vision into reality, several challenges remain open, the most important of which is sharing meaning formally represented with ontologies or, more generally, with semantic resources. This long-term goal of the Semantic Web converges in many ways with activities in the field of Human Language Technology, and in particular with the development of Natural Language Processing applications, where there is a great need for multilingual lexical resources. For instance, one of the most important lexical resources, WordNet, is also commonly regarded and used as an ontology. Another important phenomenon today is the explosion of social collaboration, and Wikipedia, the largest encyclopedia in the world, is an object of research as an up-to-date, omni-comprehensive semantic resource. The main topic of this thesis is the management and exploitation of semantic resources in a collaborative way, trying to use already available resources such as Wikipedia and WordNet. This work presents a general environment able to turn into reality the vision of shared and distributed semantic resources and describes a distributed three-layer architecture to enable rapid prototyping of cooperative applications for developing semantic resources

    Assessing Lexical-Semantic Regularities in Portuguese Word Embeddings

    Models of word embeddings are often assessed when solving syntactic and semantic analogies. Among the latter, we are interested in relations that one would find in lexical-semantic knowledge bases like WordNet, also covered by some analogy test sets for English. Briefly, this paper aims to study how well pretrained Portuguese word embeddings capture such relations. For this purpose, we created a new test, dubbed TALES, with an exclusive focus on Portuguese lexical-semantic relations, acquired from lexical resources. With TALES, we analyse the performance of methods previously used for solving analogies, on different models of Portuguese word embeddings. Accuracies were clearly below the state of the art in analogies of other kinds, which shows that TALES is a challenging test, mainly due to the nature of lexical-semantic relations, i.e., there are many instances sharing the same argument, thus allowing for several correct answers, sometimes too many to be all included in the dataset. We further inspect the results of the best performing combination of method and model to find that some acceptable answers had been considered incorrect. This was mainly due to the lack of coverage by the source lexical resources and suggests that word embeddings may be a useful source of information for enriching those resources, something we also discuss