1,220 research outputs found

    Improving Hypernymy Extraction with Distributional Semantic Classes

    Full text link
    In this paper, we show how distributionally-induced semantic classes can be helpful for extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics and using these induced semantic classes for filtering noisy hypernymy relations. Denoising of hypernyms is performed by labeling each semantic class with its hypernyms. On the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses. On the other hand, we infer missing hypernyms via label propagation to cluster terms. We conduct a large-scale crowdsourcing study showing that processing of automatically extracted hypernyms using our approach improves the quality of the hypernymy extraction in terms of both precision and recall. Furthermore, we show the utility of our method in the domain taxonomy induction task, achieving the state-of-the-art results on a SemEval'16 task on taxonomy induction.Comment: In Proceedings of the 11th Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japa

    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    Get PDF
    This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented

    Taxonomy Induction using Hypernym Subsequences

    Get PDF
    We propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from extracted subsequences is cast as an instance of the minimumcost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms stateof- the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approaches have been empirically proven to manifest noise-robustness in the input vocabulary

    Unsupervised Sense-Aware Hypernymy Extraction

    Full text link
    In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction. We present a method for extracting disambiguated hypernymy relationships that propagates hypernyms to sets of synonyms (synsets), constructs embeddings for these sets, and establishes sense-aware relationships between matching synsets. Evaluation on two gold standard datasets for English and Russian shows that the method successfully recognizes hypernymy relationships that cannot be found with standard Hearst patterns and Wiktionary datasets for the respective languages.Comment: In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018). Vienna, Austri

    On the evaluation and improvement of arabic wordnet coverage and usability

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-013-9237-0[EN] Built on the basis of the methods developed for Princeton WordNet and EuroWordNet, Arabic WordNet (AWN) has been an interesting project which combines WordNet structure compliance with Arabic particularities. In this paper, some AWN shortcomings related to coverage and usability are addressed. The use of AWN in question/answering (Q/A) helped us to deeply evaluate the resource from an experience-based perspective. Accordingly, an enrichment of AWN was built by semi-automatically extending its content. Indeed, existing approaches and/or resources developed for other languages were adapted and used for AWN. The experiments conducted in Arabic Q/A have shown an improvement of both AWN coverage as well as usability. Concerning coverage, a great amount of named entities extracted from YAGO were connected with corresponding AWN synsets. Also, a significant number of new verbs and nouns (including Broken Plural forms) were added. In terms of usability, thanks to the use of AWN, the performance for the AWN-based Q/A application registered an overall improvement with respect to the following three measures: accuracy (+9.27 % improvement), mean reciprocal rank (+3.6 improvement) and number of answered questions (+12.79 % improvement).The work presented in Sect. 2.2 was done in the framework of the bilateral Spain-Morocco AECID-PCI C/026728/09 research project. The research of the two first authors is done in the framework of the PROGRAMME D'URGENCE project (grant no. 03/2010). The research of the third author is done in the framework of WIQEI IRSES project (grant no. 269180) within the FP 7 Marie Curie People, DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research project and VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. We would like to thank Manuel Montes-y-Gomez (INAOE-Puebla, Mexico) and Sandra Garcia-Blasco (Bitsnbrain, Spain) for their feedback on the work presented in Sect. 2.4. We would like finally to thank Violetta Cavalli-Sforza (Al Akhawayn University in Ifrane, Morocco) for having reviewed the linguistic level of the entire document.Abouenour, L.; Bouzoubaa, K.; Rosso, P. (2013). On the evaluation and improvement of arabic wordnet coverage and usability. Language Resources and Evaluation. 47(3):891-917. https://doi.org/10.1007/s10579-013-9237-0S891917473Abbès, R., Dichy, J., & Hassoun, M. (2004). The architecture of a standard Arabic lexical database: Some figures, ratios and categories from the DIINAR.1 source program. In Workshop on computational approaches to Arabic script-based languages, Coling 2004. Geneva, Switzerland.Abouenour, L., Bouzoubaa, K., & Rosso, P. (2009a). Structure-based evaluation of an Arabic semantic query expansion using the JIRS passage retrieval system. In Proceedings of the workshop on computational approaches to Semitic languages, E-ACL-2009, Athens, Greece, March.Abouenour, L., Bouzoubaa, K., & Rosso, P. (2009b). Three-level approach for passage retrieval in Arabic question/answering systems. In Proceedings of the 3rd international conference on Arabic language processing CITALA’09, Rabat, Morocco, May, 2009.Abouenour, L., Bouzoubaa, K., & Rosso, P. (2010a). An evaluated semantic query expansion and structure-based approach for enhancing Arabic question/answering. Special Issue in the International Journal on Information and Communication Technologies/IEEE. June.Abouenour, L., Bouzoubaa, K., & Rosso, P. (2010b). Using the YAGO ontology as a resource for the enrichment of named entities in Arabic WordNet. In Workshop LR & HLT for semitic languages, LREC’10. Malta. May, 2010.Ahonen-Myka, H. (2002). Discovery of frequent word sequences in text. In Proceedings of the ESF exploratory workshop on pattern detection and discovery (pp. 180–189). London, UK: Springer.Al Khalifa, M., & Rodríguez, H. (2009). Automatically extending NE coverage of Arabic WordNet using Wikipedia. In Proceedings of the 3rd international conference on Arabic language processing CITALA’09, May, Rabat, Morocco.Alotaiby, F., Alkharashi, I., & Foda, S. (2009). Processing large Arabic text corpora: Preliminary analysis and results. In Proceedings of the second international conference on Arabic language resources and tools (pp. 78–82), Cairo, Egypt.Baker, C. F., Fillmore, C. J., & Cronin, B. (2003). The structure of the FrameNet database. International Journal of Lexicography, 16(3), 281–296.Baldwin, T., Pool, P., & Colowick, S. M. (2010). PanLex and LEXTRACT: Translating all words of all languages of the world. In Proceedings of Coling 2010, demonstration volume (pp. 37–40), Beijing.Benajiba, Y., Diab, M., & Rosso, P. (2009). Using language independent and language specific features to enhance Arabic named entity recognition. In IEEE transactions on audio, speech and language processing. Special Issue on Processing Morphologically Rich Languages, 17(5), 2009.Benajiba, Y., Rosso, P., & Lyhyaoui, A. (2007). Implementation of the ArabiQA question answering system’s components. In Proceedings of workshop on Arabic natural language processing, 2nd Information Communication Technologies int. symposium, ICTIS-2007, April 3–5, Fez, Morocco.Benoît, S., & Darja, F. (2008). Building a free French WordNet from multilingual resources. Workshop on Ontolex 2008, LREC’08, June, Marrakech, Morocco.Black, W., Elkateb, S., Rodriguez, H, Alkhalifa, M., Vossen, P., Pease, A., et al. (2006). Introducing the Arabic WordNet project. In Proceedings of the third international WordNet conference. Sojka, Choi: Fellbaum & Vossen (eds).Boudelaa, S., & Gaskell, M. G. (2002). A reexamination of the default system for Arabic plurals. Language and Cognitive Processes, 17, 321–343.Brini, W., Ellouze & M., Hadrich, B. L. (2009a). QASAL: Un système de question-réponse dédié pour les questions factuelles en langue Arabe. In 9th Journées Scientifiques des Jeunes Chercheurs en Génie Electrique et Informatique, Tunisia.Brini, W., Trigui, O., Ellouze, M., Mesfar, S., Hadrich, L., & Rosso, P. (2009b). Factoid and definitional Arabic question answering system. In Post-proceedings of NOOJ-2009, June 8–10, Tozeur, Tunisia.Buscaldi, D., Rosso, P., Gómez, J. M., & Sanchis, E. (2010). Answering questions with an n-gram based passage retrieval engine. Journal of Intelligent Information Systems, 34(2), 113–134.Costa, R. P., & Seco, N. (2008). Hyponymy extraction and Web search behavior analysis based on query reformulation. In Proceedings of the 11th Ibero-American conference on AI: advances in artificial intelligence (pp. 1–10).Denicia-carral, C., Montes-y-Gõmez, M., Villaseñor-pineda, L., & Hernandez, R. G. (2006). A text mining approach for definition question answering. In Proceedings of the 5th international conference on natural language processing, FinTal’2006, Turku, Finland.Diab, M. T. (2004). Feasibility of bootstrapping an Arabic Wordnet leveraging parallel corpora and an English Wordnet. In Proceedings of the Arabic language technologies and resources, NEMLAR, Cairo, Egypt.El Amine, M. A. (2009). Vers une interface pour l’enrichissement des requêtes en arabe dans un système de recherche d’information. In Proceedings of the 2nd conférence internationale sur l’informatique et ses applications (CIIA’09), May 3–4, Saida, Algeria.Elghamry, K. (2008). Using the web in building a corpus-based hypernymy-hyponymy Lexicon with hierarchical structure for Arabic. In Proceedings of the 6th international conference on informatics and systems, INFOS 2008. Cairo, Egypt.Elkateb, S., Black, W., Vossen, P., Farwell, D., Rodríguez, H., Pease, A., et al. (2006). Arabic WordNet and the challenges of Arabic. In Proceedings of Arabic NLP/MT conference, London, UK.Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. MA: MIT Press.García-Blasco, S., Danger, R., & Rosso, P. (2010). Drug–drug interaction detection: A new approach based on maximal frequent sequences. Sociedad Española para el Procesamiento del Lenguaje Natural, SEPLN, 45, 263–266.García-Hernández, R. A. (2007). Algoritmos para el descubrimiento de patrones secuenciales maximales. Ph.D. Thesis, INAOE. September, Mexico.García-Hernández, R. A., Martínez Trinidad, J. F., & Carrasco-ochoa, J. A. (2010). Finding maximal sequential patterns in text document collections and single documents. Informatica, 34(1), 93–101.Goweder, A., & De Roeck, A. (2001). Assessment of a significant Arabic corpus. In Proceedings of the Arabic NLP workshop at ACL/EACL, (pp. 73–79), Toulouse, France.Graff, D. (2007). Arabic Gigaword (3rd ed.). Philadelphia, USA: Linguistic Data Consortium.Graff, D., Kong, J., Chen, K., & Maeda, K. (2007). English Gigaword (3rd ed.). Philadelphia, USA: Linguistic Data Consortium.Hammou, B., Abu-salem, H., Lytinen, S., & Evens, M. (2002). QARAB: A question answering system to support the Arabic language. In Proceedings of the workshop on computational approaches to Semitic languages, ACL, (pp. 55–65), Philadelphia.Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on Computational linguistics, COLING ‘92 (vol. 2, pp. 539–545).Kanaan, G., Hammouri, A., Al-Shalabi, R., & Swalha, M. (2009). A new question answering system for the Arabic language. American Journal of Applied Sciences, 6(4), 797–805.Kim, H., Chen, S., & Veale, T. (2006). Analogical reasoning with a synergy of HowNet and WordNet. In Proceedings of GWC’2006, the 3rd global WordNet conference, January, Cheju, Korea.Kipper-Schuler, K. (2006). VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. Thesis.Mohammed, F. A., Nasser, K., & Harb, H. M. (1993). A knowledge-based Arabic question answering system (AQAS). In ACM SIGART bulletin (pp. 21–33).Niles, I., & Pease, A. (2001). Towards a standard upper ontology. In Proceedings of FOIS-2 (pp. 2–9), Ogunquit, Maine.Niles, I., & Pease, A. (2003). Linking lexicons and ontologies: Mapping WordNet to the suggested upper merged ontology. In Proceedings of the 2003 international conference on information and knowledge engineering, Las Vegas, Nevada.Ortega-Mendoza, R. M., Villaseñor-pineda, L., & Montes-y-Gõmez, M. (2007). Using lexical patterns to extract hyponyms from the Web. In Proceedings of the Mexican international conference on artificial intelligence MICAI 2007. November, Aguascalientes, Mexico. Lecture Notes in Artificial Intelligence 4827. Berlin: Springer.Palmer, M., P. Kingsbury, & D. Gildea. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 21. USA: MIT Press.Pantel, P., & Pennacchiotti, M. (2006). Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of conference on computational linguistics association for computational linguistics, (pp. 113–120), Sydney, Australia.Rodriguez, H., Farwell, D., Farreres, J., Bertran, M., Alkhalifa, M., & Martí, A. (2008a). Arabic WordNet: Semi-automatic extensions using Bayesian Inference. In Proceedings of the the 6th conference on language resources and evaluation LREC2008, May, Marrakech, Morocco.Rodriguez, H., Farwell, D., Farreres, J., Bertran, M., Alkhalifa, M., Mart., M., et al. (2008b). Arabic WordNet: Current state and future extensions. In Proceedings of the fourth global WordNet conference, January 22–25, Szeged, Hungary.Sharaf, A. M. (2009). The Qur’an annotation for text mining. First year transfer report. School of Computing, Leeds University. December.Snow, R., Jurafsky, D., & Andrew, Y. N. (2005). Learning syntactic patterns for automatic hypernym discovery. In Lawrence K. Saul et al. (Eds.), Advances in neural information processing systems, 17. Cambridge, MA: MIT Press.Suchanek, F. M., Kasneci, G., & Weikum, G. (2007). YAGO: A core of semantic knowledge unifying WordNet and Wikipedia. In Proceedings of 16th international World Wide Web conference WWW’2007, (pp. 697–706), May, Banff, Alberta, Canada: ACM Press.Tjong Kim Sang, E., & Hofmann, K. (2007). Automatic extraction of Dutch hypernym–hyponym pairs. In Proceedings of CLIN-2006, Leuven, Belgium.Toral, A., Munoz, R., & Monachini, M. (2008). Named entity WordNet. In Proceedings of the Sixth international conference on language resources and evaluation (LREC’08), Marrakech, Morocco.Vossen, P. (Ed.). (1998). EuroWordNet, a multilingual database with lexical semantic networks. The Netherlands: Kluwer.Wagner, A. (2005). Learning thematic role relations for lexical semantic nets. Ph.D. Thesis, University of Tübingen, 2005

    Comparing knowledge sources for nominal anaphora resolution

    Get PDF
    We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora by means of shallow lexico-semantic patterns. As corpora we use the British National Corpus (BNC), as well as the Web, which has not been previously used for this task. Our results show that (a) the knowledge encoded in WordNet is often insufficient, especially for anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite NP coreference, the Web-based method yields results comparable to those obtained using WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the dataset; (d) in both case studies, the BNC-based method is worse than the other methods because of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge gap often encountered in anaphora resolution, and handled examples with context-dependent relations between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems
    corecore