2,930 research outputs found

    A large annotated corpus for learning natural language inference

    Full text link
    Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time. Comment: To appear at EMNLP 2015. The data will be posted shortly before the conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
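    To make the resource concrete, here is a minimal sketch (not taken from the paper) of what one labeled premise/hypothesis pair looks like, together with the kind of trivial lexical-overlap feature a lexicalized classifier might use. The field names and the feature are illustrative assumptions, not the corpus's actual schema.

        # Illustrative sketch: one SNLI-style labeled sentence pair plus a toy
        # lexical-overlap feature. Field names are assumptions; the pair follows
        # the style of examples in the corpus.
        pair = {
            "premise": "A man inspects the uniform of a figure in some East Asian country.",
            "hypothesis": "The man is sleeping.",
            "label": "contradiction",  # one of: entailment / neutral / contradiction
        }

        def token_overlap(premise: str, hypothesis: str) -> float:
            """Fraction of hypothesis tokens that also appear in the premise."""
            p = set(premise.lower().split())
            h = set(hypothesis.lower().split())
            return len(p & h) / max(len(h), 1)

        print(token_overlap(pair["premise"], pair["hypothesis"]))  # 0.5 for this pair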

    The price of inscrutability

    Get PDF
    In our reasoning we depend on the stability of language, the fact that its signs do not arbitrarily change in meaning from moment to moment. (Campbell, 1994, p.82) Some philosophers offer arguments contending that ordinary names such as “London” are radically indeterminate in reference. The conclusion of such arguments is that there is no fact of the matter whether “London” refers to a city in the south of England, or whether instead it refers to Sydney, Australia. Some philosophers have even suggested that we accept the conclusion of these arguments. Such a position seems crazy to many; but what exactly goes wrong if one adopts such a view? This paper evaluates the theoretical costs incurred by one who endorses extreme inscrutability of reference (the ‘inscrutabilist’). I show that there is one particular implication of extreme inscrutability which pushes the price of inscrutabilism too high. An extension of the classic ‘permutation’ arguments for extreme inscrutability allows us to establish what I dub ‘extreme indexical inscrutability’. This result, I argue, unacceptably undermines the epistemology of inference. The first half of the paper develops the background of permutation arguments for extreme inscrutability of reference and evaluates some initial attempts to make trouble for the inscrutabilist. Sections 1 and 2 describe the setting of the original permutation arguments for extreme inscrutability. Sections 3 and 4 survey four potential objections to extreme inscrutability of reference, including some recently raised in Vann McGee’s excellent (2005a). Section 5 sketches how the permutation arguments can be generalized to establish extreme indexical inscrutability, and shows how this contradicts a ‘stability principle’—that our words do not arbitrarily change their reference from one moment to the next—which I claim plays a vital role in the epistemology of inference. The second half of the paper develops in detail the case for thinking that language is stable in the relevant sense. In section 6, I use this distinction to call into question the epistemological relevance of the validity of argument types; Kaplan’s treatment of indexical validity partially resolves this worry, but there is a residual problem. In section 7, I argue that stability is exactly what is needed to bridge this final gap, and so secure the relevance of validity to good inferential practice. Section 8 responds to objections to this claim. An appendix to the paper provides formal backing for the results cited in this paper, including a generalization of permutation arguments to the kind of rich setting required for a realistic semantics of natural language. Extreme indexical inscrutability results can be proved within this setting. The first half of the paper shows that the inscrutabilist is committed to extreme indexical inscrutability, which implies that language is not determinately ‘stable’. The second half of the paper argues that good inference requires stability. The price of inscrutabilism, therefore, is to sever the connection between the validity of argument-forms and inferential practice: and this is too high a price to pay.

    A Survey of Paraphrasing and Textual Entailment Methods

    Full text link
    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201
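    As a small illustration of the relation the abstract describes, a paraphrase check can be phrased as entailment in both directions. The entails function below is a hypothetical placeholder for any entailment recognizer, stubbed with naive token containment purely so the sketch runs; it is not a method from the survey.

        # Paraphrase viewed as bidirectional textual entailment (illustrative stub).
        def entails(text: str, hypothesis: str) -> bool:
            # Hypothetical placeholder: treat "every hypothesis token occurs in the
            # text" as entailment. Real recognizers are far more sophisticated.
            return set(hypothesis.lower().split()) <= set(text.lower().split())

        def is_paraphrase(a: str, b: str) -> bool:
            # a and b paraphrase each other if each entails the other.
            return entails(a, b) and entails(b, a)

        print(is_paraphrase("the cat sat on the mat", "on the mat the cat sat"))  # True
        print(is_paraphrase("the cat sat on the mat", "the cat sat"))             # False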

    What is Conceptual Engineering and What Should it Be?

    Get PDF
    Conceptual engineering is the design, implementation, and evaluation of concepts. Conceptual engineering includes or should include de novo conceptual engineering (designing a new concept) as well as conceptual re-engineering (fixing an old concept). It should also include heteronymous (different-word) as well as homonymous (same-word) conceptual engineering. I discuss the importance and the difficulty of these sorts of conceptual engineering in philosophy and elsewhere.

    A Generic architecture for semantic enhanced tagging systems

    Get PDF
    The Social Web, or Web 2.0, has recently gained popularity because of its low cost and ease of use. Social tagging sites (e.g. Flickr and YouTube) offer new ways for end-users to publish and classify their content (data). Tagging systems contain free keywords (tags) generated by end-users to annotate and categorise data. The main drawback of social tagging is the lack of semantics caused by the use of unstructured vocabulary. As a result, tagging systems suffer from shortcomings such as low precision, lack of collocation, synonymy, multilinguality, and use of shorthands, so relevant content is not visible, and thus not retrievable, when searching in tag-based systems. The Semantic Web, or Web 3.0, on the other hand, provides a rich semantic infrastructure, with ontologies as its key enabling technology. Ontologies can be integrated with the Social Web to overcome the lack of semantics in tagging systems.
    In the work presented in this thesis, we build an architecture that addresses a number of these drawbacks. In particular, we make use of the controlled vocabularies provided by ontologies to improve information retrieval in tag-based systems. Based on the tags provided by end-users, we introduce the idea of adding “system tags” drawn from semantic as well as social resources. The system tags are comprehensive and wide-ranging in comparison with the limited user tags, and they are used to fill the gap between the user tags and the search terms used for searching in tag-based systems. We restricted the scope of our work to the following tagging-system shortcomings:
    - the lack of semantic relations between user tags and search terms (e.g. synonymy, hypernymy),
    - the lack of a translation medium between user tags and search terms (multilinguality),
    - the lack of context to define emergent shorthand user tags.
    To address the first shortcoming, we use the WordNet ontology as a semantic lingual resource from which system tags are extracted. For the second, we use the MultiWordNet ontology to recognise cross-language linkages between different languages. Finally, for the third, we use tag clusters obtained from the Social Web to create a context for defining the meaning of shorthand tags.
    A prototype of the architecture was implemented. In the prototype system, we built our own database to host videos imported from a real tag-based system (YouTube). The user tags associated with these videos were also imported and stored in the database. For each user tag, our algorithm adds a number of system tags that come from either semantic ontologies (WordNet or MultiWordNet) or tag clusters imported from the Flickr website. Each system tag added to annotate the imported videos therefore has a relationship with one of the user tags on that video: synonymy, hypernymy, similar term, related term, translation, or a clustering relation.
    To evaluate the suitability of the proposed system tags, we developed an online environment where participants submit search terms and retrieve two groups of videos to be evaluated. Each group is produced from one distinct type of tags: user tags or system tags. The videos in the two groups are produced from the same database and are evaluated by the same participants in order to obtain a consistent and reliable evaluation. Since user tags are what real tag-based systems currently use for searching, we take their efficiency as the reference against which we compare the efficiency of the new system tags. To compare the relevance of each group of retrieved videos to the search terms, we carried out a statistical analysis. According to the Wilcoxon signed-rank test, there was no significant difference between using system tags and user tags. The findings reveal that searching with the system tags is as efficient as searching with the user tags; the two types of tags produce different results, but at the same level of relevance to the submitted search terms.
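    A minimal sketch of the tag-expansion step, assuming NLTK's WordNet interface rather than the thesis's actual implementation: a single user tag is expanded into candidate system tags via the synonymy and hypernymy relations mentioned above (the MultiWordNet and Flickr-cluster steps are omitted).

        # Expand one user tag into candidate system tags using WordNet (via NLTK).
        # Assumes the nltk package with the WordNet data installed, e.g. after
        # nltk.download("wordnet"). Illustrative sketch, not the thesis's prototype.
        from nltk.corpus import wordnet as wn

        def system_tags(user_tag: str) -> dict:
            """Collect synonym and hypernym lemmas for a user tag from WordNet."""
            synonyms, hypernyms = set(), set()
            for synset in wn.synsets(user_tag):
                synonyms.update(l.replace("_", " ") for l in synset.lemma_names())
                for hyper in synset.hypernyms():
                    hypernyms.update(l.replace("_", " ") for l in hyper.lemma_names())
            synonyms.discard(user_tag)
            return {"synonymy": sorted(synonyms), "hypernymy": sorted(hypernyms)}

        print(system_tags("car"))  # e.g. synonyms 'auto', 'automobile'; hypernyms 'motor vehicle'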

    The Use of New Technologies for Improving Reading Comprehension

    Get PDF
    Since the introduction of writing systems, reading comprehension has always been a foundation for achievement in several areas within the educational system, as well as a prerequisite for successful participation in most areas of adult life. The increased availability of technologies and web-based resources can provide valuable support, in both educational and clinical settings, for devising training activities that can also be carried out remotely. Studies in the current literature have examined the efficacy of internet-based reading comprehension programs for children with reading comprehension difficulties, but almost none have considered distance rehabilitation programs. The present paper reports data concerning a distance program, Cloze, developed in Italy for improving language and reading comprehension. Twenty-eight children from 3rd to 6th grade with comprehension difficulties were involved. These children completed the distance program for 15–20 minutes at least three times a week for about 4 months. The program was presented separately to each child, with a degree of difficulty adapted to his/her characteristics. Text reading comprehension (assessed separately for narrative and informative texts) increased after the intervention. These findings have clinical and educational implications, as they suggest that it is possible to promote reading comprehension with an individualized distance program, avoiding the need for the child to travel to a rehabilitation center.

