A large annotated corpus for learning natural language inference
Understanding entailment and contradiction is fundamental to understanding
natural language, and inference about entailment and contradiction is a
valuable testing ground for the development of semantic representations.
However, machine learning research in this area has been dramatically limited
by the lack of large-scale resources. To address this, we introduce the
Stanford Natural Language Inference corpus, a new, freely available collection
of labeled sentence pairs, written by humans doing a novel grounded task based
on image captioning. At 570K pairs, it is two orders of magnitude larger than
all other resources of its type. This increase in scale allows lexicalized
classifiers to outperform some sophisticated existing entailment models, and it
allows a neural network-based model to perform competitively on natural
language inference benchmarks for the first time. Comment: To appear at EMNLP 2015. The data will be posted shortly before the
conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
Ask, and shall you receive?: Understanding Desire Fulfillment in Natural Language Text
The ability to comprehend wishes or desires and their fulfillment is
important to Natural Language Understanding. This paper introduces the task of
identifying if a desire expressed by a subject in a given short piece of text
was fulfilled. We propose various unstructured and structured models that
capture fulfillment cues such as the subject's emotional state and actions. Our
experiments with two different datasets demonstrate the importance of
understanding the narrative and discourse structure to address this task.
The SNLI Corpus
The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). We aim for it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation learning methods, and as a resource for developing NLP models of any kind. We gratefully acknowledge support from a Google Faculty Research Award, a gift from Bloomberg L.P., the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040, the National Science Foundation under grant no. IIS 1159679, and the Department of the Navy, Office of Naval Research, under grant no. N00014-10-1-0109. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Google, Bloomberg L.P., DARPA, AFRL, NSF, ONR, or the US government. We also thank our many excellent Mechanical Turk contributors.
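As a rough illustration of the three-way labeling described above, the sketch below tallies labels over sentence-pair records stored one JSON object per line. The field names `gold_label`, `sentence1`, and `sentence2` are assumptions about the distribution format, and the sample pairs are invented for this example rather than taken from the corpus.

```python
import json
from collections import Counter

# The three labels named in the corpus description.
LABELS = ("entailment", "contradiction", "neutral")


def count_labels(lines):
    """Tally labels over an iterable of JSON-lines records.

    Records whose label is not one of LABELS (for example, a placeholder
    used when annotators reached no consensus) are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        label = record.get("gold_label")  # assumed field name
        if label in LABELS:
            counts[label] += 1
    return counts


# Invented sample records, not drawn from the corpus.
sample = [
    json.dumps({"gold_label": "entailment",
                "sentence1": "A man plays a guitar on stage.",
                "sentence2": "A person is playing an instrument."}),
    json.dumps({"gold_label": "contradiction",
                "sentence1": "A man plays a guitar on stage.",
                "sentence2": "The stage is empty."}),
    json.dumps({"gold_label": "neutral",
                "sentence1": "A man plays a guitar on stage.",
                "sentence2": "The man is a famous musician."}),
    json.dumps({"gold_label": "-",
                "sentence1": "A dog runs.",
                "sentence2": "An animal is outside."}),
]

label_counts = count_labels(sample)
for label in LABELS:
    print(label, label_counts[label])
```

Applied to a full file, the same function could be called as `count_labels(open(path, encoding="utf-8"))`, where `path` is wherever the data was saved; roughly equal counts across the three labels would be consistent with the "balanced classification" description.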
Towards a killer app for the Semantic Web
Killer apps are highly transformative technologies that create new markets and widespread patterns of behaviour. IT generally, and the Web in particular, has benefited from killer apps to create new networks of users and increase its value. The Semantic Web community, on the other hand, is still awaiting a killer app that proves the superiority of its technologies. There are certain features that distinguish killer apps from ordinary applications. This paper examines those features in the context of the Semantic Web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing Semantic Web applications.
Visualization of database structures for information retrieval
This paper describes the Book House system, which is designed to support children's information retrieval in libraries as part of their education. It is a shareware program available on CD-ROM or floppy disks, and comprises functionality for database searching as well as for classifying and storing book information in the database. The system concept is based on an understanding of children's domain structures and their capabilities for categorization of information needs in connection with their activities in schools, in school libraries or in public libraries. These structures are visualized in the interface by using metaphors and multimedia technology. Through the use of text, images and animation, the Book House encourages children, even at a very early age, to learn by doing in an enjoyable way, which plays on their previous experiences with computer games. Both words and pictures can be used for searching; this makes the system suitable for all age groups. Even children who have not yet learned to read properly can, by selecting pictures, search for and find those books they would like to have read aloud. Thus, at the very beginning of their school life, they can learn to search for books on their own. For the library community, such a system will provide an extended service which will increase the number of children's own searches and also improve the relevance, quality and utilization of the book collections in the libraries. A market research report on the need for an annual indexing service for books in the Book House format is in preparation by the Danish Library Centre A/S.
Wittgenstein on line / on the line
Two independent publishing projects have thoroughly changed the state of Wittgenstein scholarship in recent years. Michael Nedo's 'Wiener Ausgabe' offers a traditional critical edition of Wittgenstein's philosophical writings ranging from 1929 up to and including the 'Big Typescript' (1933). Considering the eclectic and, at times, arbitrary editorial policy underlying previous publications from the Nachlass, Nedo's project offers unprecedented philosophical rigor as well as textual criticism in volumes designed for comfortable reading. A second, more ambitious, attempt at a critical edition is the Bergen electronic edition. It is planned to include 4 CD-ROMs, covering the entire range of the philosopher's unpublished writing. Two disks are currently available, comprising all of Wittgenstein's manuscripts from 1929-1939, as well as typescripts, beginning with 'Notes on Logic' (1913) and leading up to Typescript 226, composed in 1939.

Wittgenstein's writings from the Thirties are, therefore, available in independent, reliable printed and electronic editions respectively. Readers can, for the first time, observe the philosopher at work, transferring paragraphs from pocket notebooks to handwritten 'volumes'; picking acceptable remarks to be included in typescripts that are, at a later stage, cut up into slips of paper which are again annotated, rearranged and put together in further volumes and typescripts. But this is only half the excitement. The 'Wiener Ausgabe' and the 'Bergen Edition' stake their success on different media, inevitably provoking a comparison between the well-known features of printed scholarly editions and the not so familiar realm of digitized texts.
A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students to a variety of techniques to analyze small,
neat, and clean data sets. However, whether they pursue more formal training in
statistics or not, many of these students will end up working with data that is
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well as communicating findings in written, graphical, and
oral forms. Comment: 21 pages total including supplementary material