Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Traditional language models are unable to efficiently model entity names
observed in text. All but the most popular named entities appear infrequently
in text, providing insufficient context. Recent efforts have recognized that
context can be generalized between entity names that share the same type (e.g.,
\emph{person} or \emph{location}) and have equipped language models with access
to an external knowledge base (KB). Our Knowledge-Augmented Language Model
(KALM) continues this line of work by augmenting a traditional model with a KB.
Unlike previous methods, however, we train with an end-to-end predictive
objective optimizing the perplexity of text. We do not require any additional
information such as named entity tags. In addition to improving language
modeling performance, KALM learns to recognize named entities in an entirely
unsupervised way by using entity type information latent in the model. On a
Named Entity Recognition (NER) task, KALM achieves performance comparable with
state-of-the-art supervised models. Our work demonstrates that named entities
(and possibly other types of world knowledge) can be modeled successfully using
predictive learning and training on large corpora of text without any
additional information.
Comment: NAACL 2019; updated to cite Zhou et al. (2018) EMNLP as a piece of related work.
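The core idea described above can be illustrated as a toy mixture model: the probability of the next word is a weighted sum over latent entity types (plus a "general" type), with each type's vocabulary drawn from a knowledge base, and the type posterior provides the latent signal used for unsupervised NER. This is a minimal sketch of that idea; the type names, probabilities, and vocabularies below are illustrative assumptions, not the paper's actual parameters or architecture.

```python
# Toy sketch of a KB-augmented mixture language model in the spirit of KALM.
# All numbers and vocabularies are hypothetical, chosen only to illustrate
# the mixture-over-types computation.

def kalm_next_word_prob(word, type_probs, type_vocabs):
    """P(word | context) = sum over types t of P(t | context) * P(word | t)."""
    return sum(
        p_type * type_vocabs[t].get(word, 0.0)
        for t, p_type in type_probs.items()
    )

def entity_type_posterior(word, type_probs, type_vocabs):
    """P(t | word, context): the latent type signal usable for unsupervised NER."""
    total = kalm_next_word_prob(word, type_probs, type_vocabs)
    return {
        t: p_type * type_vocabs[t].get(word, 0.0) / total
        for t, p_type in type_probs.items()
    }

# Hypothetical context distribution over types and KB-derived type vocabularies.
type_probs = {"general": 0.5, "person": 0.4, "location": 0.1}
type_vocabs = {
    "general": {"the": 0.3, "said": 0.2},
    "person": {"Ada": 0.6, "Alan": 0.4},        # e.g. person names from a KB
    "location": {"Paris": 0.7, "London": 0.3},  # e.g. place names from a KB
}

posterior = entity_type_posterior("Ada", type_probs, type_vocabs)
# "Ada" appears only in the person vocabulary, so the posterior
# concentrates entirely on the "person" type.
```

In the full model the type distribution is produced by a neural context encoder and trained end-to-end on perplexity; the sketch only shows why no entity tags are needed — the type assignment falls out of the mixture posterior.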
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
We describe an effort to annotate a corpus of natural language instructions
consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
conversion of protocols into a machine-readable format and benefit biological
research. Experimental results demonstrate the utility of our corpus for
developing machine learning approaches to shallow semantic parsing of
instructional texts. We make our annotated Wet Lab Protocol Corpus available to
the research community.
Scala Server Faces
Progress in the Java language has been slow over the last few years. Scala is emerging as one of the probable successors to Java, with features such as type inference, higher-order functions, closure support, and sequence comprehensions. This allows object-oriented yet concise code to be written in Scala. While Java-based MVC frameworks are still prevalent, Scala-based frameworks, along with Ruby on Rails, Django, and PHP, are emerging as competitors. Scala has a web framework called Lift which has attempted to borrow the advantages of other frameworks while keeping code concise. Since Sun's MVC framework, Java Server Faces 2.0, and its future versions seem to be heading in a reasonably progressive direction, I have developed a framework which attempts to overcome its limitations, called "Scala Server Faces". This framework provides a way of writing Java EE applications in Scala while borrowing the concept of "convention over configuration" followed by rival web frameworks. In addition, an Eclipse tool is provided to ease the programmer's task of writing code on the popular Eclipse platform. Scala Server Faces, the framework and the tool, allow the programmer to write enterprise web applications in Scala by providing features such as templating support, CRUD screen generation for database model objects, an Ant script to help deployment, and integration with the GlassFish Application Server.
Towards Automated Recipe Genre Classification using Semi-Supervised Learning
Sharing cooking recipes is a great way to exchange culinary ideas and provide
instructions for food preparation. However, categorizing raw recipes found
online into appropriate food genres can be challenging due to a lack of
adequate labeled data. In this study, we present a dataset named the
"Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking
Recipe Dataset" that contains two million culinary recipes labeled in
respective categories with extended named entities extracted from recipe
descriptions. This collection of data includes various features such as title,
NER, directions, and extended NER, as well as nine different labels
representing genres including bakery, drinks, non-veg, vegetables, fast food,
cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends
the size of the Named Entity Recognition (NER) list to address missing named
entities like heat, time, or process from the recipe directions using two NER
extraction tools. The 3A2M+ dataset provides a comprehensive solution to
various challenging recipe-related tasks, including classification, named
entity recognition, and recipe generation. Furthermore, we have applied
traditional machine learning, deep learning, and pre-trained language models to
classify the recipes into their corresponding genres, achieving an overall
accuracy of 98.6%. Our investigation indicates that the title feature played a
more significant role than the other features in classifying the genre.
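The finding that titles alone carry strong genre signal can be sketched with a minimal keyword-overlap classifier. The genre keyword sets and example titles below are hypothetical illustrations, not drawn from the 3A2M+ dataset or the authors' actual models.

```python
# Minimal sketch of title-based recipe genre classification.
# GENRE_KEYWORDS is an illustrative assumption; a real system would learn
# features (e.g. TF-IDF or pre-trained embeddings) from labeled data.

GENRE_KEYWORDS = {
    "bakery": {"cake", "bread", "cookie", "muffin"},
    "drinks": {"smoothie", "juice", "latte", "punch"},
    "non-veg": {"chicken", "beef", "salmon", "pork"},
}

def classify_title(title):
    """Pick the genre whose keyword set overlaps the title tokens most."""
    tokens = set(title.lower().split())
    scores = {g: len(tokens & kws) for g, kws in GENRE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify_title("Grilled Chicken Salad"))  # non-veg
print(classify_title("Banana Bread Muffins"))   # bakery
```

Even this crude overlap rule separates many genres, which is consistent with the abstract's observation that the title feature dominates classification performance.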