Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Traditional language models are unable to efficiently model entity names
observed in text. All but the most popular named entities appear infrequently
in text, providing insufficient context. Recent efforts have recognized that
context can be generalized between entity names that share the same type (e.g.,
\emph{person} or \emph{location}) and have equipped language models with access
to an external knowledge base (KB). Our Knowledge-Augmented Language Model
(KALM) continues this line of work by augmenting a traditional model with a KB.
Unlike previous methods, however, we train with an end-to-end predictive
objective optimizing the perplexity of text. We do not require any additional
information such as named entity tags. In addition to improving language
modeling performance, KALM learns to recognize named entities in an entirely
unsupervised way by using entity type information latent in the model. On a
Named Entity Recognition (NER) task, KALM achieves performance comparable with
state-of-the-art supervised models. Our work demonstrates that named entities
(and possibly other types of world knowledge) can be modeled successfully using
predictive learning and training on large corpora of text without any
additional information.
Comment: NAACL 2019; updated to cite Zhou et al. (2018) EMNLP as a piece of related work.
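The core idea described above can be illustrated as a toy mixture model: the probability of the next word is a weighted sum over latent entity types (plus a "general" type), with each type's vocabulary drawn from a knowledge base, and the type posterior provides the latent signal used for unsupervised NER. This is a minimal sketch of that idea; the type names, probabilities, and vocabularies below are illustrative assumptions, not the paper's actual parameters or architecture.

```python
# Toy sketch of a KB-augmented mixture language model in the spirit of KALM.
# All numbers and vocabularies are hypothetical, chosen only to illustrate
# the mixture-over-types computation.

def kalm_next_word_prob(word, type_probs, type_vocabs):
    """P(word | context) = sum over types t of P(t | context) * P(word | t)."""
    return sum(
        p_type * type_vocabs[t].get(word, 0.0)
        for t, p_type in type_probs.items()
    )

def entity_type_posterior(word, type_probs, type_vocabs):
    """P(t | word, context): the latent type signal usable for unsupervised NER."""
    total = kalm_next_word_prob(word, type_probs, type_vocabs)
    return {
        t: p_type * type_vocabs[t].get(word, 0.0) / total
        for t, p_type in type_probs.items()
    }

# Hypothetical context distribution over types and KB-derived type vocabularies.
type_probs = {"general": 0.5, "person": 0.4, "location": 0.1}
type_vocabs = {
    "general": {"the": 0.3, "said": 0.2},
    "person": {"Ada": 0.6, "Alan": 0.4},        # e.g. person names from a KB
    "location": {"Paris": 0.7, "London": 0.3},  # e.g. place names from a KB
}

posterior = entity_type_posterior("Ada", type_probs, type_vocabs)
# "Ada" appears only in the person vocabulary, so the posterior
# concentrates entirely on the "person" type.
```

In the full model the type distribution is produced by a neural context encoder and trained end-to-end on perplexity; the sketch only shows why no entity tags are needed — the type assignment falls out of the mixture posterior.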
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
We describe an effort to annotate a corpus of natural language instructions
consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
conversion of protocols into a machine-readable format and benefit biological
research. Experimental results demonstrate the utility of our corpus for
developing machine learning approaches to shallow semantic parsing of
instructional texts. We make our annotated Wet Lab Protocol Corpus available to
the research community.
Scala Server Faces
Progress in the Java language has been slow over the last few years. Scala is emerging as one of the probable successors to Java, with features such as type inference, higher-order functions, closure support, and sequence comprehensions. This allows object-oriented yet concise code to be written in Scala. While Java-based MVC frameworks are still prevalent, Scala-based frameworks, along with Ruby on Rails, Django, and PHP, are emerging as competitors. Scala has a web framework called Lift which has attempted to borrow the advantages of other frameworks while keeping code concise. Since Sun's MVC framework, Java Server Faces 2.0, and its future versions seem to be heading in a reasonably progressive direction, I have developed a framework which attempts to overcome its limitations, called "Scala Server Faces". This framework provides a way of writing Java EE applications in Scala while borrowing the concept of "convention over configuration" followed by rival web frameworks. In addition, an Eclipse tool is provided to ease the programmer's task of writing code on the popular Eclipse platform. Scala Server Faces, the framework and the tool, allow the programmer to write enterprise web applications in Scala by providing features such as templating support, CRUD screen generation for database model objects, an Ant script to help deployment, and integration with the GlassFish Application Server.
Towards Automated Recipe Genre Classification using Semi-Supervised Learning
Sharing cooking recipes is a great way to exchange culinary ideas and provide
instructions for food preparation. However, categorizing raw recipes found
online into appropriate food genres can be challenging due to a lack of
adequate labeled data. In this study, we present a dataset named the
"Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking
Recipe Dataset" that contains two million culinary recipes labeled in
respective categories with extended named entities extracted from recipe
descriptions. This collection of data includes various features such as title,
NER, directions, and extended NER, as well as nine different labels
representing genres including bakery, drinks, non-veg, vegetables, fast food,
cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends
the size of the Named Entity Recognition (NER) list to address missing named
entities like heat, time, or process from the recipe directions using two NER
extraction tools. The 3A2M+ dataset provides a comprehensive solution to
various challenging recipe-related tasks, including classification, named
entity recognition, and recipe generation. Furthermore, we have applied
traditional machine learning, deep learning, and pre-trained language models to
classify the recipes into their corresponding genres, achieving an overall
accuracy of 98.6%. Our investigation indicates that the title feature played a
more significant role than the other features in classifying the genre.
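The finding that titles alone carry strong genre signal can be sketched with a minimal keyword-overlap classifier. The genre keyword sets and example titles below are hypothetical illustrations, not drawn from the 3A2M+ dataset or the authors' actual models.

```python
# Minimal sketch of title-based recipe genre classification.
# GENRE_KEYWORDS is an illustrative assumption; a real system would learn
# features (e.g. TF-IDF or pre-trained embeddings) from labeled data.

GENRE_KEYWORDS = {
    "bakery": {"cake", "bread", "cookie", "muffin"},
    "drinks": {"smoothie", "juice", "latte", "punch"},
    "non-veg": {"chicken", "beef", "salmon", "pork"},
}

def classify_title(title):
    """Pick the genre whose keyword set overlaps the title tokens most."""
    tokens = set(title.lower().split())
    scores = {g: len(tokens & kws) for g, kws in GENRE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify_title("Grilled Chicken Salad"))  # non-veg
print(classify_title("Banana Bread Muffins"))   # bakery
```

Even this crude overlap rule separates many genres, which is consistent with the abstract's observation that the title feature dominates classification performance.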