Automatic Dish Name Extraction from User-generated Content Using LLM
Extracting dish names from user-provided content such as food photographs and captions, restaurant reviews, and other free-form text is a challenging task. Rule-based approaches are difficult to maintain and improve. Pattern matching against a predefined dictionary often suffers from low recall. Conventional machine learning models require large amounts of labeled data to perform named entity recognition (e.g., to recognize dish names); such data is often costly to obtain, and the approach does not scale well across multiple languages and countries. This disclosure describes the use of a multimodal large language model (LLM) to automatically extract dish names from user-generated content such as food photographs and associated free-form text (tags, captions, etc.). Dish name extraction from user-provided tags can be formulated as an open-vocabulary dish name entity recognition and discovery task, which fits naturally within the framework of pre-trained LLMs and leverages the model's capability for multilingual, multicultural text understanding.
Using Dual-Language Books to Preserve Language & Culture in Alaska Native Communities
“Children learn their language on their mother’s lap.” This conventional wisdom from a Cup’ik Elder describes the approach used by many Alaska Native peoples to promote native language acquisition. Presumably, the children learn by listening to stories and tales from a trusted parent or caregiver. However, what happens when the caregiver does not speak the native language? This chapter describes an effort to address this issue while also promoting better educational outcomes by providing access to diverse dual-language books in Alaska Native languages through the use of a digital children’s library. Potential benefits from these efforts include an increase in resources for schools, a revitalization of Indigenous languages, and an increase in access, with hopes that future work will show evidence that using these dual-language books encourages greater parent support and involvement in education, supports second language acquisition, and promotes a strong sense of identity. Implications and future efforts follow.
KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy, and we compare KnowNER with state-of-the-art NER approaches
across three languages (English, German and Spanish), where it performs among
the state-of-the-art systems in all of them.
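As a rough illustration of how a list of names can be turned into features for a CRF, the sketch below builds one feature dictionary per token. It is not the KnowNER feature set: the function names, the feature keys, and the tiny name list are all hypothetical, chosen only to show the general idea of adding a knowledge-derived feature on top of surface features.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, name_list: Set[str]) -> Dict[str, object]:
    """Build features for tokens[i]: surface cues plus a name-list lookup.

    The feature keys are illustrative assumptions, not KnowNER's own.
    """
    tok = tokens[i]
    return {
        # Surface features that need no external knowledge.
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "suffix3": tok[-3:].lower(),
        # Knowledge-derived feature: does the token appear in a list of names?
        "in_name_list": tok.lower() in name_list,
        # Neighbouring tokens, with sentinel values at the sentence boundaries.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str], name_list: Set[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i, name_list) for i in range(len(tokens))]


# Hypothetical example data; a real name list would come from an external source.
example_names = {"angela", "merkel", "madrid"}
example_tokens = ["Angela", "Merkel", "visited", "Madrid"]
example_features = sentence_features(example_tokens, example_names)
```

A sequence of per-token dictionaries like `example_features`, paired with one label per token, is a common input format for CRF toolkits. Deeper knowledge sources would contribute further keys to the same dictionaries.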
The lexical fallacy in emotion research: Mistaking vernacular words for psychological entities.
Vernacular lexemes appear self-evident, so we unwittingly reify them. But the words and phrases of natural languages comprise a treacherous basis for identifying valid psychological constructs, as I illustrate in emotion research. Like other vernacular lexemes, the emotion labels in natural languages do not have definite, stable, mutually transparent meanings, and any one vernacular word may be used to denote multiple scientifically distinct entities. In addition, the consequential choice of one lexeme to name a scientific construct rather than any of its partial synonyms is often arbitrary. Furthermore, a given vernacular lexeme from any one of the world's 7000 languages rarely maps one-to-one onto an exactly corresponding vernacular lexeme in other languages. Words related to anger in different languages illustrate this. Since each language constitutes a distinct taxonomy of things in the world, most or all languages must fail to cut nature at its joints. In short, it is pernicious to use one language's dictionary as the source of psychological constructs. So scientists need to coin new technical names for scientifically derived constructs: names precisely defined in terms of the constellation of features or components that characterize the constructs they denote. The development of the kama muta construct illustrates one way to go about this. Kama muta is the emotion evoked by sudden intensification of communal sharing; it is universally experienced but not isomorphic with any vernacular lexeme such as heart warming, moving, touching, collective pride, tender, nostalgic, sentimental, Awww-so cute!. (PsycINFO Database Record (c) 2019 APA, all rights reserved)