
    Named Entity Recognition in Turkish with Bayesian Learning and Hybrid Approaches

    Named entity recognition is one of the significant textual information extraction tasks. In this paper, we present two approaches to named entity recognition on Turkish texts. The first is a Bayesian learning approach trained on a considerably limited training set. The second comprises two hybrid systems based on the joint utilization of this Bayesian learning approach and a previously proposed rule-based named entity recognizer. All three proposed approaches achieve promising performance rates. This paper is significant as it reports the first use of the Bayesian approach for named entity recognition on Turkish texts, for which practical approaches in particular are still insufficient.
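The paper's actual models are not reproduced here, but the general idea of a Bayesian (Naive Bayes) approach to token-level entity tagging can be sketched as follows. The feature set, training examples, and smoothing choice below are invented for illustration and are not the paper's:

```python
import math
from collections import Counter, defaultdict

def token_features(token):
    """Very small illustrative feature set: lowercased token plus a capitalization flag."""
    return [f"word={token.lower()}", f"cap={token[:1].isupper()}"]

class NaiveBayesNER:
    """Token-level Naive Bayes tagger with Laplace smoothing (illustrative sketch only)."""

    def fit(self, tagged_tokens):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for token, label in tagged_tokens:
            self.label_counts[label] += 1
            for feat in token_features(token):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        self.total = sum(self.label_counts.values())
        return self

    def predict(self, token):
        feats = token_features(token)
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(feature | label), Laplace-smoothed
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny invented training set: PER = person name, O = not an entity.
train = [("Ahmet", "PER"), ("Ayse", "PER"), ("Mehmet", "PER"),
         ("gitti", "O"), ("ev", "O"), ("okula", "O")]
tagger = NaiveBayesNER().fit(train)
```

Even with unseen words, the capitalization feature lets the sketch generalize: an unfamiliar capitalized token leans toward PER, a lowercase one toward O.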

    Discovering story chains: A framework based on zigzagged search and news actors

    A story chain is a set of related news articles that reveal how different events are connected. This study presents a framework for discovering story chains, given an input document, in a text collection. The framework has 3 complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct 2 user studies in terms of 4 effectiveness measures—relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. © 2017 ASIS&T
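Part (iii) of the framework measures similarity among news articles. The paper's measure is not specified in the abstract; a common bag-of-words cosine-similarity baseline for that step can be sketched like this (purely illustrative, not the paper's method):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two documents.
    An illustrative baseline only, not the similarity measure used in the paper."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[word] * vec_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A zigzagged search would repeatedly apply such a measure between each candidate article and the evolving chain, revisiting earlier documents as the chain is updated.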

    The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study

    Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.

    The Rise of iWar: Identity, Information, and the Individualization of Modern Warfare

    During a decade of global counterterrorism operations and two extended counterinsurgency campaigns, the United States was confronted with a new kind of adversary. Without uniforms, flags, and formations, the task of identifying and targeting these combatants represented an unprecedented operational challenge for which Cold War era doctrinal methods were largely unsuited. This monograph examines the doctrinal, technical, and bureaucratic innovations that evolved in response to these new operational challenges. It discusses the transition from a conventionally focused, Cold War-era targeting process to one optimized for combating networks and conducting identity-based targeting. It analyzes the policy decisions and strategic choices that were the catalysts of this change and concludes with an in-depth examination of emerging technologies that are likely to shape how this mode of warfare will be waged in the future.

    Benchmarking Arabic AI with Large Language Models

    With large Foundation Models (FMs), language technologies (and AI in general) are entering a new paradigm: eliminating the need to develop large-scale task-specific datasets and supporting a variety of tasks through set-ups ranging from zero-shot to few-shot learning. However, understanding FM capabilities requires a systematic benchmarking effort that compares FM performance with state-of-the-art (SOTA) task-specific models. With that goal, past work has focused on the English language, with a few efforts covering multiple languages. Our study contributes to ongoing research by evaluating FM performance on standard Arabic NLP and speech processing, including a range of tasks from sequence tagging to content classification across diverse domains. We start with zero-shot learning using GPT-3.5-turbo, Whisper, and USM, addressing 33 unique tasks using 59 publicly available datasets, resulting in 96 test setups. For a few tasks, the FMs perform on par with or exceed the performance of the SOTA models, but for the majority they underperform. Given the importance of prompts for FM performance, we discuss our prompt strategies in detail and elaborate on our findings. Our future work on Arabic AI will explore few-shot prompting, expand the range of tasks, and investigate additional open-source models. Comment: Foundation Models, Large Language Models, Arabic NLP, Arabic Speech, Arabic AI, ChatGPT Evaluation, USM Evaluation, Whisper Evaluation
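The abstract stresses that prompt design strongly affects zero-shot FM performance. As a rough illustration of what assembling a zero-shot classification prompt looks like, here is a minimal sketch; the template, wording, and label set are invented and do not reflect the paper's actual prompts:

```python
def build_zero_shot_prompt(task_description, labels, text):
    """Assemble a zero-shot classification prompt.
    The template here is invented for illustration; the paper's prompts differ."""
    label_list = ", ".join(labels)
    return (
        f"{task_description}\n"
        f"Answer with exactly one of: {label_list}.\n"
        f"Text: {text}\n"
        f"Label:"
    )

prompt = build_zero_shot_prompt(
    "Classify the sentiment of the following Arabic text.",
    ["positive", "negative", "neutral"],
    "هذا المنتج رائع",
)
```

In a benchmarking pipeline, such a prompt would be sent to the model under test and the returned label compared against the gold annotation across all test setups.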

    Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web

    This Special Issue was launched with the aim of further illuminating important cultural heritage (CH) areas, inviting researchers to submit original, featured, multidisciplinary research related to heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both ends of the CH production–consumption chain (i.e., the push and pull ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven (11) contributions were published within this Special Issue, illuminating various aspects of contemporary heritage strategies in today's ubiquitous society. The published papers relate to, but are not limited to, the following multidisciplinary topics: digital storytelling for cultural heritage; audience engagement in cultural heritage; sustainability impact indicators of cultural heritage; cultural heritage digitization, organization, and management; collaborative cultural heritage archiving, dissemination, and management; cultural heritage communication and education for sustainable development; semantic services of cultural heritage; big data of cultural heritage; smart systems for historical cities (smart cities); and smart systems for cultural heritage sustainability.

    Unifying cross-modal concepts in vision and language

    Enabling computers to demonstrate a proficient understanding of the physical world is an exceedingly challenging task that necessitates the ability to perceive, through vision or other senses, and communicate through natural language. Key to this endeavor is the representation of concepts present in the world within and across different modalities (e.g., vision and language). To an extent, models can capture concepts implicitly by using large quantities of training data. However, the complementary inter-modal and intra-modal connections between concepts are often not captured, which leads to issues such as difficulty generalizing a concept to new contexts or different appearances and an inability to integrate concepts from different sources. The focus of this dissertation is developing ways to represent concepts within models in a unified fashion across vision and language. In particular, we address three challenges.

    1) Linking instances of concepts across modalities without strong supervision or large amounts of data external to the target task. In visual question answering, models tend to rely on contextual cues or learned priors instead of actually recognizing and linking concepts across modalities. Consequently, when a concept appears in a new context, models often fail to adapt. We learn to ground concept mentions in text to image regions in the context of visual question answering using self-supervision. We also demonstrate that learning concept grounding helps facilitate the disentanglement of the skills required to answer questions and concept mentions, which can improve generalization to novel compositions of skills and concepts.

    2) Consistency towards different mentions of the same concept. An instance of a concept can take many different forms, such as the appearance of a concept in different images or the use of synonyms in text, and it can be difficult for models to infer these relationships from the training data alone. We show that existing visual question answering models have difficulty handling even straightforward changes in concept mentions and the wordings of the questions. We enforce consistency for related questions in these models not only of the answers, but also of the computed intermediate representations, which improves robustness to such variations.

    3) Modeling associations between related concepts in complex domains. In scenarios where multiple related sources of information need to be considered, models must be able to connect concepts found within and across these different sources. We introduce the task of knowledge-aware video captioning for news videos, where models must generate descriptions of videos that leverage interconnected background knowledge pertaining to concepts involved in the videos. We build models that learn to associate patterns of concepts found in related news articles, such as entities and events, with video content in order to generate these knowledge-rich descriptions.

    Urdu Speech and Text Based Sentiment Analyzer

    Discovering what other people think has always been a key aspect of our information-gathering strategy. Thanks to the increased availability and popularity of opinion-rich resources such as online review sites and personal blogs, people can now actively use information technology to seek out and understand the opinions of others. Because of its crucial role in understanding people's opinions, sentiment analysis (SA) is an important task. Existing research, however, focuses primarily on the English language, with only a small amount of work devoted to low-resource languages. This work presents a new multi-class Urdu dataset for sentiment analysis based on user reviews. The Urdu dataset was collected from the Twitter website. Our proposed dataset includes 10,000 reviews that have been carefully classified by human experts into two categories: positive and negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish a baseline result. Five different lexicon- and rule-based algorithms, namely Naive Bayes, Stanza, TextBlob, VADER, and Flair, are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms. Comment: Sentiment Analysis, Opinion Mining, Urdu language, polarity assessment, lexicon-based methods
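To make the lexicon- and rule-based family of baselines concrete, here is a toy polarity scorer in the spirit of systems such as VADER or TextBlob. The lexicon entries, the negation rule, and the English example words are all invented for illustration; real systems use large curated lexicons (and the paper's experiments are on Urdu text):

```python
# Invented toy lexicon; real lexicon-based systems (e.g., VADER) ship
# thousands of curated entries plus intensity and punctuation rules.
POLARITY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}

def score_sentiment(text):
    """Sum lexicon scores over the tokens, flipping the sign of a score
    when the immediately preceding token is a negator."""
    score, negate = 0.0, False
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        if token in POLARITY_LEXICON:
            value = POLARITY_LEXICON[token]
            score += -value if negate else value
        negate = False  # negation only applies to the very next token
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A rule-based baseline like this needs no training data, which is exactly why such methods are attractive starting points for low-resource languages before annotated datasets like the one above exist.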

    A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration

    The Semantic Web and Linked Data movements, with the aim of creating, publishing, and interconnecting machine-readable information, have gained traction in recent years. However, the majority of information is still contained in and exchanged via unstructured documents, such as Web pages, text documents, images, and videos. Nor can this be expected to change, since text, images, and videos are the natural way in which humans interact with information. Semantic structuring of content, on the other hand, provides a wide range of advantages over unstructured information. Semantically enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability, and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see considerable progress on the backend side in storing structured content and in linking data and schemata. Nevertheless, the currently least developed aspect of the semantic content life-cycle is, from our point of view, the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model which aims to reduce the complexity of the underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean), which enables integrated authoring, visualization, and exploration of unstructured and (semi-)structured content. To assess the applicability of our proposed WYSIWYM model, we incorporated it into four real-world use cases comprising two general and two domain-specific applications. These use cases address four aspects of the WYSIWYM implementation: 1) its integration into existing user interfaces; 2) utilizing it for lightweight text analytics to incentivize users; 3) dealing with crowdsourcing of semi-structured e-learning content; and 4) incorporating it for authoring of semantic medical prescriptions.