426,727 research outputs found

    A Semantic Graph-Based Approach for Mining Common Topics From Multiple Asynchronous Text Streams

    In the age of Web 2.0, a substantial amount of unstructured content is distributed through multiple text streams in an asynchronous fashion, which makes it increasingly difficult to glean and distill useful information. An effective way to explore the information in text streams is topic modelling, which can further facilitate other applications such as search, information browsing, and pattern mining. In this paper, we propose a semantic graph-based topic modelling approach for structuring asynchronous text streams. Our model integrates topic mining and time synchronization, two core modules for addressing the problem, into a unified model. Specifically, to handle the lexical gap issue, we use a global semantic graph for each timestamp to capture the hidden interactions among entities from all the text streams. To deal with the source asynchronism problem, local semantic graphs are employed to discover similar topics of different entities that may be separated by time gaps. Our experiments on two real-world datasets show that the proposed model significantly outperforms existing ones.

    Using authentic texts for grammar exercises for a minority language

    Source at http://www.ep.liu.se/index.en.asp. This paper presents an ATICALL (Authentic Text ICALL) system with automatic visual input enhancement activities for training complex inflection systems in a minority language. We have adapted the freely available VIEW system, which was designed to automatically generate activities from any web content. Our system is based on finite state transducers (FST) and Constraint Grammar, originally built for other purposes. The paper describes ways of handling ambiguity in the target form in the exercises, and ways of handling the challenges that authentic text poses for VIEW, which are typical for a minority language: variations in orthography and a large proportion of non-normative forms.

    Automatic Dish Name Extraction from User-generated Content Using LLM

    Extraction of dish names from user-provided content such as food photographs and captions, restaurant reviews, and other free-form text is a challenging task. Rule-based approaches are difficult to maintain and improve. Pattern matching against a predefined dictionary often suffers from low recall. Conventional machine learning models require large amounts of labeled data to perform named entity recognition (e.g., to recognize dish names), which is often costly and does not scale well across multiple languages and countries. This disclosure describes the use of a multimodal large language model (LLM) to automatically extract dish names from user-generated content such as food photographs and associated free-form text such as tags and captions. Dish name extraction from user-provided tags can be formulated as an open-vocabulary dish name entity recognition and discovery task, which fits naturally within the framework of pre-trained LLMs and leverages the model's capability for multilingual, multicultural text understanding.

    Effects of White Space in Learning via the Web

    This study measured the effect of specific white space features on learning from instructional Web materials. The study also measured learners' beliefs regarding Web-based instruction. Prior research indicated that small changes in the handling of presentation elements can affect learning. Achievement results from this study indicated that in on-line materials, when content and overall structure are sound, minor differences regarding table borders and vertical spacing in text do not hinder learning. Beliefs regarding Web-based instruction and instructors who use it did not differ significantly between treatment groups. Implications of the study and cautions regarding generalizing from the results are discussed.

    A Novel Framework for Multi-Document Temporal Summarization (MDTS)

    The Internet, or Web, contains a massive amount of information, and handling it is a tedious task. Summarization plays a crucial role in extracting or abstracting key content from multiple sources while preserving its meaning, thereby reducing the complexity of handling the information. Multi-document summarization gives the gist of the content collected from multiple documents. Temporal summarization concentrates on temporally related events. This paper proposes a Multi-Document Temporal Summarization (MDTS) technique that generates a summary based on temporally related events extracted from multiple documents. The technique extracts events together with their time stamps. TimeML standard tags are used to extract events and times. These event-time pairs are stored in a structured database for easier operations. Sentence ranking methods are built on the frequency of event occurrences in each sentence. Sentence similarity measures are computed to eliminate redundant sentences from the extracted summary. Depending on the required summary length, the top-ranked sentences are selected to form the summary. Experiments are conducted on the DUC 2006 and DUC 2007 data sets, which were released for the multi-document summarization task. The extracted summaries are evaluated using ROUGE to determine the precision, recall, and F-measure of the generated summaries. The performance of the proposed method is compared with a particle swarm optimization-based algorithm (PSOS), cat swarm optimization-based summarization (CSOS), and cuckoo search-based multi-document summarization (MDSCSA). MDTS is found to perform better than these methods. DOI: 10.28991/esj-2021-01268
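The ranking and redundancy steps in the MDTS abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the event terms are passed in as a plain list (standing in for events that a TimeML tagger would extract), the sentence score is the summed corpus frequency of the event terms a sentence contains, and Jaccard token overlap with a threshold of 0.5 stands in for the unspecified sentence similarity measure. The names `summarize`, `event_terms`, and `sim_threshold` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of two token lists (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def summarize(sentences, event_terms, max_sentences=2, sim_threshold=0.5):
    """Pick top-ranked, non-redundant sentences by event frequency."""
    events = {t.lower() for t in event_terms}
    tokens = [tokenize(s) for s in sentences]

    # Frequency of each event term across all sentences.
    freq = Counter(t for toks in tokens for t in toks if t in events)

    # A sentence's score is the summed frequency of the events it mentions.
    scores = [sum(freq[t] for t in toks if t in events) for toks in tokens]

    # Visit sentences from highest to lowest score (ties: original order).
    order = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    chosen = []
    for i in order:
        if len(chosen) == max_sentences:
            break
        if scores[i] == 0:
            continue  # sentence mentions no event at all
        # Skip sentences too similar to one already selected.
        if all(jaccard(tokens[i], tokens[j]) < sim_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "The earthquake struck the coast on Monday.",
    "The earthquake struck the coast on Monday morning.",
    "Rescue teams arrived on Tuesday after the earthquake.",
    "Officials praised the weather.",
]
event_terms = ["earthquake", "struck", "arrived"]
print(summarize(sentences, event_terms))
```

With this toy input, the second sentence is dropped as a near-duplicate of the first, and the fourth is ignored because it mentions no event.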

    Applications of integration of AI-based Optical Character Recognition (OCR) and Generative AI in Document Understanding and Processing

    The adoption of AI-based Optical Character Recognition (OCR) and Generative AI can streamline document processing, shifting from manual to automated digital methods, thus increasing efficiency and accuracy in data handling. This study examines the applications of these technologies across various stages of document management. Initially, OCR technology can scan and digitize physical documents, transforming text images into machine-encoded text. This process is essential for converting paper-based records into digital formats. Additionally, OCR can decipher handwritten notes, making it invaluable for processing historical documents and manually filled forms. In the subsequent phase, these technologies can categorize and organize data. AI algorithms, combined with OCR, can classify text into various categories such as invoices, legal documents, or personal letters, thereby streamlining document sorting and retrieval. Generative AI can further enhance this process by producing concise summaries of lengthy documents, enabling quick comprehension without the need to read the entire text. Error detection and correction are also critical areas where these technologies can be applied. Despite its effectiveness, OCR may misinterpret characters, and AI algorithms can identify these errors by comparing the scanned text against language models. Generative AI can then suggest corrections, improving the accuracy of the digitized text. Moreover, the combination of OCR and Generative AI can be employed for data extraction and analysis, extracting specific information from documents, and conducting sentiment analysis on texts like customer reviews to gain insights into customer opinions. In terms of language translation and localization, Generative AI can translate digitized text into various languages and adapt content for different cultural contexts, crucial for international businesses. 
    Document accessibility is enhanced as AI can convert text to speech and introduce interactive elements, making documents accessible to visually impaired users. Furthermore, in ensuring security and compliance, these technologies can identify and redact sensitive information to comply with privacy laws and verify the authenticity of documents to detect alterations. Finally, AI can generate customizable document templates and content, tailoring documents to specific needs and preferences, demonstrating the extensive impact of AI-based OCR and Generative AI in modern document processing and management.

    Adaptive Resonance Theory (ART) for social media analytics

    This chapter presents the ART-based clustering algorithms for social media analytics in detail. Sections 3.1 and 3.2 introduce Fuzzy ART and its clustering mechanisms, respectively, which provide a deep understanding of the base model that is used and extended to handle the challenges of social media clustering. Important concepts such as the vigilance region (VR) and its properties are explained and proven. Subsequently, Sects. 3.3-3.7 illustrate five variants of adaptive resonance theory (ART), each of which addresses the challenges of one social media analytics scenario: automated parameter adaptation, user preference incorporation, short text clustering, heterogeneous data co-clustering, and online streaming data indexing. The content of this chapter is drawn from several prior studies, including Probabilistic ART [15
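For orientation, the base Fuzzy ART clustering loop that the chapter builds on can be sketched as below. This follows the standard textbook formulation (complement coding, a choice function, a vigilance test, and a weight update), which may differ in detail from the chapter's own presentation. The function name `fuzzy_art` and the default values for the vigilance `rho`, choice parameter `alpha`, and learning rate `beta` are assumptions chosen for illustration.

```python
def complement_code(x):
    """Append 1 - v for each feature so every input has the same L1 norm."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Element-wise minimum, the fuzzy AND operator."""
    return [min(p, q) for p, q in zip(a, b)]


def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster samples with features in [0, 1]; return labels and weights."""
    weights = []  # one weight vector per cluster
    labels = []   # cluster index assigned to each sample
    for x in samples:
        i = complement_code(x)

        # Rank existing clusters by the choice function
        # T_j = |I ^ w_j| / (alpha + |w_j|), highest first.
        order = sorted(
            range(len(weights)),
            key=lambda j: -sum(fuzzy_and(i, weights[j]))
            / (alpha + sum(weights[j])),
        )

        for j in order:
            overlap = fuzzy_and(i, weights[j])
            # Vigilance test: accept the cluster only if |I ^ w_j| / |I| >= rho.
            if sum(overlap) / sum(i) >= rho:
                # Move the weight vector toward the overlap at rate beta.
                weights[j] = [
                    beta * m + (1.0 - beta) * w
                    for m, w in zip(overlap, weights[j])
                ]
                labels.append(j)
                break
        else:
            # No cluster passed the vigilance test: create a new one.
            weights.append(i)
            labels.append(len(weights) - 1)
    return labels, weights


samples = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.9], [0.88, 0.92]]
labels, weights = fuzzy_art(samples)
print(labels)
```

A higher `rho` makes the vigilance test stricter and so tends to produce more, smaller clusters; a lower value merges inputs more readily.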

    Review of Traffic Sign Detection and Recognition Techniques

    Text, as one of the most influential inventions of humankind, has played a significant role in human life since ancient times. The rich and precise information embodied in text is extremely useful in a wide range of vision-based applications; consequently, text detection and recognition in natural scenes have become important and active research topics in computer vision and document analysis. Traffic sign detection and recognition is a field of applied computer vision research concerned with the automatic detection and classification or recognition of traffic signs in scene images acquired from a moving vehicle. Driving is a task that depends on visual information processing. Traffic signs define a visual language interpreted by drivers. They convey much of the information needed for successful driving: they describe the current traffic situation, define right of way, and prohibit or permit certain directions. This paper discusses different detection and recognition schemes.