48,357 research outputs found

    Extensible Markup Language (XML) 1.1

    Get PDF
    The Extensible Markup Language (XML) is a subset of SGML and is completely defined in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. Second edition.
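
    As an illustrative aside (not part of the specification text), the sketch below shows a small well-formed XML document parsed with Python's standard library; note that the stdlib expat parser targets XML 1.0, so XML 1.1-specific constructs are outside this example.

        # Minimal sketch: parse a well-formed XML document with the standard library.
        import xml.etree.ElementTree as ET

        doc = """<?xml version="1.0"?>
        <catalog>
          <book id="bk101">
            <title>XML in Practice</title>
            <price currency="EUR">29.90</price>
          </book>
        </catalog>"""

        root = ET.fromstring(doc)          # build an element tree from the string
        for book in root.findall("book"):  # iterate over <book> children of the root
            price = book.find("price")
            print(book.get("id"), book.findtext("title"), price.text, price.get("currency"))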

    Information Waste on the World Wide Web and Combating the Clutter

    Get PDF
    The Internet has become a critical part of the infrastructure supporting modern life. The high degree of openness and autonomy of information providers makes a vast amount of information accessible on the Internet. However, it also makes the web vulnerable to inaccurate, misleading, or outdated information. Such unnecessary and unusable content, referred to as “information waste,” takes up hardware resources and clutters the web. In this paper, we examine the phenomenon of web information waste by developing a taxonomy of it and analyzing its causes and effects. We then explore possible solutions and propose a classification approach using quantitative metrics for information waste detection.
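
    As a hedged illustration of the general idea of metric-based waste detection (the specific metrics, weights, and threshold below are assumptions, not the classification approach proposed in the paper):

        # Toy scoring of "information waste" from a few quantitative page metrics.
        from dataclasses import dataclass

        @dataclass
        class PageMetrics:
            days_since_update: int     # staleness of the page
            broken_link_ratio: float   # fraction of outgoing links that are dead
            duplicate_ratio: float     # fraction of content duplicated elsewhere

        def waste_score(m: PageMetrics) -> float:
            """Combine the metrics into a single score in [0, 1]."""
            staleness = min(m.days_since_update / 3650, 1.0)  # cap at roughly ten years
            return 0.4 * staleness + 0.3 * m.broken_link_ratio + 0.3 * m.duplicate_ratio

        def is_information_waste(m: PageMetrics, threshold: float = 0.6) -> bool:
            return waste_score(m) >= threshold

        print(is_information_waste(PageMetrics(4000, 0.8, 0.5)))  # True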

    AEMIX: semantic verification of weather forecasts on the web

    Get PDF
    Paper presented at the 12th International Conference on Web Information Systems and Technologies, held in Rome, 23-25 April 2016. The main objectives of a meteorological service are the development, implementation, and delivery of weather forecasts. Weather predictions are broadcast to society through different channels, i.e. newspaper, television, radio, etc. Today, access through the Web from personal computers and mobile devices stands out. The forecasts, which can be presented in numerical format, in charts, or in written natural language, have a certain margin of error. Automatic tools able to assess the precision of predictions make it possible to improve these forecasts, quantify the degree of success depending on certain variables (geographic areas, weather conditions, time of year, etc.), and focus future work on areas for improvement that increase accuracy. Despite technological advances, the task of verifying forecasts written in natural language is in many cases still performed manually, which is expensive, time-consuming, and prone to human error. On the other hand, weather forecasts usually follow several conventions in both structure and use of language which, while not completely formal, can be exploited to increase the quality of the verification. In this paper, we describe a methodology to quantify the accuracy of natural-language weather forecasts posted on the Web. This work obtains relevant information from weather forecasts by using ontologies to capture and take advantage of the structure and language conventions. The approach is implemented in a framework that makes it possible to address different types of predictions with minimal effort. Experimental results with real data are promising and, most importantly, allow direct use in a real meteorological service. This research work has been supported by the CICYT project TIN2013-46238-C4-4-R, and DGAFS
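
    To make the idea concrete, here is a deliberately simplified sketch of verifying a natural-language forecast against an observed value; the phrase-to-range table stands in for the ontology described in the paper, and the terms and ranges are assumptions.

        # Toy verification of a textual forecast against observed precipitation (mm).
        FORECAST_TERMS = {
            "light rain": (0.1, 2.5),
            "moderate rain": (2.5, 10.0),
            "heavy rain": (10.0, 50.0),
        }

        def verify(forecast_text, observed_mm):
            """True if the observation falls inside the range of a term in the forecast."""
            text = forecast_text.lower()
            for term, (low, high) in FORECAST_TERMS.items():
                if term in text:
                    return low <= observed_mm < high
            return False  # no known term found, so nothing to verify

        print(verify("Moderate rain expected in the Ebro valley", 4.2))   # True
        print(verify("Moderate rain expected in the Ebro valley", 18.0))  # False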

    Towards the ontology-based approach for factual information matching

    Get PDF
    Factual information is information based on facts or relating to facts. The reliability of automatically extracted facts is the main problem in processing factual information. Fact retrieval systems remain among the most effective tools for identifying information for decision-making. In this work, we explore how natural language processing methods and a problem-domain ontology can help check facts automatically for contradictions and mismatches.
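
    A minimal sketch of what such a check could look like, assuming the ontology marks some predicates as functional (at most one value per subject); the predicates and facts below are illustrative only.

        # Toy contradiction detection over (subject, predicate, object) facts.
        from collections import defaultdict

        FUNCTIONAL_PREDICATES = {"has_capital", "date_of_birth"}  # stand-in for the ontology

        def find_contradictions(facts):
            seen = defaultdict(set)
            contradictions = []
            for subj, pred, obj in facts:
                if pred in FUNCTIONAL_PREDICATES and seen[(subj, pred)] - {obj}:
                    contradictions.append((subj, pred, seen[(subj, pred)] | {obj}))
                seen[(subj, pred)].add(obj)
            return contradictions

        facts = [
            ("France", "has_capital", "Paris"),
            ("France", "has_capital", "Lyon"),   # conflicts with the previous fact
            ("France", "borders", "Spain"),
        ]
        print(find_contradictions(facts))  # [('France', 'has_capital', {'Paris', 'Lyon'})]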

    Fully Automated Fact Checking Using External Sources

    Full text link
    Given the constantly growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumor detection and (ii) fact checking of the answers to a question in community question answering forums. Comment: RANLP-201
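
    A schematic PyTorch sketch of the core idea (a claim and a Web snippet encoded with an LSTM and combined with a source-reliability feature); dimensions, vocabulary, and the final classifier are assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn

        class ClaimEvidenceClassifier(nn.Module):
            def __init__(self, vocab_size=10000, emb_dim=100, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
                # claim vector + evidence vector + one reliability scalar -> 2 classes
                self.classifier = nn.Linear(2 * hidden + 1, 2)

            def encode(self, token_ids):
                _, (h, _) = self.encoder(self.embed(token_ids))
                return h[-1]  # final hidden state for each sequence

            def forward(self, claim_ids, evidence_ids, reliability):
                feats = torch.cat(
                    [self.encode(claim_ids), self.encode(evidence_ids), reliability], dim=1)
                return self.classifier(feats)  # logits, e.g. supports vs. refutes

        model = ClaimEvidenceClassifier()
        claim = torch.randint(0, 10000, (1, 12))      # toy token ids for the claim
        evidence = torch.randint(0, 10000, (1, 40))   # toy token ids for a Web snippet
        reliability = torch.tensor([[0.8]])           # trustworthiness of the source
        print(model(claim, evidence, reliability).shape)  # torch.Size([1, 2])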

    Through the Lens of Core Competency: Survey on Evaluation of Large Language Models

    Full text link
    From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical use. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to evaluate thoroughly, for two reasons. First, traditional NLP tasks have become inadequate given the excellent performance of LLMs. Second, existing evaluation tasks struggle to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works have proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluation. We summarize four core competencies of LLMs: reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect corresponding abilities, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM evaluation.
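
    As a toy illustration of the competency-to-benchmark organisation (the benchmark names are common examples chosen for illustration, not the survey's exact assignments):

        # Register benchmarks under the four competencies; new tasks slot in easily.
        from collections import defaultdict

        registry = defaultdict(list)

        def register(competency, benchmark, metric):
            registry[competency].append({"benchmark": benchmark, "metric": metric})

        register("reasoning",   "GSM8K",      "accuracy")
        register("knowledge",   "MMLU",       "accuracy")
        register("reliability", "TruthfulQA", "truthful rate")
        register("safety",      "ToxiGen",    "toxicity rate")
        register("reasoning",   "BBH",        "accuracy")  # added without changing the structure

        for competency, tasks in registry.items():
            print(competency, "->", [t["benchmark"] for t in tasks])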

    Finding answers to questions, in text collections or web, in open domain or specialty domains

    Get PDF
    This chapter is dedicated to factual question answering, i.e. extracting precise and exact answers from texts to questions given in natural language. A question in natural language gives more information than a bag-of-words query (i.e. a query made of a list of words) and provides clues for finding precise answers. We first focus on the underlying problems, mainly due to the linguistic variations between questions and the pieces of text that answer them, for selecting relevant passages and extracting reliable answers. We then present how to answer factual questions in the open domain, followed by answering questions in specialty domains, which requires dealing with semi-structured knowledge and specialized terminologies and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy and collaborative usage. Moreover, the Web is also multilingual, and a challenging problem consists in searching for answers in documents in a target language other than the source language of the question. For all these topics, we present the main approaches and the remaining problems.
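
    A bare-bones sketch of the two stages the chapter discusses, passage selection and answer extraction; the word-overlap scoring and the year pattern are deliberate oversimplifications for illustration.

        import re

        def select_passage(question, passages):
            """Pick the passage sharing the most words with the question."""
            q_words = set(question.lower().split())
            return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

        def extract_answer(question, passage):
            """Very rough extraction: 'when' questions expect a year."""
            if question.lower().startswith("when"):
                match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", passage)
                return match.group(0) if match else None
            return None

        passages = [
            "The Eiffel Tower was completed in 1889 for the World's Fair.",
            "Paris is the capital and most populous city of France.",
        ]
        question = "When was the Eiffel Tower completed?"
        print(extract_answer(question, select_passage(question, passages)))  # 1889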

    Large-Scale Pattern-Based Information Extraction from the World Wide Web

    Get PDF
    Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.
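
    In the same spirit, a small sketch of pattern-based extraction: a textual pattern with placeholders is matched against text to yield structured facts. The single "X such as Y" pattern below is only an example, not the thesis' pattern inventory.

        import re

        # "CLASS such as INSTANCE, INSTANCE and INSTANCE"
        PATTERN = re.compile(r"(\w+(?: \w+)?) such as (\w+(?:, \w+)*(?: and \w+)?)")

        def extract_facts(text):
            """Yield (instance, class) pairs found by the pattern."""
            for match in PATTERN.finditer(text):
                cls = match.group(1)
                for inst in re.split(r", | and ", match.group(2)):
                    yield (inst, cls)

        text = "Programming languages such as Python, Haskell and Rust are widely taught."
        print(list(extract_facts(text)))
        # [('Python', 'Programming languages'), ('Haskell', ...), ('Rust', ...)]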