48,357 research outputs found
Extensible Markup Language (XML) 1.1
The Extensible Markup Language (XML) is a subset of SGML, and is completely defined in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML. Second edition.
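As a small illustration of the well-formedness constraints such a specification defines, here is a sketch that parses a minimal document with Python's standard library. The element names are invented for the example, and note that the stdlib parser implements XML 1.0; the rules shown here are common to both editions.

```python
import xml.etree.ElementTree as ET

# A minimal well-formed document: a single root element, properly
# nested children, matching end tags, and quoted attribute values.
doc = """<catalog>
  <book id="b1"><title>Example</title></book>
</catalog>"""

root = ET.fromstring(doc)
title = root.find("book/title").text   # navigate by element path
book_id = root.find("book").get("id")  # read an attribute value
```

A document that violates these constraints (an unclosed tag, an unquoted attribute) is rejected by any conforming parser rather than rendered leniently, which is the key contrast with historical HTML processing.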
Information Waste on the World Wide Web and Combating the Clutter
The Internet has become a critical part of the infrastructure supporting modern life. The high degree of openness and autonomy of information providers gives users access to a vast amount of information on the Internet, but it also makes the web vulnerable to inaccurate, misleading, or outdated information. Unnecessary and unusable content, referred to as “information waste,” takes up hardware resources and clutters the web. In this paper, we examine the phenomenon of web information waste by developing a taxonomy of it and analyzing its causes and effects. We then explore possible solutions and propose a classification approach using quantitative metrics for information waste detection.
AEMIX: semantic verification of weather forecasts on the web
Paper presented at the 12th International Conference on Web Information Systems and Technologies, held in Rome, 23 to 25 April 2016. The main objectives of a meteorological service are the development, implementation, and delivery of weather forecasts. Weather predictions are broadcast to society through different channels, i.e. newspapers, television, radio, etc. Today, access through the Web on personal computers and mobile devices stands out. The forecasts, which can be presented in numerical format, in charts, or in written natural language, have a certain margin of error. Providing automatic tools able to assess the precision of predictions makes it possible to improve these forecasts, quantify the degree of success depending on certain variables (geographic areas, weather conditions, time of year, etc.), and focus future work on areas for improvement that increase such accuracy. Despite technological advances, the task of verifying forecasts written in natural language is still performed manually in many cases, which is expensive, time-consuming, and subject to human error. On the other hand, weather forecasts usually follow several conventions in both structure and use of language which, while not completely formal, can be exploited to increase the quality of the verification. In this paper, we describe a methodology to quantify the accuracy of natural-language weather forecasts posted on the Web. This work obtains relevant information from weather forecasts by using ontologies to capture and take advantage of the structure and language conventions. The approach is implemented in a framework that makes it possible to address different types of predictions with minimal effort. Experimental results with real data are promising and, most importantly, allow direct use in a real meteorological service. This research work has been supported by the CICYT project TIN2013-46238-C4-4-R, and DGAFS
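The core idea of verifying a conventional forecast phrase against an observation can be sketched as mapping each phrase to a numeric interval and checking where the measured value falls. The vocabulary and thresholds below are invented stand-ins for the ontology described in the abstract, not the actual AEMIX resources.

```python
# Toy verification of a natural-language forecast against an observation.
# The phrase-to-interval table plays the role of the ontology; the terms
# and mm/h thresholds here are illustrative assumptions only.
PHRASES = {
    "light rain": (0.1, 2.5),
    "moderate rain": (2.5, 7.6),
    "heavy rain": (7.6, 50.0),
}

def verify(forecast_phrase: str, observed_mm_per_h: float) -> bool:
    """Return True if the observed intensity falls in the forecast interval."""
    lo, hi = PHRASES[forecast_phrase]
    return lo <= observed_mm_per_h < hi

hits = [verify("moderate rain", 3.0), verify("heavy rain", 1.0)]
```

Aggregating such hit/miss judgments over many forecasts and stations is what yields the accuracy statistics the paper computes per region, season, or weather condition.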
Towards the ontology-based approach for factual information matching
Factual information is information based on, or relating to, facts. The reliability of automatically extracted facts is the main problem in processing factual information. The fact retrieval system remains one of the most effective tools for identifying information for decision-making. In this work, we explore how natural language processing methods and a problem-domain ontology can help to check facts automatically for contradictions and mismatches.
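One simple form of the contradiction check described here can be sketched over facts represented as (subject, predicate, object) triples: two facts clash when they share a subject and a functional predicate (one that admits a single value) but assert different objects. The predicate list below is an invented stand-in for a domain ontology.

```python
# Toy contradiction detector over (subject, predicate, object) triples.
# FUNCTIONAL is an illustrative stand-in for ontology knowledge about
# which predicates admit exactly one value per subject.
FUNCTIONAL = {"has_capital", "born_in"}

def contradictions(facts):
    """Return pairs of triples that disagree on a functional predicate."""
    seen = {}
    clashes = []
    for s, p, o in facts:
        if p in FUNCTIONAL:
            key = (s, p)
            if key in seen and seen[key] != o:
                clashes.append(((s, p, seen[key]), (s, p, o)))
            seen.setdefault(key, o)
    return clashes

facts = [("France", "has_capital", "Paris"),
         ("France", "has_capital", "Lyon"),
         ("France", "borders", "Spain")]
found = contradictions(facts)
```

Real systems additionally need entity and predicate normalization (so that "Paris" and "Paris, France" do not register as a clash), which is where the ontology and NLP methods mentioned in the abstract come in.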
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumor detection and (ii) fact checking of the answers to a question in community question answering forums. Comment: RANLP-201
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical use. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to evaluate thoroughly, for two reasons. First, traditional NLP tasks have become inadequate due to the excellent performance of LLMs. Second, existing evaluation tasks struggle to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works have proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we survey multiple papers concerning LLM evaluation. We summarize four core competencies of LLMs: reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect the corresponding ability, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM evaluation.
Finding answers to questions, in text collections or web, in open domain or specialty domains
This chapter is dedicated to factual question answering, i.e. extracting precise and exact answers to questions given in natural language from texts. A question in natural language gives more information than a bag-of-words query (i.e. a query made of a list of words), and provides clues for finding precise answers. We first focus on the underlying problems, mainly due to the linguistic variations between questions and the passages of text that can answer them, in selecting relevant passages and extracting reliable answers. We then present how to answer factual questions in open domains. We also present answering questions in specialty domains, as this requires dealing with semi-structured knowledge and specialized terminologies, and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy and collaborative usage. Besides, the Web is also multilingual, and a challenging problem consists in searching for answers in documents in a target language other than the source language of the question. For all these topics, we present the main approaches and the remaining problems.
Large-Scale Pattern-Based Information Extraction from the World Wide Web
Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.
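A classic example of the textual patterns this line of work builds on is the "X such as Y, Z" construction, which suggests Y and Z are instances of X. The pattern and sentence below are illustrative; large-scale systems compile many such patterns and run them over web-scale text.

```python
import re

# Hearst-style pattern: "X such as Y(, Z)* (and W)?" yields is_a facts.
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, )?)+(?: and \w+)?)")

def extract(text):
    """Return (hyponym, 'is_a', hypernym) triples matched in the text."""
    facts = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)
        for hyponym in re.split(r", | and ", m.group(2)):
            facts.append((hyponym, "is_a", hypernym))
    return facts

facts = extract("He studied languages such as Spanish, French and Italian.")
```

The appeal of the approach is that a single pattern applied to billions of sentences yields many facts cheaply; the cost is noise, which is why extracted facts are typically filtered by frequency or validated against other sources.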