81 research outputs found
QUAL : A Provenance-Aware Quality Model
The research described here is supported by the award made by the RCUK Digital Economy program to the dot.rural Digital Economy Hub; award reference: EP/G066051/1.Peer reviewedPostprin
Precedent-Oriented Experimenting in Designing of Software Intensive Systems
The paper presents a precedent-oriented approach to experimenting with programmable units of developer’sactivity in conceptual designing of Software Intensive Systems (SIS). The reuse of any such a unit is being implemented as atypical work of a designer in accordance with the definite technique which is previously programmed. The offered approachis coordinated with simplifying the complexity on the base of interactions of designers with the accessible experience thekernel of which consists of models of assets included into Experience Base. The simplifying is being achieved by the use ofthe specialized pseudo-code language in programming of assets for their reuse by designers.Keywords/Index Terms— conceptual designing, pseudo-code language, programming, precedent-oriented approach,software intensive systems
Linked Data - the story so far
The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions— the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward
MetaICL: Learning to Learn In Context
We introduce MetaICL (Meta-training for In-Context Learning), a new
meta-training framework for few-shot learning where a pretrained language model
is tuned to do in-context learning on a large set of training tasks. This
meta-training enables the model to more effectively learn a new task in context
at test time, by simply conditioning on a few training examples with no
parameter updates or task-specific templates. We experiment on a large, diverse
collection of tasks consisting of 142 NLP datasets including classification,
question answering, natural language inference, paraphrase detection and more,
across seven different meta-training/target splits. MetaICL outperforms a range
of baselines including in-context learning without meta-training and multi-task
learning followed by zero-shot transfer. We find that the gains are
particularly significant for target tasks that have domain shifts from the
meta-training tasks, and that using a diverse set of the meta-training tasks is
key to improvements. We also show that MetaICL approaches (and sometimes beats)
the performance of models fully finetuned on the target task, and outperforms
much bigger models with nearly 8x parameters. Finally, we show that MetaICL is
complementary to human-written instructions, and the best performance can be
achieved by combining both approaches.Comment: 19 pages, 2 figures. Published as a conference paper at NAACL 2022
(long). Code available at https://github.com/facebookresearch/MetaIC
Explanation in the Semantic Web: a survey of the state of the art
Semantic Web applications use interconnected distributed data and inferential capabilities to compute their results. The users of Semantic Web applications might find it difficult to understand how a result is produced or how a new piece of information is derived in the process. Explanation enables users to understand the process of obtaining results. Explanation adds transparency to the process of obtaining results and enables user trust on the process. The concept of providing explanation has been first introduced in expert systems and later studied in different application areas. This paper provides a brief review of existing research on explanation in the Semantic Web
Completeness and Consistency Analysis for Evolving Knowledge Bases
Assessing the quality of an evolving knowledge base is a challenging task as
it often requires to identify correct quality assessment procedures.
Since data is often derived from autonomous, and increasingly large data
sources, it is impractical to manually curate the data, and challenging to
continuously and automatically assess their quality.
In this paper, we explore two main areas of quality assessment related to
evolving knowledge bases: (i) identification of completeness issues using
knowledge base evolution analysis, and (ii) identification of consistency
issues based on integrity constraints, such as minimum and maximum cardinality,
and range constraints.
For completeness analysis, we use data profiling information from consecutive
knowledge base releases to estimate completeness measures that allow predicting
quality issues. Then, we perform consistency checks to validate the results of
the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a
subset of datasets from both DBpedia and 3cixty knowledge bases. The
performance of the approach is evaluated using precision, recall, and F1 score.
From completeness analysis, we observe a 94% precision for the English DBpedia
KB and 95% precision for the 3cixty Nice KB. We also assessed the performance
of our consistency analysis by using five learning models over three sub-tasks,
namely minimum cardinality, maximum cardinality, and range constraint. We
observed that the best performing model in our experimental setup is the Random
Forest, reaching an F1 score greater than 90% for minimum and maximum
cardinality and 84% for range constraints.Comment: Accepted for Journal of Web Semantic
A systematic literature review of open data quality in practice
Context: The main objective of open data initiatives is to make information freely available through easily accessible mechanisms and facilitate exploitation. In practice openness should be accompanied with a certain level of trustwor- thiness or guarantees about the quality of data. Traditional data quality is a thoroughly researched field with several benchmarks and frameworks to grasp its dimensions. However, quality assessment in open data is a complicated process as it consists of stakeholders, evaluation of datasets as well as the publishing platform.
Objective: In this work, we aim to identify and synthesize various features of open data quality approaches in practice. We applied thematic synthesis to identify the most relevant research problems and quality assessment methodologies. Method: We undertook a systematic literature review to summarize the state of the art on open data quality. The review process starts by developing the review protocol in which all steps, research questions, inclusion and exclusion criteria and analysis procedures are included. The search strategy retrieved 9323 publications from four scientific digital libraries. The selected papers were published between 2005 and 2015. Finally, through a discussion between the authors, 63 paper were included in the final set of selected papers.
Results: Open data quality, in general, is a broad concept, and it could apply to multiple areas. There are many quality issues concerning open data hindering their actual usage for real-world applications. The main ones are unstruc- tured metadata, heterogeneity of data formats, lack of accuracy, incompleteness and lack of validation techniques. Furthermore, we collected the existing quality methodologies from selected papers and synthesized under a unifying classification schema. Also, a list of quality dimensions and metrics from selected paper is reported.
Conclusion: In this research, we provided an overview of the methods related to open data quality, using the instru- ment of systematic literature reviews. Open data quality methodologies vary depending on the application domain. Moreover, the majority of studies focus on satisfying specific quality criteria. With metrics based on generalized data attributes a platform can be created to evaluate all possible open dataset. Also, the lack of methodology validation remains a major problem. Studies should focus on validation techniques
Quality information retrieval for the World Wide Web
The World Wide Web is an unregulated communication medium which exhibits very limited means of quality control. Quality assurance has become a key issue for many information retrieval services on the Internet, e.g. web search engines. This paper introduces some quality evaluation and assessment methods to assess the quality of web pages. The proposed quality evaluation mechanisms are based on a set of quality criteria which were extracted from a targeted user survey. A weighted algorithmic interpretation of the most significant user quoted quality criteria is proposed. In addition, the paper utilizes machine learning methods to produce a prediction of quality for web pages before they are downloaded. The set of quality criteria allows us to implement a web search engine with quality ranking schemes, leading to web crawlers which can crawl directly quality web pages. The proposed approaches produce some very promising results on a sizable web repository
- …