EXECUTABLE ARCHIVES: Software integrity for data readability and validation of archived studies
© 2021 author(s). The text of this paper is published under a CC-BY license (https://creativecommons.org/licenses/by/4.0/).
This paper presents practices and processes for managing software integrity to support data archiving for long-term use in response to regulatory requirements. Through a case study of a scientific software decommissioning, we revisit the issues of archived data readability. Established software lifecycle management processes are extended with archiving and data integrity requirements for the retention of data and the revalidation of data analyses. This includes the software transition from operational to archival use within the Executable Archive model, which extends the traditional data archive with the computing environments and software installations required to reproduce study results from the archived records. The content use requirements are an integral part of both data access and software management considerations, assuring that data integrity is fully supported by software integrity.
Peer reviewed
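The abstract above describes pinning computing environments alongside archived records. As a loose illustration of that idea (not the paper's Executable Archive model), the following Python sketch builds an integrity manifest that pairs checksums of archived data files with the package versions installed in the analysis environment; the directory and file names are hypothetical.

```python
# Hypothetical sketch: record an integrity manifest for an archived study,
# pairing data-file checksums with the software versions installed in the
# computing environment, so archived analyses can later be revalidated.
import hashlib
import json
from importlib import metadata
from pathlib import Path

def file_checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path) -> dict:
    """Collect checksums for archived records and pin installed packages."""
    return {
        "records": {
            str(p.relative_to(data_dir)): file_checksum(p)
            for p in sorted(data_dir.rglob("*")) if p.is_file()
        },
        "environment": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }

if __name__ == "__main__":
    manifest = build_manifest(Path("archived_study"))  # hypothetical directory
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```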
Contextual Knowledge Learning For Dialogue Generation
Incorporating conversational context and knowledge into dialogue generation
models has been essential for improving the quality of the generated responses.
The context, comprising utterances from previous dialogue exchanges, is used as
a source of content for response generation and as a means of selecting
external knowledge. However, to avoid introducing irrelevant content, it is key
to enable fine-grained scoring of context and knowledge. In this paper, we
present a novel approach to context and knowledge weighting as an integral part
of model training. We guide the model training through a Contextual Knowledge
Learning (CKL) process which involves Latent Vectors for context and knowledge,
respectively. CKL Latent Vectors capture the relationship between context,
knowledge, and responses through weak supervision and enable differential
weighting of context utterances and knowledge sentences during the training
process. Experiments with two standard datasets and human evaluation
demonstrate that CKL leads to a significant improvement compared with the
performance of six strong baseline models and shows robustness with regard to
reduced sizes of training sets.
Comment: 9 pages, 4 figures, 6 tables. Accepted as a full paper in the main conference by ACL 202
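As a rough sketch of the differential weighting idea described above (not the authors' exact CKL architecture), the snippet below uses two learnable latent vectors to score and re-weight encoded context utterances and knowledge sentences; random tensors stand in for encoder outputs.

```python
# Illustrative sketch (not the authors' exact CKL architecture): learnable
# latent vectors score context utterances and knowledge sentences, and the
# resulting weights rescale their encodings before response generation.
import torch
import torch.nn as nn

class LatentWeighting(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # One latent vector each for context and knowledge scoring.
        self.context_latent = nn.Parameter(torch.randn(hidden_size))
        self.knowledge_latent = nn.Parameter(torch.randn(hidden_size))

    def forward(self, context_enc: torch.Tensor, knowledge_enc: torch.Tensor):
        # context_enc: (num_utterances, hidden); knowledge_enc: (num_sentences, hidden)
        ctx_weights = torch.softmax(context_enc @ self.context_latent, dim=0)
        kn_weights = torch.softmax(knowledge_enc @ self.knowledge_latent, dim=0)
        # Differentially weighted representations passed on to the generator.
        return (ctx_weights.unsqueeze(-1) * context_enc,
                kn_weights.unsqueeze(-1) * knowledge_enc)

# Usage with random encodings standing in for encoder outputs:
module = LatentWeighting(hidden_size=768)
weighted_ctx, weighted_kn = module(torch.randn(5, 768), torch.randn(8, 768))
```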
Social media brand engagement as a proxy for E-commerce activities: a case study of Sina Weibo and JD
E-commerce platforms facilitate sales of products, while product vendors engage in Social Media Activities (SMA) to drive E-commerce Platform Activities (EPA) of consumers, enticing them to search, browse, and buy products. The frequency and timing of SMA are expected to affect levels of EPA, increasing the number of brand-related queries, clickthroughs, and purchase orders. This paper applies cross-sectional data analysis to explore such beliefs and demonstrates weak-to-moderate correlations between daily SMA and EPA volumes. Further correlation analysis, using 30-day rolling windows, shows high variability in the correlation of SMA-EPA pairs and calls into question the predictive potential of SMA in relation to EPA. Considering the moderate correlation of selected SMA and EPA pairs (e.g., Post-Orders), we investigate whether SMA features can predict changes in EPA levels instead of precise EPA daily volumes. We define such levels in terms of EPA distribution quantiles (2, 3, and 5 levels) over training data and formulate the EPA quantile predictions as a multi-class categorization problem. The experiments with Random Forest and Logistic Regression show varied success, performing better than random for the top quantiles of purchase orders and for the lowest quantile of search and clickthrough activities. Similar results are obtained when predicting multi-day cumulative EPA levels (1, 3, and 7 days). Our results have considerable practical implications but, most importantly, urge that the common beliefs be re-examined, seeking stronger evidence of SMA effects on EPA.
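A minimal sketch of the level-prediction setup described above, under assumed column names (sma_posts, epa_orders, etc.) and a simple chronological split: EPA volumes are binned into quantile levels fitted on training data only, and SMA features are used to predict the level as a multi-class problem with Random Forest and Logistic Regression.

```python
# Hedged sketch of the quantile-level prediction setup; column names and
# the input file are hypothetical, not the paper's actual data layout.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def quantile_levels(train_values: pd.Series, values: pd.Series, n_levels: int) -> np.ndarray:
    """Map volumes to levels 0..n_levels-1 using quantile cut points from training data."""
    edges = np.quantile(train_values, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(values, edges)

df = pd.read_csv("sma_epa_daily.csv")            # hypothetical daily SMA/EPA volumes
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

features = ["sma_posts", "sma_reposts", "sma_comments"]
y_train = quantile_levels(train["epa_orders"], train["epa_orders"], n_levels=3)
y_test = quantile_levels(train["epa_orders"], test["epa_orders"], n_levels=3)

for model in (RandomForestClassifier(n_estimators=200), LogisticRegression(max_iter=1000)):
    model.fit(train[features], y_train)
    print(type(model).__name__, model.score(test[features], y_test))
```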
Predicting News Values from Headline Text and Emotions
We present a preliminary study on predicting news values from headline text and emotions. We perform a multivariate analysis on a dataset manually annotated with news values and emotions, discovering interesting correlations among them. We then train two competitive machine learning models – an SVM and a CNN – to predict news values from headline text and emotions as features. We find that, while both models yield satisfactory performance, some news values are more difficult to detect than others, and some benefit more from the inclusion of emotion information.
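As a hedged illustration of the feature setup (the paper's exact pipeline and label set are not reproduced here), the sketch below combines TF-IDF headline features with emotion scores and trains a linear SVM on a toy example; the labels and emotion columns are hypothetical.

```python
# Minimal sketch: TF-IDF headline features concatenated with emotion scores
# feed a linear SVM. Headlines, emotion columns, and labels are made up.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

headlines = ["Markets tumble after surprise rate hike",
             "Local team wins championship in dramatic final"]
emotions = np.array([[0.7, 0.1], [0.1, 0.8]])   # e.g. [fear, joy] scores per headline
news_values = ["negativity", "positivity"]       # hypothetical news-value labels

tfidf = TfidfVectorizer()
X = hstack([tfidf.fit_transform(headlines), emotions])
clf = LinearSVC().fit(X, news_values)

# Predict for a new headline with its (assumed) emotion scores.
new_X = hstack([tfidf.transform(["Budget deficit widens again"]),
                np.array([[0.5, 0.2]])])
print(clf.predict(new_X))
```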
Benchmarking Arabic AI with Large Language Models
With large Foundation Models (FMs), language technologies (AI in general) are
entering a new paradigm: eliminating the need for developing large-scale
task-specific datasets and supporting a variety of tasks through set-ups
ranging from zero-shot to few-shot learning. However, understanding FMs'
capabilities requires a systematic benchmarking effort, comparing FM
performance with the state-of-the-art (SOTA) task-specific models. With that
goal, past work has focused on the English language and included a few efforts
covering multiple languages. Our study contributes to this ongoing research by
evaluating FM performance on standard Arabic NLP and speech processing,
including a range of tasks from sequence tagging to content classification
across diverse domains. We start with zero-shot learning using GPT-3.5-turbo,
Whisper, and USM, addressing 33 unique tasks using 59 publicly available
datasets, resulting in 96 test setups. For a few tasks, the FMs perform on par
with or exceed the performance of the SOTA models, but for the majority they
underperform. Given the importance of prompts for FM performance, we discuss
our prompt strategies in detail and elaborate on our findings. Our future work
on Arabic AI will explore few-shot prompting, expand the range of tasks, and
investigate additional open-source models.
Comment: Foundation Models, Large Language Models, Arabic NLP, Arabic Speech,
Arabic AI, ChatGPT Evaluation, USM Evaluation, Whisper Evaluation
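A minimal zero-shot prompting sketch in the spirit of the setup described above, using the publicly documented OpenAI chat completions API; the instruction wording, label set, and example input are assumptions, not the authors' prompts.

```python
# Illustrative zero-shot prompting sketch (not the authors' exact prompts):
# a task instruction and the input text are sent to GPT-3.5-turbo and the
# raw completion is taken as the prediction.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def zero_shot(task_instruction: str, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": task_instruction},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

# Hypothetical Arabic sentiment example; the label set is assumed for illustration.
# Arabic input roughly translates to: "The service was excellent and delivery was fast."
print(zero_shot(
    "Classify the sentiment of the Arabic text as positive, negative, or neutral.",
    "الخدمة كانت ممتازة والتوصيل سريع",
))
```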
The Grey Web: dataveillance vision fulfilled through the evolving Web
Over the past three decades, the Web has evolved from an information medium into an intricate economic ecosystem. Initially focused on supporting the transition from traditional business practices to e-commerce, the Web has given rise to new, purely Web-based businesses. Aligned with the original vision and expectations of the ‘free Web’, they have provided free services but, over time, developed business models that leverage users’ digital footprints and user-generated content to create economic value. With the use of computing technologies to analyze, aggregate, and share such data, individuals’ privacy has been undermined and, with that, their ability to shape their role in the digital society and beyond. The purpose of this paper is to instigate dialogue around the critical societal issues that arise from the current Web economy and to motivate research initiatives that assist with addressing them. We present three case studies that quantify the extent, rate, and pervasiveness of user tracking on the Web. We use them to illustrate the determining aspects of the Web that have to be taken into account by the Web Science community. As researchers, we aspire to understand the nature of the Web in depth and, based on that, propose designs and policies that are required to ensure that the Web is fit to be the underpinning of our societies and our digital future.
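As one hedged way to quantify the extent of tracking described above (not the paper's methodology), the sketch below counts distinct third-party domains contacted while loading a page, taken from a HAR capture exported from browser developer tools; the file and site names are hypothetical.

```python
# Hedged sketch: count distinct third-party domains in a HAR capture as a
# crude proxy for the extent of tracking on a single page load.
import json
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Crude host extraction; a real study would use the Public Suffix List."""
    return urlparse(url).hostname or ""

def third_party_domains(har_path: str, first_party: str) -> set[str]:
    with open(har_path, encoding="utf-8") as handle:
        entries = json.load(handle)["log"]["entries"]
    hosts = {host_of(entry["request"]["url"]) for entry in entries}
    return {h for h in hosts if h and not h.endswith(first_party)}

# Hypothetical capture file and first-party site name.
trackers = third_party_domains("news_site.har", first_party="example-news.com")
print(len(trackers), "third-party domains contacted")
```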