Search CORE

1,959 research outputs found

Pinching sweaters on your phone – iShoogle : multi-gesture touchscreen fabric simulator using natural on-fabric gestures to communicate textile qualities

Author: Orzechowski Pawel Michal
Publication venue: Mathematical and Computer Sciences
Publication date: 01/05/2016
Field of study

The inability to touch fabrics online frustrates consumers, who are used to evaluating physical textiles by engaging in complex, natural gestural interactions. When customers interact with physical fabrics, they combine cross-modal information about the fabric's look, sound and handle to build an impression of its physical qualities. But whenever an interaction with a fabric is limited (i.e. when watching clothes online) there is a perceptual gap between the fabric qualities perceived digitally and the actual fabric qualities that a person would perceive when interacting with the physical fabric. The goal of this thesis was to create a fabric simulator that minimized this perceptual gap, enabling accurate perception of the qualities of fabrics presented digitally. We designed iShoogle, a multi-gesture touch-screen sound-enabled fabric simulator that aimed to create an accurate representation of fabric qualities without the need for touching the physical fabric swatch. iShoogle uses on-screen gestures (inspired by natural on-fabric movements e.g. Crunching) to control pre-recorded videos and audio of fabrics being deformed (e.g. being Crunched). iShoogle creates an illusion of direct video manipulation and also direct manipulation of the displayed fabric. This thesis describes the results of nine studies leading towards the development and evaluation of iShoogle. In the first three studies, we combined expert and non-expert textile-descriptive words and grouped them into eight dimensions labelled with terms Crisp, Hard, Soft, Textured, Flexible, Furry, Rough and Smooth. These terms were used to rate fabric qualities throughout the thesis. We observed natural on-fabric gestures during a fabric handling study (Study 4) and used the results to design iShoogle's on-screen gestures. In Study 5 we examined iShoogle's performance and speed in a fabric handling task and in Study 6 we investigated users' preferences for sound playback interactivity. iShoogle's accuracy was then evaluated in the last three studies by comparing participants’ ratings of textile qualities when using iShoogle with ratings produced when handling physical swatches. We also described the recording and processing techniques for the video and audio content that iShoogle used. Finally, we described the iShoogle iPhone app that was released to the general public. Our evaluation studies showed that iShoogle significantly improved the accuracy of fabric perception in at least some cases. Further research could investigate which fabric qualities and which fabrics are particularly suited to be represented with iShoogle

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Meaning-sensitive noisy text analytics in the low data regime

Author: Kasthuriarachchy Buddhika
Publication venue: 'Federation University Australia'
Publication date: 01/01/2022
Field of study

Digital connectivity is revolutionising people’s quality of life. As broadband and mobile services become faster and more prevalent globally than before, people have started to frequently express their wants and desires on social media platforms. Thus, deriving insights from text data has become a popular approach, both in the industry and academia, to provide social media analytics solutions across a range of disciplines, including consumer behaviour, sales, sports and sociology. Businesses can harness the data shared on social networks to improve their organisations’ strategic business decisions by leveraging advanced Natural Language Processing (NLP) techniques, such as context-aware representations. Specifically, SportsHosts, our industry partner, will be able to launch digital marketing solutions that optimise audience targeting and personalisation using NLP-powered solutions. However, social media data are often noisy and diverse, making the task very challenging. Further, real-world NLP tasks often suffer from insufficient labelled data due to the costly and time-consuming nature of manual annotation. Nevertheless, businesses are keen on maximising the return on investment by boosting the performance of these NLP models in the real world, particularly with social media data. In this thesis, we make several contributions to address these challenges. Firstly, we propose to improve the NLP model’s ability to comprehend noisy text in a low data regime by leveraging prior knowledge from pre-trained language models. Secondly, we analyse the impact of text augmentation and the quality of synthetic sentences in a context-aware NLP setting and propose a meaning-sensitive text augmentation technique using a Masked Language Model. Thirdly, we offer a cost-efficient text data annotation methodology and an end-to-end framework to deploy efficient and effective social media analytics solutions in the real world.Doctor of Philosoph

Federation ResearchOnline

More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Author: De-Arteaga Maria
Li Yunyi
Saar-Tsechansky Maytal
Publication venue
Publication date: 15/07/2022
Field of study

An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias presented in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection

arXiv.org e-Print Archive

Recommended from our members

Identifying and Modeling Code-Switched Language

Author: Soto Martinez Victor
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020
Field of study

Code-switching is the phenomenon by which bilingual speakers switch between multiple languages during written or spoken communication. The importance of developing language technologies that are able to process code-switched language is immense, given the large populations that routinely code-switch. Current NLP and Speech models break down when used on code-switched data, interrupting the language processing pipeline in back-end systems and forcing users to communicate in ways which for them are unnatural. There are four main challenges that arise in building code-switched models: lack of code-switched data on which to train generative language models; lack of multilingual language annotations on code-switched examples which are needed to train supervised models; little understanding of how to leverage monolingual and parallel resources to build better code-switched models; and finally, how to use these models to learn why and when code-switching happens across language pairs. In this thesis, I look into different aspects of these four challenges. The first part of this thesis focuses on how to obtain reliable corpora of code-switched language. We collected a large corpus of code-switched language from social media using a combination of sets of anchor words that exist in one language and sentence-level language taggers. The newly obtained corpus is superior to other corpora collected via different strategies when it comes to the amount and type of bilingualism in it. It also helps train better language tagging models. We also have proposed a new annotation scheme to obtain part-of-speech tags for code-switched English-Spanish language. The annotation scheme is composed of three different subtasks including automatic labeling, word-specific questions labeling and question-tree word labeling. The part-of-speech labels obtained for the Miami Bangor corpus of English-Spanish conversational speech show very high agreement and accuracy. The second section of this thesis focuses on the tasks of part-of-speech tagging and language modeling. For the first task, we proposed a state-of-the-art approach to part-of-speech tagging of code-switched English-Spanish data based on recurrent neural networks.Our models were tested on the Miami Bangor corpus on the task of POS tagging alone, for which we achieved 96.34% accuracy, and joint part-of-speech and language ID tagging,which achieved similar POS tagging accuracy (96.39%) and very high language ID accuracy (98.78%). For the task of language modeling, we first conducted an exhaustive analysis of the relationship between cognate words and code-switching. We then proposed a set of cognate-based features that helped improve language modeling performance by 12% relative points. Furthermore, we showed that these features can also be used across language pairs and still obtain performance improvements. Finally, we tackled the question of how to use monolingual resources for code-switching models by pre-training state-of-the-art cross-lingual language models on large monolingual corpora and fine-tuning them on the tasks of language modeling and word-level language tagging on code-switched data. We obtained state-of-the-art results on both tasks

Columbia University Academic Commons

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines

Author: Boyd-Graber Jordan
Hassan Naeemul
Sung Yoo Yeon
Publication venue
Publication date: 14/12/2023
Field of study

Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users, and while there has been a significant effort to detect false or misleading text, multimodal datasets have received considerably less attention. To complement existing resources, we present multimodal Video Misleading Headline (VMH), a dataset that consists of videos and whether annotators believe the headline is representative of the video's contents. After collecting and annotating this dataset, we analyze multimodal baselines for detecting misleading headlines. Our annotation process also focuses on why annotators view a video as misleading, allowing us to better understand the interplay of annotators' background and the content of the videos.Comment: EMNLP 2023 Main Pape

arXiv.org e-Print Archive

Strategies to Address Data Sparseness in Implicit Semantic Role Labeling

Author: Feizabadi Parvin Sadat
Publication venue
Publication date: 01/01/2019
Field of study

Natural language texts frequently contain predicates whose complete understanding re- quires access to other parts of the discourse. Human readers can retrieve such infor- mation across sentence boundaries and infer the implicit piece of information. This capability enables us to understand complicated texts without needing to repeat the same information in every single sentence. However, for computational systems, resolv- ing such information is problematic because computational approaches traditionally rely on sentence-level processing and rarely take into account the extra-sentential context. In this dissertation, we investigate this omission phenomena, called implicit semantic role labeling. Implicit semantic role labeling involves identification of predicate argu- ments that are not locally realized but are resolvable from the context. For example, in ”What’s the matter, Walters? asked Baynes sharply.”, the ADDRESSEE of the predicate ask, Walters, is not mentioned as one of its syntactic arguments, but can be recoverable from the previous sentence. In this thesis, we try to improve methods for the automatic processing of such predicate instances to improve natural language pro- cessing applications. Our main contribution is introducing approaches to solve the data sparseness problem of the task. We improve automatic identification of implicit roles by increasing the amount of training set without needing to annotate new instances. For this purpose, we propose two approaches. As the first one, we use crowdsourcing to annotate instances of implicit semantic roles and show that with an appropriate task de- sign, reliable annotation of implicit semantic roles can be obtained from the non-experts without the need to present precise and linguistic definition of the roles to them. As the second approach, we combine seemingly incompatible corpora to solve the problem of data sparseness of ISRL by applying a domain adaptation technique. We show that out of domain data from a different genre can be successfully used to improve a baseline implicit semantic role labeling model, when used with an appropriate domain adapta- tion technique. The results also show that the improvement occurs regardless of the predicate part of speech, that is, identification of implicit roles relies more on semantic features than syntactic ones. Therefore, annotating instances of nominal predicates, for instance, can help to improve identification of verbal predicates’ implicit roles, we well. Our findings also show that the variety of the additional data is more important than its size. That is, increasing a large amount of data does not necessarily lead to a better model

Heidelberger Dokumentenserver

Grounding event references in news

Author: Altena R.
Geerlings W.A.
Klingeren B. van
Lange W.C.M. de
Werf T.S.
Publication venue: School of Information Technologies
Publication date: 01/01/2000
Field of study

Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better consider through explicit references to background events. In this context, we propose the event linking task which—analogous to named entity linking or disambiguation—models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Sydney eScholarship

Radboud Repository

Dissertations of the University of Groningen

Design and implementation of a high productivity user interface for a digital dermatoscope

Author
Publication venue
Publication date
Field of study

Information technology offers great potential for healthcare applications. Modern medicine is increasingly taking advantage of digital imaging and computer-assisted diagnosis. Dermatology is no different. Digital dermatoscopy is emerging as the standard for diagnosis of cutaneous lesions. High quality digital images allow dermatologists to improve accuracy, and to assess the evolution of lesions. However, state-of-the-art technology fails to support dermatologists in daily practice: the available systems on the market increase average visit time, and are expensive. Enabling a highly efficient use of the digital dermatoscope will shorten average visit time, and thus allow screening a higher portion of the population at risk with higher frequenc

Padua Thesis and Dissertation Archive

Grounding event references in news

Author: Nothman Joel
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2014
Field of study

Sydney eScholarship