Characterising semantically coherent classes of text through feature discovery
There is a growing need to support social scientists and humanities scholars in gathering and engaging with very large datasets of free text, in order to perform highly bespoke analyses. Method52 is a text analysis platform built for this purpose (Wibberley et al., 2014), and forms the foundation that this thesis builds upon. A central part of Method52 and its methodologies is a classifier training component based on DUALIST (Settles, 2011), and the general process of data engagement with Method52 is shown to constitute a continuous cycle of characterising semantically coherent sub-collections, or classes, of the text. Two broad methodologies exist for supporting this type of engagement process: (1) a top-down approach, wherein concepts and their relationships are explicitly modelled for reasoning, and (2) a more surface-level, bottom-up approach, which uses key terms (surface features) to characterise data. Following the second approach, this thesis examines ways of better supporting this type of data engagement, so as to more effectively serve the needs of social scientists and humanities scholars working with text data. The classifier component provides an active learning training environment that emphasises the labelling of individual features. However, prior knowledge of features can be difficult to interpret and incorporate, the process of feature discovery based on the current classifier model does not always produce useful results, and understanding the data well enough to produce successful classifiers is time-consuming. A new method for discovering features in a corpus is introduced, and feature discovery methods are explored to resolve these issues. When collecting social media data, documents are often obtained by querying an API with a set of key phrases; the set of possible classes characterising the data is therefore defined by these basic surface features.
It is difficult to know exactly which terms must be searched for, and the usefulness of terms can change over time as new discussions and vocabulary emerge. Building on the feature discovery techniques, this thesis presents a framework for streaming data with an automatically adapting query to address these issues.
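The core of such an adapting query can be sketched in a few lines: after each batch of retrieved documents, terms that frequently co-occur with the current query terms are promoted into the query. The function name, thresholds, and whitespace tokenisation below are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter

def adapt_query(query_terms, retrieved_docs, max_new_terms=2, min_count=2):
    """One adaptation step: promote terms that frequently co-occur
    with the current query terms in the retrieved documents."""
    query = set(query_terms)
    counts = Counter()
    for doc in retrieved_docs:
        tokens = set(doc.lower().split())
        if tokens & query:                  # document matched the current query
            counts.update(tokens - query)   # count candidate co-occurring terms
    new_terms = [t for t, c in counts.most_common() if c >= min_count]
    return query | set(new_terms[:max_new_terms])

# Example: "vote" co-occurs with "brexit" twice, so it joins the query
print(adapt_query({"brexit"},
                  ["brexit deal vote", "brexit backstop vote", "weather today"]))
```

A real system would add decay so that stale terms also drop out of the query as the discussion moves on.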
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
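The underlying idea, comparing terms by the grammatical contexts they occur in, can be sketched as follows. The context labels and counts are invented toy data, and cosine stands in for the paper's "variety of different measures".

```python
import math
from collections import Counter

def cosine(ctx_a, ctx_b):
    """Cosine similarity between two context-count vectors."""
    shared = set(ctx_a) & set(ctx_b)
    dot = sum(ctx_a[c] * ctx_b[c] for c in shared)
    norm = (math.sqrt(sum(v * v for v in ctx_a.values()))
            * math.sqrt(sum(v * v for v in ctx_b.values())))
    return dot / norm if norm else 0.0

# Hypothetical grammatical-relation contexts extracted by a parser
contexts = {
    "interleukin-2": Counter({"obj:activate": 3, "mod:human": 2}),
    "interferon":    Counter({"obj:activate": 2, "mod:human": 1, "obj:induce": 1}),
    "cell line":     Counter({"subj:express": 4, "mod:transfected": 2}),
}

# Terms sharing contexts (two cytokines) score higher than unrelated terms
print(cosine(contexts["interleukin-2"], contexts["interferon"]))
print(cosine(contexts["interleukin-2"], contexts["cell line"]))
```

Ranking each term's nearest neighbours by such a score is what allows semantic type to be predicted against the GENIA ontology.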
Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities
Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels that summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of two types of heterogeneous objects, corresponding to documents and candidate phrases, using multilevel similarity information. CEDL is composed of five main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model where documents along with their cluster memberships are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases by a ranking scheme that combines multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL are demonstrated qualitatively and quantitatively with experiments using document databases from different application fields.
Modelling naturalistic argumentation in research literatures: representation and interaction design issues
This paper characterises key weaknesses in the ability of current digital libraries to support scholarly inquiry, and as a way to address these, proposes computational services grounded in semiformal models of the naturalistic argumentation commonly found in research literatures. It is argued that a design priority is to balance formal expressiveness with usability, making it critical to co-evolve the modelling scheme with appropriate user interfaces for argument construction and analysis. We specify the requirements for an argument modelling scheme for use by untrained researchers, describe the resulting ontology, contrasting it with other domain modelling and semantic web approaches, before discussing passive and intelligent user interfaces designed to support analysts in the construction, navigation and analysis of scholarly argument structures in a Web-based environment.
The architecture of partisan debates: The online controversy on the no-deal Brexit
We propose a framework to analyse partisan debates that involves extracting, classifying and exploring the latent argumentation structure and dynamics of online societal controversies. In this paper, the focus is placed on causal arguments, and the proposed framework is applied to the Twitter debate on the consequences of a hard Brexit scenario. Regular expressions based on causative verbs, structural topic modelling, and dynamic time warping techniques were used to identify partisan faction arguments, as well as their relations, and to infer agenda-setting dynamics. The results highlight that the arguments employed by partisan factions are mostly constructed around constellations of effect-classes based on polarised verb groups. These constellations show that the no-deal debate hinges on structurally balanced building blocks. Brexiteers focus more on arguments related to greenfield trading opportunities and increased autonomy, whereas Remainers argue more about what a no-deal Brexit could destroy, focusing on hard border issues, social tensions in Ireland and Scotland and other economy- and healthcare-related problems. More notably, inferred debate leadership dynamics show that, despite their different usage of terms and arguments, the two factions' argumentation dynamics are strongly intertwined. Moreover, the identified periods in which agenda-setting roles change are linked to major events, such as extensions, elections and the Yellowhammer plan leak, and to new issues that emerged in relation to these events.
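Dynamic time warping, one of the techniques named above, aligns two time series that move similarly but out of phase, which is what makes it usable for comparing the factions' topic-prevalence curves. A minimal textbook implementation (not the paper's own code, and without the windowing constraints a real analysis would likely add):

```python
def dtw(a, b):
    """Dynamic time warping distance between two numeric series,
    computed with the standard O(n*m) dynamic program."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# A series and a time-shifted copy align perfectly (distance 0),
# whereas an unrelated flat series does not.
print(dtw([0, 0, 1, 2], [0, 1, 2]))
print(dtw([0, 0, 1, 2], [5, 5, 5]))
```

In the debate setting, whether the warping path leans ahead of or behind the diagonal indicates which faction's topic curve leads the other, i.e. who is setting the agenda in that period.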
Deploying mutation impact text-mining software with the SADI Semantic Web Services framework
Background: Mutation impact extraction is an important task designed to harvest relevant annotations from scientific documents for reuse in multiple contexts. Our previous work on text mining for mutation impacts resulted in (i) the development of a GATE-based pipeline that mines texts for information about impacts of mutations on proteins, (ii) the population of this information into our OWL DL mutation impact ontology, and (iii) establishing an experimental semantic database for storing the results of text mining. Results: This article explores the possibility of using the SADI framework as a medium for publishing our mutation impact software and data. SADI is a set of conventions for creating web services with semantic descriptions that facilitate automatic discovery and orchestration. We describe a case study exploring and demonstrating the utility of the SADI approach in our context. We describe several SADI services we created based on our text mining API and data, and demonstrate how they can be used in a number of biologically meaningful scenarios through a SPARQL interface (SHARE) to SADI services. In all cases we pay special attention to the integration of mutation impact services with external SADI services providing information about related biological entities, such as proteins, pathways, and drugs. Conclusion: We have identified that SADI provides an effective way of exposing our mutation impact data.
Semantic Spaces for Video Analysis of Behaviour
There is ever-growing interest from the computer vision community in human behaviour analysis based on visual sensors. This interest generally covers: (1) behaviour recognition: given a video clip or a specific spatio-temporal volume of interest, discriminate it into one or more of a set of pre-defined categories; (2) behaviour retrieval: given a video or textual description as query, search for video clips with related behaviour; (3) behaviour summarisation: given a number of video clips, summarise representative and distinct behaviours. Although countless efforts have been dedicated to the problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean spaces in which semantically meaningful events, e.g. individual words, phrases and visual events, can be represented as vectors or distributions, referred to as semantic representations. Within a semantic space, texts and visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces brings many benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantically meaningful video summarisation, retrieval and anomaly detection. Semantic spaces can also seamlessly bridge categories and datasets which are conventionally treated as independent, encouraging the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, a semantic space can generalise a learned model beyond known classes, which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial because: (1) a semantic space is hard to define manually; humans have a good sense of the semantic relatedness between visual and textual instances, but a measurable and finite semantic space is difficult to construct with limited manual supervision, so we construct the semantic space from data in an unsupervised manner; (2) it is hard to build a universal semantic space, i.e. the space is always context-dependent, so it is important to build the semantic space upon selected data such that it remains meaningful within the context. Even with a well-constructed semantic space, challenges remain: (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge or data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic spaces with which data and knowledge can be transferred and shared to facilitate general video behaviour analysis.
To demonstrate the efficacy of semantic spaces for behaviour analysis, we focus on real-world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition, with techniques specifically tailored to the nature of each problem.
Firstly, for video surveillance scenes, we propose to discover semantic representations from the visual data in an unsupervised manner, owing to the wide availability of unlabelled visual data in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events, this thesis studies a geometrical alignment between semantic representations of events across scenes. Semantic actions can thus be transferred to new scenes, and abnormal events can be detected in an unsupervised way. To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification, and video summarisation.
Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision available, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We observe that blindly applying the mapping learned from known categories to novel categories can introduce bias and deteriorate performance, a problem termed domain shift. To solve it, we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation. In combination, these methods achieve state-of-the-art performance on the zero-shot human action recognition task.
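The final classification step of such a zero-shot pipeline is simple once the visual-to-semantic mapping has been learned: the projected visual feature is assigned to the unseen class whose name embedding lies closest in the semantic space. The 3-d "word vectors" below are toy values standing in for real embeddings; only the nearest-neighbour step is shown.

```python
import math

def nearest_class(projected_vec, class_vectors):
    """Return the class whose name embedding is closest (by cosine
    similarity) to the visual feature projected into semantic space."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return max(class_vectors, key=lambda c: cos(projected_vec, class_vectors[c]))

# Toy word-vector prototypes for two unseen action classes
classes = {"running": [1.0, 0.1, 0.0], "swimming": [0.0, 1.0, 0.2]}
print(nearest_class([0.9, 0.2, 0.0], classes))
```

Domain shift appears exactly here: if the projection is biased towards seen classes, projected vectors drift away from the correct unseen prototype, which is what hubness correction and the other listed remedies counteract.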
Lastly, we study the possibility of re-using known, manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours, a task termed zero-shot crowd behaviour recognition. Crucially, we point out that given the multi-label nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes.
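One simple way to exploit attribute co-occurrence, sketched here as an assumption rather than the thesis's actual model, is to let each attribute's independent detector score be reinforced by the scores of attributes it frequently co-occurs with:

```python
def rescore_attributes(scores, cooccurrence, weight=0.5):
    """Blend each attribute's independent detector score with support
    from co-occurring attributes (one step of score propagation)."""
    out = {}
    for a, s in scores.items():
        support = sum(cooccurrence.get((a, b), 0.0) * sb
                      for b, sb in scores.items() if b != a)
        out[a] = (1 - weight) * s + weight * support
    return out

# A weak "panic" score is boosted by a confident, co-occurring "fleeing"
scores = {"panic": 0.4, "fleeing": 0.9}
cooc = {("panic", "fleeing"): 0.8, ("fleeing", "panic"): 0.8}
print(rescore_attributes(scores, cooc))
```

With rescored attributes, an unseen crowd behaviour described by an attribute signature becomes easier to match even when some of its individual attribute detectors are unreliable.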
To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploiting semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
Visualizing internetworked argumentation
In this chapter, we outline a project which traces its source of inspiration back to the grand visions of Vannevar Bush (scholarly trails of linked concepts), Doug Engelbart (highly interactive intellectual tools, particularly for argumentation), and Ted Nelson (large scale internet publishing with recognised intellectual property). In essence, we are tackling the age-old question of how to organise distributed, collective knowledge. Specifically, we pose the following question as a foil:
In 2010, will scholarly knowledge still be published solely in prose, or can we imagine a complementary infrastructure that is "native" to the emerging semantic, collaborative web, enabling more effective dissemination and analysis of ideas?