38,700 research outputs found

Boilerplate Removal using a Neural Sequence Labeling Model

Author: Anand Avishek
Khosla Megha
Leonhardt Jurek
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/04/2020

The extraction of main content from web pages is an important task for numerous applications, ranging from usability aspects, like reader views for news articles in web browsers, to information retrieval or natural language processing. Existing approaches are lacking as they rely on large amounts of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but lack in generalization power. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model.
Comment: WWW20 Demo pape
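The input described in this abstract (only the HTML tags and words of a page) can be illustrated with a minimal sketch. It is not the authors' model or preprocessing: the class `TagWordSequencer` and the function `html_to_sequence` are hypothetical names, and the choice to represent each text node by its enclosing tag path is an assumption made for illustration. A sequence labeling model would then assign a content or boilerplate label to each entry.

```python
from html.parser import HTMLParser


class TagWordSequencer(HTMLParser):
    """Collects one (tag_path, words) pair per non-empty text node."""

    # Elements without a closing tag; they never enclose text.
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []      # tags that are currently open
        self.sequence = []   # output, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Skip script and style bodies, which are not visible text.
        if self.stack and self.stack[-1] in ("script", "style"):
            return
        words = data.split()
        if words:
            self.sequence.append((tuple(self.stack), words))


def html_to_sequence(html):
    """Turn an HTML string into a list of (tag_path, words) entries."""
    parser = TagWordSequencer()
    parser.feed(html)
    parser.close()
    return parser.sequence


page = "<html><body><nav>Home</nav><p>Main <b>content</b> here</p></body></html>"
for tag_path, words in html_to_sequence(page):
    print("/".join(tag_path), words)
```

Each entry carries no hand-crafted feature, only the tags and the words, which matches the input the abstract describes.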

arXiv.org e-Print Archive

Crossref

Potential use of oxygen as a metabolic biosensor in combination with T2*-weighted MRI to define the ischemic penumbra

Author: Barrie Condon
Brierley JB
Celestine Santosh
Christopher McCabe
David Brennan
David I Graham
Donald M Hadley
Haacke EM
I Mhairi Macrae
Keith W Muir
Law R
Lindsay Gallagher
Scremin OU
William M Holmes
Willy Gsell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

We describe a novel magnetic resonance imaging technique for detecting metabolism indirectly through changes in oxyhemoglobin:deoxyhemoglobin ratios and T2* signal change during ‘oxygen challenge’ (OC, 5 mins 100% O2). During OC, T2* increase reflects O2 binding to deoxyhemoglobin, which is formed when metabolizing tissues take up oxygen. Here OC has been applied to identify tissue metabolism within the ischemic brain. Permanent middle cerebral artery occlusion was induced in rats. In series 1 scanning (n=5), diffusion-weighted imaging (DWI) was performed, followed by echo-planar T2* acquired during OC and perfusion-weighted imaging (PWI, arterial spin labeling). Oxygen challenge induced a T2* signal increase of 1.8%, 3.7%, and 0.24% in the contralateral cortex, ipsilateral cortex within the PWI/DWI mismatch zone, and ischemic core, respectively. T2* and apparent diffusion coefficient (ADC) map coregistration revealed that the T2* signal increase extended into the ADC lesion (3.4%). In series 2 (n=5), FLASH T2* and ADC maps coregistered with histology revealed a T2* signal increase of 4.9% in the histologically defined border zone (55% normal neuronal morphology, located within the ADC lesion boundary) compared with a 0.7% increase in the cortical ischemic core (92% neuronal ischemic cell change, core ADC lesion). Oxygen challenge has potential clinical utility and, by distinguishing metabolically active and inactive tissues within hypoperfused regions, could provide a more precise assessment of penumbra
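The percentages quoted in this abstract are T2* signal increases during oxygen challenge. The abstract does not state how they were computed, so the sketch below assumes the conventional definition of percent signal change relative to a pre-challenge baseline. The function name `percent_signal_change` and the sample values are hypothetical and are not data from the study.

```python
def percent_signal_change(baseline, challenge):
    """Percent change of the mean challenge signal relative to the mean baseline."""
    if not baseline or not challenge:
        raise ValueError("both signal lists must be non-empty")
    base_mean = sum(baseline) / len(baseline)
    challenge_mean = sum(challenge) / len(challenge)
    if base_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * (challenge_mean - base_mean) / base_mean


# Made-up region-of-interest intensities before and during the challenge.
baseline_signal = [100.0, 100.0]
challenge_signal = [102.0, 104.0]
print(percent_signal_change(baseline_signal, challenge_signal))
```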

First Author Advantage: Citation Labeling in Research

Author: Cormode Graham
Muthukrishnan S.
Yan Jinyun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author.
Comment: Computational Scientometrics: Theory and Applications at The 22nd CIKM 201
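The kind of measurement this abstract describes, checking which authors appear in a citation key, can be sketched as follows. This is an illustrative assumption and not the authors' procedure: the substring matching rule, the function names, and the sample keys are all hypothetical. Substring matching can also produce false hits for very short surnames.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and keep letters only, for loose matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def authors_in_key(key, surnames):
    """Return the 0-based positions of the surnames that occur in the key."""
    norm_key = normalize(key)
    hits = []
    for position, surname in enumerate(surnames):
        norm_name = normalize(surname)
        if norm_name and norm_name in norm_key:
            hits.append(position)
    return hits


def first_author_share(entries):
    """Among keys that name any author, the fraction naming the first author."""
    naming = [authors_in_key(key, surnames) for key, surnames in entries]
    naming = [positions for positions in naming if positions]
    if not naming:
        return 0.0
    return sum(1 for positions in naming if 0 in positions) / len(naming)


# Hypothetical citation keys for a single three-author paper.
authors = ["Cormode", "Muthukrishnan", "Yan"]
entries = [("Cormode:2013", authors), ("yan-fa", authors), ("fig1ref", authors)]
print(first_author_share(entries))
```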

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spontaneous Subtle Expression Detection and Recognition based on Facial Strain

Author: Liong Sze-Teng
Ngo Anh Cat Le
Oh Yee-Hui
Phan Raphael Chung-Wei
See John
Tan Su-Wei
Wong KokSheik
Publication venue: 'Elsevier BV'
Publication date: 08/06/2016

Optical strain is an extension of optical flow that is capable of quantifying subtle changes on faces and representing the minute facial motion intensities at the pixel level. This is computationally essential for the relatively new field of spontaneous micro-expression, where subtle expressions can be technically challenging to pinpoint. In this paper, we present a novel method for detecting and recognizing micro-expressions by utilizing facial optical strain magnitudes to construct optical strain features and optical strain weighted features. The two sets of features are then concatenated to form the resultant feature histogram. Experiments were performed on the CASME II and SMIC databases. On both databases, we demonstrate the usefulness of optical strain information and, more importantly, that our best approaches outperform the original baseline results for both detection and recognition tasks. A comparison of the proposed method with other existing spatio-temporal feature extraction approaches is also presented.
Comment: 21 pages (including references), single column format, accepted to Signal Processing: Image Communication journa
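The abstract does not give a formula for optical strain, so the following is a minimal sketch under a stated assumption: strain is taken as the symmetric part of the spatial gradient of the flow field (the usual infinitesimal strain tensor), and its magnitude is the square root of the sum of squared tensor components. The function name `optical_strain_magnitude` and the finite-difference scheme are illustrative choices, not the authors' implementation.

```python
import math


def optical_strain_magnitude(u, v):
    """Per-pixel strain magnitude from flow components u (x) and v (y).

    u and v are equally sized 2-D lists indexed as [row][col].
    """
    rows, cols = len(u), len(u[0])
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 flow field")

    def d_dx(f, r, c):
        # Central difference along columns, one-sided at the borders.
        c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
        return (f[r][c1] - f[r][c0]) / (c1 - c0)

    def d_dy(f, r, c):
        # Central difference along rows, one-sided at the borders.
        r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
        return (f[r1][c] - f[r0][c]) / (r1 - r0)

    magnitude = []
    for r in range(rows):
        row = []
        for c in range(cols):
            e_xx = d_dx(u, r, c)                          # normal strain in x
            e_yy = d_dy(v, r, c)                          # normal strain in y
            e_xy = 0.5 * (d_dy(u, r, c) + d_dx(v, r, c))  # shear, e_xy == e_yx
            row.append(math.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2))
        magnitude.append(row)
    return magnitude


# A uniform stretch along x: u grows linearly with the column index.
u = [[0.5 * c for c in range(3)] for _ in range(3)]
v = [[0.0] * 3 for _ in range(3)]
print(optical_strain_magnitude(u, v)[1][1])
```

A rigid translation (constant `u` and `v`) gives zero strain everywhere, which is why strain responds to local deformation of the face and not to overall head motion.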

arXiv.org e-Print Archive

Deakin Research Online

Heriot Watt Pure

SHDL@MMU Digital Repository

A review of domain adaptation without target labels

Author: Kouw Wouter M.
Loog Marco
Publication date: 01/01/2019

Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into what we refer to as sample-based, feature-based, and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around mapping, projecting, and representing features such that a source classifier performs well on the target domain. Inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.
Comment: 20 pages, 5 figure
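The sample-based idea in this abstract, weighting individual source observations by their importance to the target domain, can be sketched with importance weights w(x) = p_target(x) / p_source(x). The sketch assumes both densities are known one-dimensional Gaussians, which is a simplification for illustration; in practice the ratio has to be estimated. All function names and numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """w(x) = p_target(x) / p_source(x); source and target are (mean, std)."""
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized weighted average of per-sample losses."""
    total = sum(weights)
    return sum(loss * w for loss, w in zip(losses, weights)) / total


# Source samples drawn around 0, target domain centered at 1 (assumed values).
source_xs = [0.0, 1.0]
per_sample_losses = [1.0, 0.0]
weights = importance_weights(source_xs, source=(0.0, 1.0), target=(1.0, 1.0))
print(weights, weighted_risk(per_sample_losses, weights))
```

Samples that are more typical of the target domain (here, x = 1.0) receive larger weights, so their losses dominate the training objective.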

arXiv.org e-Print Archive

Crossref

Less is More: Micro-expression Recognition from Video using Apex Frame

Author: Liong Sze-Teng
Phan Raphael C. -W.
See John
Wong KokSheik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Despite recent interest and advances in facial micro-expression research, there is still plenty of room for improvement in terms of micro-expression recognition. Conventional feature extraction approaches for micro-expression video consider either the whole video sequence or a part of it for representation. However, with the high-speed video capture of micro-expressions (100-200 fps), are all frames necessary to provide a sufficiently meaningful representation? Is the luxury of data a bane to accurate recognition? A novel proposition is presented in this paper, whereby we utilize only two images per video: the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression changes among all frames, while the onset is the perfect choice of a reference frame with neutral expression. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to encode essential expressiveness of the apex frame. We evaluated the proposed method on five micro-expression databases: CAS(ME)^2, CASME II, SMIC-HS, SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with our proposed technique achieving a state-of-the-art F1-score recognition performance of 61% and 62% in the high frame rate CASME II and SMIC-HS databases respectively.
Comment: 14 pages double-column, author affiliations updated, acknowledgment of grant support adde
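The two-frame idea in this abstract (onset frame as neutral reference, apex frame as the point of highest expression intensity) can be illustrated with a naive stand-in. The abstract does not describe how the apex frame is located, so the sketch assumes the onset is the first frame and picks as apex the frame with the largest mean absolute pixel difference from it. This is not the authors' apex spotting method or the Bi-WOOF descriptor, and the function names and toy frames are hypothetical.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def select_onset_and_apex(frames):
    """Return (onset_index, apex_index) for a list of grayscale frames."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    onset = 0  # assumption: the clip starts at the neutral onset frame
    diffs = [frame_difference(frames[onset], frame) for frame in frames]
    apex = max(range(len(frames)), key=diffs.__getitem__)
    return onset, apex


# Toy clip of four 1x2 frames; intensity rises, peaks, then decays.
clip = [[[0, 0]], [[1, 0]], [[3, 2]], [[1, 1]]]
print(select_onset_and_apex(clip))
```

Only the two selected frames would then be passed to a feature extractor, which is the data reduction the abstract argues for.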

arXiv.org e-Print Archive

Heriot Watt Pure

SHDL@MMU Digital Repository