Boilerplate Removal using a Neural Sequence Labeling Model
The extraction of main content from web pages is an important task for
numerous applications, ranging from usability aspects, like reader views for
news articles in web browsers, to information retrieval or natural language
processing. Existing approaches fall short because they rely on large numbers of
hand-crafted features for classification. This results in models that are
tailored to a specific distribution of web pages, e.g. from a certain time
frame, but generalize poorly. We propose a neural sequence labeling
model that does not rely on any hand-crafted features but takes only the HTML
tags and words that appear in a web page as input. This allows us to present a
browser extension which highlights the content of arbitrary web pages directly
within the browser using our model. In addition, we create a new, more current
dataset to show that our model is able to adapt to changes in the structure of
web pages and outperform the state-of-the-art model.
Comment: WWW20 Demo paper
Potential use of oxygen as a metabolic biosensor in combination with T2*-weighted MRI to define the ischemic penumbra
We describe a novel magnetic resonance imaging technique for detecting metabolism indirectly through changes in oxyhemoglobin:deoxyhemoglobin ratios and T2* signal change during ‘oxygen challenge’ (OC, 5 mins 100% O2). During OC, T2* increase reflects O2 binding to deoxyhemoglobin, which is formed when metabolizing tissues take up oxygen. Here OC has been applied to identify tissue metabolism within the ischemic brain. Permanent middle cerebral artery occlusion was induced in rats. In series 1 scanning (n=5), diffusion-weighted imaging (DWI) was performed, followed by echo-planar T2* acquired during OC and perfusion-weighted imaging (PWI, arterial spin labeling). Oxygen challenge induced a T2* signal increase of 1.8%, 3.7%, and 0.24% in the contralateral cortex, ipsilateral cortex within the PWI/DWI mismatch zone, and ischemic core, respectively. T2* and apparent diffusion coefficient (ADC) map coregistration revealed that the T2* signal increase extended into the ADC lesion (3.4%). In series 2 (n=5), FLASH T2* and ADC maps coregistered with histology revealed a T2* signal increase of 4.9% in the histologically defined border zone (55% normal neuronal morphology, located within the ADC lesion boundary) compared with a 0.7% increase in the cortical ischemic core (92% neuronal ischemic cell change, core ADC lesion). Oxygen challenge has potential clinical utility and, by distinguishing metabolically active and inactive tissues within hypoperfused regions, could provide a more precise assessment of penumbra.
First Author Advantage: Citation Labeling in Research
Citations among research papers, and the networks they form, are the primary
object of study in scientometrics. The act of making a citation reflects the
citer's knowledge of the related literature, and of the work being cited. We
aim to gain insight into this process by studying citation keys: user-chosen
labels to identify a cited work. Our main observation is that the first listed
author is disproportionately represented in such labels, implying a strong
mental bias towards the first author.
Comment: Computational Scientometrics: Theory and Applications at The 22nd
CIKM 201
Spontaneous Subtle Expression Detection and Recognition based on Facial Strain
Optical strain is an extension of optical flow that is capable of quantifying
subtle changes on faces and representing the minute facial motion intensities
at the pixel level. This is computationally essential for the relatively new
field of spontaneous micro-expression, where subtle expressions can be
technically challenging to pinpoint. In this paper, we present a novel method
for detecting and recognizing micro-expressions by utilizing facial optical
strain magnitudes to construct optical strain features and optical strain
weighted features. The two sets of features are then concatenated to form the
resultant feature histogram. Experiments were performed on the CASME II and
SMIC databases. On both databases, we demonstrate the usefulness of optical
strain information and, more importantly, that our best approaches are able to
outperform the original baseline results for both detection and recognition
tasks. A comparison of the proposed method with other existing spatio-temporal
feature extraction approaches is also presented.
Comment: 21 pages (including references), single column format, accepted to
Signal Processing: Image Communication journal
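To make the idea of optical strain concrete, the following is a minimal sketch that derives a per-pixel strain magnitude from a dense optical flow field. It assumes the standard infinitesimal strain tensor (the symmetric part of the flow gradient) and a Euclidean norm over its components; the abstract does not state the exact formula, so both choices are assumptions. The function names and the list-of-lists flow representation are hypothetical and used only for illustration.

```python
import math


def _gradient(field):
    """Return (d/dx, d/dy) of a 2-D list of floats.

    Uses central differences in the interior and one-sided differences
    at the borders. Assumes at least 2 rows and 2 columns.
    """
    rows, cols = len(field), len(field[0])
    d_dx = [[0.0] * cols for _ in range(rows)]
    d_dy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            d_dx[r][c] = (field[r][c1] - field[r][c0]) / (c1 - c0)
            d_dy[r][c] = (field[r1][c] - field[r0][c]) / (r1 - r0)
    return d_dx, d_dy


def optical_strain_magnitude(u, v):
    """Per-pixel strain magnitude from horizontal (u) and vertical (v) flow.

    Assumed definition: e_xx = du/dx, e_yy = dv/dy,
    e_xy = 0.5 * (du/dy + dv/dx), magnitude = sqrt(e_xx^2 + e_yy^2 + 2*e_xy^2).
    """
    du_dx, du_dy = _gradient(u)
    dv_dx, dv_dy = _gradient(v)
    rows, cols = len(u), len(u[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            e_xx = du_dx[r][c]
            e_yy = dv_dy[r][c]
            e_xy = 0.5 * (du_dy[r][c] + dv_dx[r][c])
            magnitude[r][c] = math.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)
    return magnitude


# Toy flow fields: a uniform horizontal stretch and a rigid translation.
stretch_u = [[0.1 * c for c in range(4)] for _ in range(3)]
shift_u = [[2.0] * 4 for _ in range(3)]
zero_v = [[0.0] * 4 for _ in range(3)]

stretch_strain = optical_strain_magnitude(stretch_u, zero_v)
shift_strain = optical_strain_magnitude(shift_u, zero_v)
```

Because strain depends only on spatial derivatives of the flow, a rigid translation produces zero strain while a local stretch does not, which is the property that makes it useful for isolating small facial deformations.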
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
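As an illustration of the sample-based family described in this abstract, the sketch below reweights source observations by the ratio of target to source density. It assumes both domains are one-dimensional Gaussians with known means and a shared standard deviation, which is a toy setting chosen for clarity; in practice the ratio has to be estimated. All function and variable names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_weight(x, mu_source, mu_target, sigma=1.0):
    """Importance of a source observation to the target domain: p_T(x) / p_S(x)."""
    return gaussian_pdf(x, mu_target, sigma) / gaussian_pdf(x, mu_source, sigma)


def weighted_mean(values, weights):
    """Self-normalized weighted average."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Assumed toy domains: source ~ N(0, 1), target ~ N(1, 1).
rng = random.Random(0)
source_x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
weights = [importance_weight(x, 0.0, 1.0) for x in source_x]

plain_mean = sum(source_x) / len(source_x)          # estimates the source mean
reweighted_mean = weighted_mean(source_x, weights)  # estimates the target mean
```

Only source samples are used, yet the reweighted average approximates a target-domain statistic. The same weights could multiply per-sample losses when training a classifier on source data.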
Less is More: Micro-expression Recognition from Video using Apex Frame
Despite recent interest and advances in facial micro-expression research,
there is still plenty of room for improvement in terms of micro-expression
recognition. Conventional feature extraction approaches for micro-expression
video consider either the whole video sequence or a part of it for
representation. However, with the high-speed video capture of micro-expressions
(100-200 fps), are all frames necessary to provide a sufficiently meaningful
representation? Is the luxury of data a bane to accurate recognition? A novel
proposition is presented in this paper, whereby we utilize only two images per
video: the apex frame and the onset frame. The apex frame of a video contains
the highest intensity of expression changes among all frames, while the onset
is the perfect choice of a reference frame with neutral expression. A new
feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to
encode essential expressiveness of the apex frame. We evaluated the proposed
method on five micro-expression databases: CAS(ME)^2, CASME II, SMIC-HS,
SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with
our proposed technique achieving a state-of-the-art F1-score recognition
performance of 61% and 62% in the high frame rate CASME II and SMIC-HS
databases respectively.
Comment: 14 pages double-column, author affiliations updated, acknowledgment
of grant support added