
    Boilerplate Removal using a Neural Sequence Labeling Model

    The extraction of main content from web pages is an important task for numerous applications, ranging from usability aspects, like reader views for news articles in web browsers, to information retrieval or natural language processing. Existing approaches fall short because they rely on large amounts of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but generalize poorly. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model. Comment: WWW20 Demo paper

    Potential use of oxygen as a metabolic biosensor in combination with T2*-weighted MRI to define the ischemic penumbra

    We describe a novel magnetic resonance imaging technique for detecting metabolism indirectly through changes in oxyhemoglobin:deoxyhemoglobin ratios and T2* signal change during ‘oxygen challenge’ (OC, 5 min 100% O2). During OC, T2* increase reflects O2 binding to deoxyhemoglobin, which is formed when metabolizing tissues take up oxygen. Here OC has been applied to identify tissue metabolism within the ischemic brain. Permanent middle cerebral artery occlusion was induced in rats. In series 1 scanning (n=5), diffusion-weighted imaging (DWI) was performed, followed by echo-planar T2* acquired during OC and perfusion-weighted imaging (PWI, arterial spin labeling). Oxygen challenge induced a T2* signal increase of 1.8%, 3.7%, and 0.24% in the contralateral cortex, ipsilateral cortex within the PWI/DWI mismatch zone, and ischemic core, respectively. T2* and apparent diffusion coefficient (ADC) map coregistration revealed that the T2* signal increase extended into the ADC lesion (3.4%). In series 2 (n=5), FLASH T2* and ADC maps coregistered with histology revealed a T2* signal increase of 4.9% in the histologically defined border zone (55% normal neuronal morphology, located within the ADC lesion boundary) compared with a 0.7% increase in the cortical ischemic core (92% neuronal ischemic cell change, core ADC lesion). Oxygen challenge has potential clinical utility and, by distinguishing metabolically active and inactive tissues within hypoperfused regions, could provide a more precise assessment of penumbra.

    First Author Advantage: Citation Labeling in Research

    Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature, and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels to identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author. Comment: Computational Scientometrics: Theory and Applications at The 22nd CIKM 201

    Spontaneous Subtle Expression Detection and Recognition based on Facial Strain

    Optical strain is an extension of optical flow that is capable of quantifying subtle changes on faces and representing the minute facial motion intensities at the pixel level. This is computationally essential for the relatively new field of spontaneous micro-expression, where subtle expressions can be technically challenging to pinpoint. In this paper, we present a novel method for detecting and recognizing micro-expressions by utilizing facial optical strain magnitudes to construct optical strain features and optical strain weighted features. The two sets of features are then concatenated to form the resultant feature histogram. Experiments were performed on the CASME II and SMIC databases. We demonstrate on both databases the usefulness of optical strain information and, more importantly, that our best approaches are able to outperform the original baseline results for both detection and recognition tasks. A comparison of the proposed method with other existing spatio-temporal feature extraction approaches is also presented. Comment: 21 pages (including references), single column format, accepted to Signal Processing: Image Communication journal
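As a rough illustration of the optical strain magnitude mentioned in the abstract above, the sketch below derives a per-pixel strain magnitude from a dense optical flow field using the standard infinitesimal strain tensor. It is a minimal sketch, not the authors' implementation: the function names `finite_diff` and `optical_strain_magnitude`, the finite-difference scheme, and the use of plain nested lists for the flow components `u` and `v` are all assumptions made for illustration.

```python
import math


def finite_diff(field, axis):
    """Approximate the derivative of a 2-D grid (list of rows).

    axis=1 differentiates along x (columns), axis=0 along y (rows).
    Central differences are used inside the grid and one-sided
    differences at the borders. The grid must be at least 2x2.
    """
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if axis == 1:
                lo, hi = max(c - 1, 0), min(c + 1, w - 1)
                out[r][c] = (field[r][hi] - field[r][lo]) / (hi - lo)
            else:
                lo, hi = max(r - 1, 0), min(r + 1, h - 1)
                out[r][c] = (field[hi][c] - field[lo][c]) / (hi - lo)
    return out


def optical_strain_magnitude(u, v):
    """Per-pixel strain magnitude from flow components u (x) and v (y).

    Uses the symmetric strain tensor:
        e_xx = du/dx, e_yy = dv/dy, e_xy = e_yx = (du/dy + dv/dx) / 2
    and the magnitude sqrt(e_xx^2 + e_yy^2 + e_xy^2 + e_yx^2).
    """
    du_dx, du_dy = finite_diff(u, 1), finite_diff(u, 0)
    dv_dx, dv_dy = finite_diff(v, 1), finite_diff(v, 0)
    h, w = len(u), len(u[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            e_xx = du_dx[r][c]
            e_yy = dv_dy[r][c]
            e_xy = 0.5 * (du_dy[r][c] + dv_dx[r][c])
            mag[r][c] = math.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)
    return mag
```

Under this definition a uniform translation of the face produces zero strain everywhere, so only local deformation contributes to the magnitude. How the magnitudes are then pooled into features is specific to the paper and is not reproduced here.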

    A review of domain adaptation without target labels

    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into what we refer to as sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around mapping, projecting and representing features such that a source classifier performs well on the target domain. Inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research. Comment: 20 pages, 5 figures
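The sample-based idea described in the abstract above, weighting source observations by their importance to the target domain, can be sketched with density-ratio importance weights. This is a minimal sketch under strong assumptions that do not come from the review: both domains are treated as known one-dimensional Gaussians, and the helper names `gaussian_pdf`, `importance_weights` and `weighted_risk` are hypothetical. In practice the densities are unknown and the ratio has to be estimated.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """Weight each source sample by p_target(x) / p_source(x).

    `source` and `target` are (mean, std) pairs; assuming both
    densities are known Gaussians is an illustrative simplification.
    """
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total
```

With identical source and target distributions every weight equals 1 and the estimate reduces to the ordinary average loss. When the target is shifted, source samples that fall in high-density target regions count for more.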

    Less is More: Micro-expression Recognition from Video using Apex Frame

    Despite recent interest and advances in facial micro-expression research, there is still plenty of room for improvement in terms of micro-expression recognition. Conventional feature extraction approaches for micro-expression video consider either the whole video sequence or a part of it for representation. However, with the high-speed video capture of micro-expressions (100-200 fps), are all frames necessary to provide a sufficiently meaningful representation? Is the luxury of data a bane to accurate recognition? A novel proposition is presented in this paper, whereby we utilize only two images per video: the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression changes among all frames, while the onset is the perfect choice of a reference frame with neutral expression. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF), is proposed to encode essential expressiveness of the apex frame. We evaluated the proposed method on five micro-expression databases: CAS(ME)², CASME II, SMIC-HS, SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with our proposed technique achieving a state-of-the-art F1-score recognition performance of 61% and 62% in the high frame rate CASME II and SMIC-HS databases respectively. Comment: 14 pages double-column, author affiliations updated, acknowledgment of grant support added