BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

Optimal Extraction of Fibre Optic Spectroscopy

Author: Hill
Lee
M. N. Birchall
Markwardt
Parry
Press
R. Sharp
Saunders
Sharp
Publication venue: 'CSIRO Publishing'
Publication date: 02/12/2009

We report an optimal extraction methodology for the reduction of multi-object fibre spectroscopy data, operating in the regime of tightly packed (and hence significantly overlapping) fibre profiles. The routine minimises crosstalk between adjacent fibres and statistically weights the extraction to reduce noise. As an example of the process, we use simulations of the numerous modes of operation of the AAOmega fibre spectrograph and observational data from the SPIRAL Integral Field Unit at the Anglo-Australian Telescope. Comment: Accepted for publication in PAS

arXiv.org e-Print Archive

Crossref

The Australian National University

CLaSPS: a new methodology for Knowledge extraction from complex astronomical dataset

Author: Aihara
Babu
Bonfield
Borne
Budavári
C. Donalek
Civano
D'Abrusco
Elvis
Evans
Fabbiano
Fraix-Burnet
G. Djorgovski
G. Fabbiano
G. Longo
Hartigan
Lloyd
Martin
Massaro
Mukherjee
Nolan
O. Laurino
R Development Core Team
R. D'Abrusco
Skrutskie
Strehl
Taylor
Vignali
Way
Wright
Publication venue: 'IOP Publishing'
Publication date: 01/01/2012

In this paper we present the Clustering-Labels-Score Patterns Spotter (CLaSPS), a new methodology for the determination of correlations among astronomical observables in complex datasets, based on the application of distinct unsupervised clustering techniques. The novelty in CLaSPS is the criterion used for the selection of the optimal clusterings, based on a quantitative measure of the degree of correlation between the cluster memberships and the distribution of a set of observables, the labels, not employed for the clustering. In this paper we discuss the applications of CLaSPS to two simple astronomical datasets, both composed of extragalactic sources with photometric observations at different wavelengths from large area surveys. The first dataset, CSC+, is composed of optical quasars spectroscopically selected in the SDSS data, observed in the X-rays by Chandra and with multi-wavelength observations in the near-infrared, optical and ultraviolet spectral intervals. One of the results of the application of CLaSPS to the CSC+ is the re-identification of a well-known correlation between the alphaOX parameter and the near-ultraviolet color, in a subset of CSC+ sources with relatively small values of the near-ultraviolet colors. The other dataset consists of a sample of blazars for which photometric observations in the optical, mid and near infrared are available, complemented, for a subset of the sources, by Fermi gamma-ray data. The main results of the application of CLaSPS to these datasets have been the discovery of a strong correlation between the multi-wavelength color distribution of blazars and their optical spectral classification into BL Lacs and Flat Spectrum Radio Quasars, and a peculiar pattern followed by blazars in the WISE mid-infrared color space. This pattern and its physical interpretation have been discussed in detail in other papers by one of the authors. Comment: 18 pages, 9 figures, accepted for publication in Ap

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref

Caltech Authors

Optimizing a sustainable ultrasound assisted extraction method for the recovery of polyphenols from lemon by-products: comparison with hot water and organic solvent extractions

Author: Bowyer Michael C.
Golding John B.
Papoutsis Konstantinos
Pristijono Penta
Scarlett Christopher J.
Stathopoulos Costas E.
Vuong Quan V.
Publication date: 19/02/2018

Response surface methodology (RSM) based on a three-factor and three-level Box–Behnken design was employed for optimizing the aqueous ultrasound-assisted extraction (AUAE) conditions, including extraction time (35–45 min), extraction temperature (45–55 °C) and ultrasonic power (150–250 W), for the recovery of total phenolic content (TPC) and rutin from lemon by-products. The independent variables and their values were selected on the basis of preliminary experiments, where the effects of five extraction parameters (particle size, extraction time and temperature, ultrasonic power and sample-to-solvent ratio) on TPC and rutin extraction yields were investigated. The yields of TPC and rutin were studied using a second-order polynomial equation. The optimum AUAE conditions for TPC were extraction time of 45 min, extraction temperature of 50 °C and ultrasonic power of 250 W with a predicted value of 18.10 ± 0.24 mg GAE/g dw, while the optimum AUAE conditions for rutin were extraction time of 35 min, extraction temperature of 48 °C and ultrasonic power of 150 W with a predicted value of 3.20 ± 0.12 mg/g dw. The extracts obtained at the optimum AUAE conditions were compared with those obtained by a hot water and an organic solvent conventional extraction in terms of TPC, total flavonoid content (TF) and antioxidant capacity. The extracts obtained by AUAE had the same TPC, TF and ferric reducing antioxidant power as those achieved by organic solvent conventional extraction. However, hot water extraction led to extracts with the highest flavonoid content and antioxidant capacity. Scanning electron microscopy analysis showed that all the extraction methods led to cell damage to varying extents.

Abertay Research Portal

Crossref

Crowdsourcing Semantic Label Propagation in Relation Classification

Author: Aroyo Lora
Dumitrache Anca
Welty Chris
Publication date: 03/09/2018

Distant supervision is a popular method for performing relation extraction from text that is known to produce noisy labels. Most progress in relation extraction and classification has been made with crowdsourced corrections to distant-supervised labels, and there is evidence that indicates still more would be better. In this paper, we explore the problem of propagating human annotation signals gathered for open-domain relation classification through the CrowdTruth methodology for crowdsourcing, which captures ambiguity in annotations by measuring inter-annotator disagreement. Our approach propagates annotations to sentences that are similar in a low-dimensional embedding space, expanding the number of labels by two orders of magnitude. Our experiments show significant improvement in a sentence-level multi-class relation classifier. Comment: In publication at the First Workshop on Fact Extraction and Verification (FeVer) at EMNLP 201

arXiv.org e-Print Archive