Propagation Kernels
We introduce propagation kernels, a general graph-kernel framework for
efficiently measuring the similarity of structured data. Propagation kernels
are based on monitoring how information spreads through a set of given graphs.
They leverage early-stage distributions from propagation schemes such as random
walks to capture structural information encoded in node labels, attributes, and
edge information. This has two benefits. First, off-the-shelf propagation
schemes can be used to naturally construct kernels for many graph types,
including labeled, partially labeled, unlabeled, directed, and attributed
graphs. Second, by leveraging existing efficient and informative propagation
schemes, propagation kernels can be considerably faster than state-of-the-art
approaches without sacrificing predictive performance. We will also show that
if the graphs at hand have a regular structure, for instance when modeling
image or video data, one can exploit this regularity to scale the kernel
computation to large databases of graphs with thousands of nodes. We support
our contributions by exhaustive experiments on a number of real-world graphs
from a variety of application domains.
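As a concrete, much-simplified illustration of the idea, the sketch below propagates one-hot node-label distributions through a row-normalized adjacency matrix and compares two graphs by binning the intermediate distributions into histograms. The step count, binning scheme, and histogram-intersection similarity are illustrative assumptions, not the paper's exact construction.

```python
# Toy propagation-kernel-style graph comparison (illustrative simplification).
import numpy as np

def propagate(A, labels, n_classes, steps):
    """Spread one-hot label distributions along edges for `steps` iterations."""
    P = np.eye(n_classes)[labels]        # one-hot label distributions per node
    T = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-normalized walk
    dists = [P]
    for _ in range(steps):
        P = T @ P
        dists.append(P)
    return dists

def kernel(A1, l1, A2, l2, n_classes=2, steps=2, bins=4):
    """Sum, over propagation steps, of histogram-intersection similarity."""
    k = 0.0
    for P1, P2 in zip(propagate(A1, l1, n_classes, steps),
                      propagate(A2, l2, n_classes, steps)):
        # bin the node distributions and count bin occupancy per graph
        h1 = np.histogramdd(P1, bins=bins, range=[(0, 1)] * n_classes)[0]
        h2 = np.histogramdd(P2, bins=bins, range=[(0, 1)] * n_classes)[0]
        k += np.minimum(h1, h2).sum()    # histogram intersection
    return k

# Two small labeled graphs: a path and a triangle
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
A_tri  = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
print(kernel(A_path, [0, 1, 0], A_tri, [0, 1, 0]))
```

Because every propagation step contributes one histogram comparison, the kernel of a graph with itself equals (steps + 1) times its node count here.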
Improvements on coronal hole detection in SDO/AIA images using supervised classification
We demonstrate the use of machine learning algorithms in combination with
segmentation techniques in order to distinguish coronal holes and filaments in
SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques
(intensity-based thresholding, SPoCA), we prepared data sets of manually
labeled coronal hole and filament channel regions present on the Sun during the
time range 2011 - 2013. By mapping the extracted regions from EUV observations
onto HMI line-of-sight magnetograms we also include their magnetic
characteristics. We computed shape measures from the segmented binary maps as
well as first order and second order texture statistics from the segmented
regions in the EUV images and magnetograms. These attributes were used for data
mining investigations to identify the best-performing rule to differentiate
between coronal holes and filament channels. We applied several classifiers,
namely Support Vector Machine, Linear Support Vector Machine, Decision Tree,
and Random Forest and found that all classification rules achieve good results
in general, with linear SVM providing the best performance (with a true skill
statistic of ~0.90). Additional information from magnetic field data
systematically improves the performance across all four classifiers for the
SPoCA detection. Since the calculation is inexpensive in computing time, this
approach is well suited for applications on real-time data. This study
demonstrates how a machine learning approach may help improve upon an
unsupervised feature extraction method.
Comment: in press for SWS
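For illustration, here is a minimal NumPy sketch of the kind of linear-SVM classification and true-skill-statistic evaluation described above. The Pegasos-style solver and the synthetic 2D features (standing in for the shape and texture attributes) are assumptions for the sketch, not the study's actual pipeline.

```python
# Minimal linear SVM (Pegasos-style subgradient descent) on synthetic features.
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Subgradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= 1 - eta * lam               # regularization shrinkage
            if margin < 1:                   # margin violated: hinge update
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Synthetic two-class features (e.g. a texture statistic and a shape measure)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.3, (50, 2)),   # "coronal hole"-like cluster
               rng.normal(+1.0, 0.3, (50, 2))])  # "filament channel"-like cluster
y = np.r_[-np.ones(50), np.ones(50)]

w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
acc = np.mean(pred == y)
tpr = np.mean(pred[y == 1] == 1)     # true positive rate
tnr = np.mean(pred[y == -1] == -1)   # true negative rate
tss = tpr + tnr - 1                  # true skill statistic, as in the abstract
print(f"accuracy {acc:.2f}, TSS {tss:.2f}")
```

The true skill statistic rewards balanced detection of both classes, which is why the abstract reports it rather than raw accuracy.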
Knowledge representation and text mining in biomedical, healthcare, and political domains
Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains.
Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help to understand how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models.
Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events and tend to have larger vocabularies than biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in a large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media.
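The pair-input cross-validation issue mentioned above can be made concrete with a small sketch: when examples are pairs (e.g. protein–protein pairs), a test split is only independent of training if neither member of a test pair occurs in any training pair. The names and data below are purely illustrative.

```python
# Dependent vs. "clean" training sets for pair-input cross-validation.
from itertools import combinations

proteins = ["A", "B", "C", "D", "E", "F"]
pairs = list(combinations(proteins, 2))   # 15 candidate pairs

def dependent_split(pairs, test_pairs):
    """Naive split: remaining pairs may still share proteins with test pairs."""
    return [p for p in pairs if p not in test_pairs]

def clean_split(pairs, test_pairs):
    """Pair-input split: also drop pairs sharing any protein with a test pair."""
    held_out = {x for p in test_pairs for x in p}
    return [p for p in pairs if p not in test_pairs and not (set(p) & held_out)]

test = [("A", "B")]
print(len(dependent_split(pairs, test)))  # 14 pairs, many sharing A or B
print(len(clean_split(pairs, test)))      # 6 pairs, only among C, D, E, F
```

The gap between the two training-set sizes shows how many "dependent" instances a naive cross-validation scheme would leak, which is what inflates naive generalisation estimates for pair-input data.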
A unified pseudo-$C_\ell$ framework
The pseudo-$C_\ell$ estimator is an algorithm for estimating the angular power and
cross-power spectra that is very fast and, in realistic cases, also nearly
optimal. The algorithm can be extended to deal with contaminant deprojection
and purification, and can therefore be applied in a wide variety of
scenarios of interest for current and future cosmological observations. This
paper presents NaMaster, a public, validated, accurate and easy-to-use software
package that, for the first time, provides a unified framework to compute
angular cross-power spectra of any pair of spin-0 or spin-2 fields,
contaminated by an arbitrary number of linear systematics and requiring $E$- or
$B$-mode purification, both on the sphere and in the flat-sky approximation. We
describe the mathematical background of the estimator, including all the
features above, and its software implementation in NaMaster. We construct a
validation suite that aims to resemble the types of observations that
next-generation large-scale structure and ground-based CMB experiments will
face, and use it to show that the code is able to recover the input power
spectra in the most complex scenarios with no detectable bias. NaMaster can be
found at https://github.com/LSSTDESC/NaMaster, and is provided with
comprehensive documentation and a number of code examples.
Comment: 27 pages, 17 figures, accepted in MNRAS. Code can be found at
https://github.com/LSSTDESC/NaMaster
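The core idea can be illustrated with a deliberately simple one-dimensional toy model (this is not NaMaster's algorithm, which handles the full mode-coupling matrix): masking a field biases its raw power spectrum, and for a nearly white spectrum the bias is approximately the mean squared mask, which the estimator divides out.

```python
# 1D toy of the pseudo-C_ell idea: mask bias and its simplest correction.
import numpy as np

rng = np.random.default_rng(0)
n = 4096
field = rng.normal(size=n)        # unit-variance white "sky"
mask = np.zeros(n)
mask[: n // 2] = 1.0              # observe half of the "sky"

def power(x):
    """Mean power per Fourier mode of a real 1D field."""
    return (np.abs(np.fft.rfft(x)) ** 2 / x.size).mean()

cl_true = power(field)                      # unmasked spectrum level
cl_masked = power(mask * field)             # biased low by <w^2> = 0.5
cl_pseudo = cl_masked / np.mean(mask ** 2)  # corrected estimate

print(cl_true, cl_masked, cl_pseudo)
```

For realistic, non-white spectra the bias mixes power between scales, which is why the full estimator inverts a mode-coupling matrix rather than a single scalar.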
The impact of beam deconvolution on noise properties in CMB measurements: Application to Planck LFI
We present an analysis of the effects of beam deconvolution on noise
properties in CMB measurements. The analysis is built around the artDeco beam
deconvolver code. We derive a low-resolution noise covariance matrix that
describes the residual noise in deconvolution products, both in harmonic and
pixel space. The matrix models the residual correlated noise that remains in
time-ordered data after destriping, and the effect of deconvolution on it. To
validate the results, we generate noise simulations that mimic the data from
the Planck LFI instrument. A test for the full 70 GHz covariance over the
multipole range considered yields a mean reduced $\chi^2$ of 1.0037. We
compare two destriping options, full and independent destriping, when
deconvolving subsets of the available data. Full destriping leaves substantially
less residual noise but renders the data sets intercorrelated. We also derive a
white noise covariance matrix that provides an approximation of the full noise
at high multipoles, and study the properties of the high-resolution noise in pixel
space through simulations.
Comment: 22 pages, 25 figures
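A one-dimensional toy model (not the artDeco algorithm) shows why deconvolution changes the noise properties: dividing each Fourier mode by the beam transfer function amplifies noise at scales where the beam is small, which is exactly the effect a deconvolution noise covariance must capture. The beam width and regularization floor below are illustrative assumptions.

```python
# 1D toy: beam deconvolution amplifies high-frequency noise.
import numpy as np

rng = np.random.default_rng(0)
n = 2048
k = np.fft.rfftfreq(n)                       # frequencies of a 1D "scan"
beam = np.exp(-0.5 * (k / 0.05) ** 2)        # Gaussian beam transfer function

noise = rng.normal(size=n)                   # white instrumental noise
F = np.fft.rfft(noise) / np.maximum(beam, 1e-3)  # deconvolve, with a floor
noise_deconv = np.fft.irfft(F, n)

# Deconvolved noise variance is far larger: small-beam modes blow up.
print(noise.var(), noise_deconv.var())
```

In real pipelines this is why deconvolution products are analysed with a dedicated noise covariance rather than the white-noise one of the raw data.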
Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data
In domains like bioinformatics, information retrieval and social network
analysis, one can find learning tasks where the goal consists of inferring a
ranking of objects, conditioned on a particular target object. We present a
general kernel framework for learning conditional rankings from various types
of relational data, where rankings can be conditioned on unseen data objects.
We propose efficient algorithms for conditional ranking by optimizing squared
regression and ranking loss functions. We show theoretically that learning
with the ranking loss is likely to generalize better than with the regression
loss. Further, we prove that symmetry or reciprocity properties of relations
can be efficiently enforced in the learned models. Experiments on synthetic and
real-world data illustrate that the proposed methods deliver state-of-the-art
performance in terms of predictive power and computational efficiency.
Moreover, we also show empirically that incorporating symmetry or reciprocity
properties can improve the generalization performance.
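As an illustrative sketch (a simplification, not the paper's kernel framework or its RankRLS algorithms): conditional ranking with a squared loss can be mimicked by ridge regression on Kronecker (outer-product) features of query–object pairs, after which objects are ranked conditioned on a query by their predicted scores. The bilinear ground truth and all parameters below are assumptions for the sketch.

```python
# Ridge regression on Kronecker pair features for conditional ranking.
import numpy as np

rng = np.random.default_rng(0)
dq, do, n_train = 3, 3, 400

# Ground-truth bilinear relation: y = q^T W o + noise
W_true = rng.normal(size=(dq, do))
Q = rng.normal(size=(n_train, dq))
O = rng.normal(size=(n_train, do))
y = np.einsum("id,de,ie->i", Q, W_true, O) + 0.01 * rng.normal(size=n_train)

# Kronecker (outer-product) feature map per pair, then ridge regression
X = np.einsum("id,ie->ide", Q, O).reshape(n_train, dq * do)
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(dq * do), X.T @ y)

def rank_objects(q, objects):
    """Indices of `objects` sorted by predicted relevance to query `q`."""
    feats = np.einsum("d,ne->nde", q, objects).reshape(len(objects), -1)
    return np.argsort(-(feats @ w))

q = rng.normal(size=dq)
candidates = rng.normal(size=(5, do))
print(rank_objects(q, candidates))               # predicted conditional ranking
print(np.argsort(-(candidates @ W_true.T @ q)))  # ranking under the true relation
```

Only the ordering of scores per query matters here, which is the sense in which a squared regression loss can stand in for a ranking loss on well-fit data.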
THE TOOLS AND MONTE CARLO WORKING GROUP Summary Report from the Les Houches 2009 Workshop on TeV Colliders
This is the summary and introduction to the proceedings contributions for the
Les Houches 2009 "Tools and Monte Carlo" working group.
Comment: 144 pages. Workshop site
http://wwwlapp.in2p3.fr/conferences/LesHouches/Houches2009/ . Conveners were
Butterworth, Maltoni, Moortgat, Richardson, Schumann and Skand