    Propagation Kernels

    We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains.
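
    As a concrete illustration of the scheme summarised above, the following Python sketch implements a bare-bones propagation kernel for graphs given as adjacency matrices with initial node-label distributions. It is a minimal sketch of the idea rather than the authors' implementation; the bin width w and the simple flooring hash stand in for the locality-sensitive hashing used in the paper, and all names are illustrative.

    import numpy as np

    def propagation_kernel(graphs, t_max=5, w=1e-3):
        """graphs: list of (A, P0), with A an (n, n) adjacency matrix and
        P0 an (n, k) row-stochastic matrix of initial label distributions."""
        K = np.zeros((len(graphs), len(graphs)))
        # Row-normalised adjacency matrices implement one random-walk step.
        T = [A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12) for A, _ in graphs]
        P = [P0.copy() for _, P0 in graphs]
        for _ in range(t_max):
            # Bin the node-wise label distributions (a crude stand-in for the
            # locality-sensitive hashing used by propagation kernels).
            feats = []
            for p in P:
                bins = np.floor(p / w).astype(int)
                keys, counts = np.unique(bins, axis=0, return_counts=True)
                feats.append({tuple(key): c for key, c in zip(keys, counts)})
            # Accumulate the kernel as dot products of bin-count histograms.
            for i, fi in enumerate(feats):
                for j, fj in enumerate(feats):
                    K[i, j] += sum(c * fj.get(key, 0) for key, c in fi.items())
            # Propagate the label distributions one step.
            P = [Ti @ Pi for Ti, Pi in zip(T, P)]
        return K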

    Improvements on coronal hole detection in SDO/AIA images using supervised classification

    We demonstrate the use of machine learning algorithms in combination with segmentation techniques in order to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding, SPoCA), we prepared data sets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011-2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first-order and second-order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used for data mining investigations to identify the best-performing rule to differentiate between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine, Linear Support Vector Machine, Decision Tree, and Random Forest, and found that all classification rules achieve good results in general, with linear SVM providing the best performance (with a true skill statistic of ~0.90). Additional information from magnetic field data systematically improves the performance across all four classifiers for the SPoCA detection. Since the calculation is inexpensive in computing time, this approach is well suited for applications on real-time data. This study demonstrates how a machine learning approach may help improve upon an unsupervised feature extraction method. Comment: in press for SWS
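
    The classification step described above can be reproduced in outline with standard tools. The sketch below trains a linear SVM on region features and scores it with the true skill statistic; the feature matrix and labels are random placeholders rather than the paper's data, and scikit-learn is an assumed substitute for the authors' exact setup.

    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC
    from sklearn.metrics import confusion_matrix

    def true_skill_statistic(y_true, y_pred):
        # TSS = sensitivity + specificity - 1, the skill score quoted above.
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return tp / (tp + fn) + tn / (tn + fp) - 1.0

    # X: shape/texture/magnetic features per region; y: 1 = coronal hole, 0 = filament channel.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))                  # placeholder feature matrix
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder labels

    clf = make_pipeline(StandardScaler(), LinearSVC())
    y_pred = cross_val_predict(clf, X, y, cv=5)
    print("TSS:", true_skill_statistic(y, y_pred))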

    Knowledge representation and text mining in biomedical, healthcare, and political domains

    Knowledge representation and text mining can be employed to discover new knowledge and develop services by using the massive amounts of text gathered by modern information systems. The applied methods should take into account the domain-specific nature of knowledge. This thesis explores knowledge representation and text mining in three application domains. Biomolecular events can be described very precisely and concisely with appropriate representation schemes. Protein–protein interactions are commonly modelled in biological databases as binary relationships, whereas the complex relationships used in text mining are rich in information. The experimental results of this thesis show that complex relationships can be reduced to binary relationships and that it is possible to reconstruct complex relationships from mixtures of linguistically similar relationships. This encourages the extraction of complex relationships from the scientific literature even if binary relationships are required by the application at hand. The experimental results on cross-validation schemes for pair-input data help to understand how existing knowledge regarding dependent instances (such as those concerning protein–protein pairs) can be leveraged to improve the generalisation performance estimates of learned models. Healthcare documents and news articles contain knowledge that is more difficult to model than biomolecular events, and their vocabularies tend to be larger than those of biomedical scientific articles. This thesis describes an ontology that models patient education documents and their content in order to improve the availability and quality of such documents. The experimental results of this thesis also show that the Recall-Oriented Understudy for Gisting Evaluation measures are a viable option for the automatic evaluation of textual patient record summarisation methods and that the area under the receiver operating characteristic curve can be used in large-scale sentiment analysis. The sentiment analysis of Reuters news corpora suggests that the Western mainstream media portrays China negatively in politics-related articles but not in general, which provides new evidence to consider in the debate over the image of China in the Western media.
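
    One of the experimental points above, cross-validation for pair-input data, can be illustrated with a short sketch: when instances are pairs, test pairs that share a protein with the training set are dependent on it, so a conservative performance estimate holds out whole proteins rather than individual pairs. The protein names and split sizes below are purely illustrative.

    import itertools
    import random

    proteins = [f"P{i}" for i in range(10)]
    pairs = list(itertools.combinations(proteins, 2))    # all protein-protein pairs

    random.seed(0)
    held_out = set(random.sample(proteins, 3))           # proteins unseen during training

    train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
    test_mixed = [p for p in pairs if (p[0] in held_out) != (p[1] in held_out)]
    test_disjoint = [p for p in pairs if p[0] in held_out and p[1] in held_out]

    # Estimates on test_disjoint are free of protein-level leakage; test_mixed pairs
    # share one protein with the training data and typically give optimistic estimates.
    print(len(train), len(test_mixed), len(test_disjoint))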

    A unified pseudo-$C_\ell$ framework

    The pseudo-$C_\ell$ is an algorithm for estimating angular power and cross-power spectra that is very fast and, in realistic cases, also nearly optimal. The algorithm can be extended to deal with contaminant deprojection and $E$/$B$ purification, and can therefore be applied in a wide variety of scenarios of interest for current and future cosmological observations. This paper presents NaMaster, a public, validated, accurate and easy-to-use software package that, for the first time, provides a unified framework to compute angular cross-power spectra of any pair of spin-0 or spin-2 fields, contaminated by an arbitrary number of linear systematics and requiring $B$- or $E$-mode purification, both on the sphere and in the flat-sky approximation. We describe the mathematical background of the estimator, including all the features above, and its software implementation in NaMaster. We construct a validation suite that aims to resemble the types of observations that next-generation large-scale structure and ground-based CMB experiments will face, and use it to show that the code is able to recover the input power spectra in the most complex scenarios with no detectable bias. NaMaster can be found at https://github.com/LSSTDESC/NaMaster, and is provided with comprehensive documentation and a number of code examples. Comment: 27 pages, 17 figures, accepted in MNRAS. Code can be found at https://github.com/LSSTDESC/NaMaster
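
    A basic usage sketch of the public code, following the documented pymaster (NaMaster) Python interface, is shown below. The mask, map, resolution and binning are placeholders, and call signatures may differ slightly between releases.

    import healpy as hp
    import numpy as np
    import pymaster as nmt

    nside = 256
    npix = hp.nside2npix(nside)

    mask = np.ones(npix)                        # placeholder sky mask
    map_t = np.random.standard_normal(npix)     # placeholder spin-0 field

    # A field bundles the map with its mask; contaminant templates for deprojection
    # and purification options for spin-2 fields are further, optional arguments.
    field = nmt.NmtField(mask, [map_t])

    # Bandpowers: linear bins of width 20 in multipole.
    bins = nmt.NmtBin.from_nside_linear(nside, 20)

    # Compute the mode-coupled pseudo-C_ell and decouple it in one call.
    cl = nmt.compute_full_master(field, field, bins)
    print(cl.shape)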

    The impact of beam deconvolution on noise properties in CMB measurements: Application to Planck LFI

    We present an analysis of the effects of beam deconvolution on noise properties in CMB measurements. The analysis is built around the artDeco beam deconvolver code. We derive a low-resolution noise covariance matrix that describes the residual noise in deconvolution products, both in harmonic and pixel space. The matrix models the residual correlated noise that remains in time-ordered data after destriping, and the effect of deconvolution on it. To validate the results, we generate noise simulations that mimic the data from the Planck LFI instrument. A $\chi^2$ test for the full 70 GHz covariance in the multipole range $\ell = 0$-$50$ yields a mean reduced $\chi^2$ of 1.0037. We compare two destriping options, full and independent destriping, when deconvolving subsets of available data. Full destriping leaves substantially less residual noise, but it leaves the data sets intercorrelated. We also derive a white noise covariance matrix that provides an approximation of the full noise at high multipoles, and study the properties of high-resolution noise in pixel space through simulations. Comment: 22 pages, 25 figures
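
    The covariance validation quoted above amounts to the following kind of check, sketched here with synthetic numbers rather than the Planck LFI simulations: draw noise realisations consistent with a model covariance and verify that the mean reduced $\chi^2$ is close to one. All dimensions are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    dim = 200                                    # stand-in for the number of modes covered by the matrix

    # Build a well-conditioned covariance and treat it as the noise model.
    A = rng.normal(size=(dim, dim))
    cov = A @ A.T / dim + np.eye(dim)
    cov_inv = np.linalg.inv(cov)
    chol = np.linalg.cholesky(cov)

    chi2 = []
    for _ in range(1000):
        n = chol @ rng.standard_normal(dim)      # noise realisation drawn from the model covariance
        chi2.append(n @ cov_inv @ n / dim)       # reduced chi^2 for this realisation

    print("mean reduced chi^2:", np.mean(chi2))  # expected to be close to 1 if cov describes the noise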

    Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data

    In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we show empirically that incorporating symmetry or reciprocity properties can improve generalization performance.
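
    The conditional-ranking setting can be sketched briefly: learn a function over object pairs with a Kronecker product kernel and rank candidate objects conditioned on a query object. The example below uses the plain regression loss and a direct linear solve on random data; it illustrates the problem setup, not the paper's efficient algorithms or its ranking-loss variant.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    X = rng.normal(size=(n, 5))                     # object features

    def rbf(A, B, gamma=0.5):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K = rbf(X, X)                                   # object-level kernel
    K_pair = np.kron(K, K)                          # Kronecker kernel over ordered pairs (u, v)
    y = rng.normal(size=n * n)                      # observed relation values, row-major in (u, v)

    lam = 1.0                                       # ridge regularisation parameter
    alpha = np.linalg.solve(K_pair + lam * np.eye(n * n), y)

    def conditional_scores(u):
        """Scores of all candidates v conditioned on target u (higher = ranked first)."""
        K_test = np.kron(K[u], K)                   # kernel between pairs (u, v) and all training pairs
        return K_test @ alpha

    ranking = np.argsort(-conditional_scores(0))    # ranking of candidates conditioned on object 0
    print(ranking[:5])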