Generalized Shortest Path Kernel on Graphs
We consider the problem of classifying graphs using graph kernels. We define
a new graph kernel, called the generalized shortest path kernel, based on the
number and length of shortest paths between nodes. For our example
classification problem, we consider the task of classifying random graphs from
two well-known families, by the number of clusters they contain. We verify
empirically that the generalized shortest path kernel outperforms the original
shortest path kernel on a number of datasets. We give a theoretical analysis
for explaining our experimental results. In particular, we estimate
distributions of the expected feature vectors for the shortest path kernel and
the generalized shortest path kernel, and we show some evidence explaining why
our graph kernel outperforms the shortest path kernel for our graph
classification problem.
Comment: Short version presented at Discovery Science 2015 in Banff
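As a rough illustration of the idea in the abstract above, the following minimal sketch compares two small unweighted graphs by counting node pairs with a given shortest-path length (the shortest path kernel) or with a given (length, number of shortest paths) combination (the generalized variant). The feature definition, the function names and the adjacency-dict graph format are assumptions made for this sketch; the abstract does not give the paper's exact formulation.

```python
from collections import Counter, deque


def shortest_path_stats(adj, source):
    """BFS from `source`; map each reachable node to (distance, number of shortest paths)."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # First time v is reached: it inherits the path count of u.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v through u.
                count[v] += count[u]
    return {v: (dist[v], count[v]) for v in dist if v != source}


def feature_vector(adj, generalized=True):
    """Count unordered node pairs per feature (assumed feature definition)."""
    features = Counter()
    for s in adj:
        for t, (d, c) in shortest_path_stats(adj, s).items():
            if s < t:  # count each unordered pair once
                features[(d, c) if generalized else (d,)] += 1
    return features


def kernel(adj_a, adj_b, generalized=True):
    """Inner product of the two feature-count vectors."""
    fa = feature_vector(adj_a, generalized)
    fb = feature_vector(adj_b, generalized)
    return sum(fa[key] * fb[key] for key in fa.keys() & fb.keys())


# Hypothetical toy graphs: a 4-cycle and a 4-node path.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(kernel(cycle, path, generalized=False))  # shortest path kernel
print(kernel(cycle, path, generalized=True))   # generalized variant
```

In the 4-cycle, opposite nodes are joined by two shortest paths of length 2, whereas in the path graph every pair has exactly one shortest path. The generalized features can therefore separate these pairs, while plain length counts cannot.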
A novel string representation and kernel function for the comparison of I/O access patterns
Parallel I/O access patterns act as fingerprints of a parallel program. In order to extract meaningful information from these patterns, they have to be represented appropriately. Because string objects can be easily compared using kernel methods, this paper proposes a conversion to a weighted string representation, together with a novel string kernel function called the Kast Spectrum Kernel. The similarity matrices, obtained by applying this kernel to a set of examples from a real application, were analyzed using Kernel Principal Component Analysis (Kernel PCA) and hierarchical clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 formed a single cluster due to the intrinsic similarity of their members. The proposed strategy is promising for other similarity problems involving tree-like structured data.
Space-efficient Feature Maps for String Alignment Kernels
String kernels are attractive data analysis tools for analyzing string data.
Among them, alignment kernels are known for their high prediction accuracies in
string classifications when tested in combination with SVM in various
applications. However, alignment kernels have a crucial drawback in that they
scale poorly due to their quadratic computation complexity in the number of
input strings, which limits large-scale applications in practice. We address
this need by presenting the first approximation for string alignment kernels,
which we call space-efficient feature maps for edit distance with moves
(SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP)
and feature maps (FMs) of random Fourier features (RFFs) for large-scale string
analyses. The original FMs for RFFs consume a huge amount of memory
proportional to the dimension d of input vectors and the dimension D of output
vectors, which prohibits their large-scale applications. We present novel
space-efficient feature maps (SFMs) of RFFs for a space reduction from O(dD) of
the original FMs to O(d) of SFMs with a theoretical guarantee with respect to
concentration bounds. We experimentally test SFMEDM on its ability to learn SVM
for large-scale string classifications with various massive string data, and we
demonstrate the superior performance of SFMEDM with respect to prediction
accuracy, scalability and computation efficiency.
Comment: Full version for ICDM'19 paper
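To make the O(dD) memory cost mentioned in the abstract above concrete, here is a minimal sketch of standard random Fourier features for a Gaussian kernel. It is not the SFMEDM method or the paper's space-efficient feature maps; the choice of a Gaussian kernel, the function names and all parameter values are assumptions for illustration only.

```python
import math
import random


def make_rff_map(d, D, sigma=1.0, seed=0):
    """Build a standard random Fourier feature map from R^d to R^D."""
    rng = random.Random(seed)
    # D random directions of dimension d: this matrix is the O(dD) storage cost.
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    # One random phase per output dimension.
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def feature_map(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)
        ]

    return feature_map


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical inputs chosen for illustration.
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
phi = make_rff_map(d=3, D=5000)
approx = sum(p * q for p, q in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y)
print(approx, exact)
```

The inner product of the two feature vectors approximates the exact kernel value, and the approximation tightens as D grows. The stored matrix `W` grows with both d and D, which is the memory cost the abstract says the space-efficient variant reduces.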
Extending local features with contextual information in graph kernels
Graph kernels are usually defined in terms of simpler kernels over local
substructures of the original graphs. Different kernels consider different
types of substructures. However, in some cases they have similar predictive
performances, probably because the substructures can be interpreted as
approximations of the subgraphs they induce. In this paper, we propose to
associate to each feature a piece of information about the context in which the
feature appears in the graph. A substructure appearing in two different graphs
will match only if it appears with the same context in both graphs. We propose
a kernel based on this idea that considers trees as substructures, and where
the contexts are features too. The kernel is inspired by the framework in
[6], although it is not part of it. We give an efficient algorithm for computing
the kernel and show promising results on real-world graph classification
datasets.
Comment: To appear in ICONIP 201
GRP78 expression in canine mammary tumors: association with malignancy
78-kDa glucose-regulated protein (GRP78) is over-expressed in human breast carcinomas. GRP78 expression was studied in 40 spontaneous canine mammary tumors and evaluated in relation to tumor histological type, mode of growth, grade, lymph node metastases and distant metastases. All tumors exhibited GRP78 immunostaining. In the normal canine mammary gland, GRP78 was also expressed although not in all cases. In carcinomas GRP78 was detected in the cytoplasm in more than 50% of tumor cells in the vast majority of cases (87.5%). There was a significant association between the absence of squamous differentiation (P = 0.02) and GRP78 over-expression, but no association with other clinico-pathological features. GRP78 was often co-expressed with galectin-3 in canine mammary tumors (CMT).
Micrometer-sized Water Ice Particles for Planetary Science Experiments: Influence of Surface Structure on Collisional Properties
Models and observations suggest that ice-particle aggregation at and beyond the snowline dominates the earliest stages of planet formation, which therefore is subject to many laboratory studies. However, the pressure–temperature gradients in protoplanetary disks mean that the ices are constantly processed, undergoing phase changes between different solid phases and the gas phase. Open questions remain as to whether the properties of the icy particles themselves dictate collision outcomes and therefore how effectively collision experiments reproduce conditions in protoplanetary environments. Previous experiments often yielded apparently contradictory results on collision outcomes, only agreeing in a temperature dependence setting in above ≈210 K. By exploiting the unique capabilities of the NIMROD neutron scattering instrument, we characterized the bulk and surface structure of icy particles used in collision experiments, and studied how these structures alter as a function of temperature at a constant pressure of around 30 mbar. Our icy grains, formed under liquid nitrogen, undergo changes in the crystalline ice-phase, sublimation, sintering and surface pre-melting as they are heated from 103 to 247 K. An increase in the thickness of the diffuse surface layer from ≈10 to ≈30 Å (≈2.5 to 12 bilayers) proves increased molecular mobility at temperatures above ≈210 K. Because none of the other changes tie in with the temperature trends in collisional outcomes, we conclude that the surface pre-melting phenomenon plays a key role in collision experiments at these temperatures. Consequently, the pressure–temperature environment may have a larger influence on collision outcomes than previously thought.
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
Surface morphology of AlGaN/GaN heterostructures grown on bulk GaN by MBE
In this report the influence of the growth conditions on the surface morphology of AlGaN/GaN heterostructures grown on sapphire-based and bulk GaN substrates is nondestructively investigated with focus on the decoration of defects and the surface roughness. Under Ga-rich conditions specific types of dislocations are unintentionally decorated with shallow hillocks. In contrast, under Ga-lean conditions deep pits are inherently formed at these defect sites. The structural data show that the dislocation density of the substrate sets the limit for the density of dislocation-mediated surface structures after MBE overgrowth and no noticeable amount of surface defects is introduced during the MBE procedure. Moreover, the transfer of crystallographic information, e.g. the miscut of the substrate to the overgrown structure, is confirmed. The combination of our MBE overgrowth with the employed surface morphology analysis by atomic force microscopy (AFM) provides a unique possibility for a nondestructive, retrospective analysis of the original substrate defect density prior to device processing.
Tensile strained membranes for cavity optomechanics
We investigate the optomechanical properties of tensile-strained ternary
InGaP nanomembranes grown on GaAs. This material system combines the benefits
of highly strained membranes based on stoichiometric silicon nitride, with the
unique properties of thin-film semiconductor single crystals, as previously
demonstrated with suspended GaAs. Here we employ lattice mismatch in epitaxial
growth to impart an intrinsic tensile strain to a monocrystalline thin film
(approximately 30 nm thick). These structures exhibit mechanical quality
factors of 2*10^6 or beyond at room temperature and 17 K for eigenfrequencies
up to 1 MHz, yielding Q*f products of 2*10^12 Hz for a tensile stress of ~170
MPa. Incorporating such membranes in a high finesse Fabry-Perot cavity, we
extract an upper limit to the total optical loss (including both absorption and
scatter) of 40 ppm at 1064 nm and room temperature. Further reductions of the
In content of this alloy will enable tensile stress levels of 1 GPa, with the
potential for a significant increase in the Q*f product, assuming no
deterioration in the mechanical loss at this composition and strain level. This
materials system is a promising candidate for the integration of strained
semiconductor membrane structures with low-loss semiconductor mirrors and for
realizing stacks of membranes for enhanced optomechanical coupling.
Comment: 10 pages, 3 figures
The functional readthrough extension of malate dehydrogenase reveals a modification of the genetic code
Translational readthrough gives rise to C-terminally extended proteins, thereby providing the cell with new protein isoforms. These may have different properties from the parental proteins if the extensions contain functional domains. While for most genes amino acid incorporation at the stop codon is far lower than 0.1%, about 4% of malate dehydrogenase (MDH1) is physiologically extended by translational readthrough and the actual ratio of MDH1x (extended protein) to 'normal' MDH1 is dependent on the cell type. In human cells, arginine and tryptophan are co-encoded by the MDH1x UGA stop codon. Readthrough is controlled by the 7-nucleotide high-readthrough stop codon context without contribution of the subsequent 50 nucleotides encoding the extension. All vertebrate MDH1x is directed to peroxisomes via a hidden peroxisomal targeting signal (PTS) in the readthrough extension, which is more highly conserved than the extension of lactate dehydrogenase B. The hidden PTS of non-mammalian MDH1x evolved to be more efficient than the PTS of mammalian MDH1x. These results provide insight into the genetic and functional co-evolution of these dually localized dehydrogenases.