1,219 research outputs found
How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?
In numerous applicative contexts, data are too rich and too complex to be
represented by numerical vectors. A general approach to extend machine learning
and data mining techniques to such data is to rely on a dissimilarity or on a
kernel that measures how different or similar two objects are. This approach
has been used to define several variants of the Self Organizing Map (SOM). This
paper reviews those variants using a common set of notations in order to
outline differences and similarities between them. It discusses the advantages
and drawbacks of the variants, as well as the actual relevance of the
dissimilarity/kernel SOM for practical applications.
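To make the idea of a SOM that only sees pairwise dissimilarities concrete, here is a minimal sketch of one well-known variant, the batch median SOM, in which each prototype is constrained to be one of the observations. It is not taken from the paper and does not follow its notation. The one-dimensional grid, the Gaussian neighbourhood, the fixed `radius`, the evenly spaced initialisation and the function name `median_som` are all illustrative assumptions.

```python
import math


def median_som(D, n_units, n_iter=10, radius=1.0):
    """Batch median SOM on a 1-D grid (illustrative sketch).

    D is a symmetric n x n dissimilarity matrix given as a list of lists.
    Returns (protos, bmu): the observation index used as prototype for each
    unit, and the best matching unit of each observation.
    """
    n = len(D)
    # Assumption: initialise prototypes with evenly spaced observation indices.
    protos = [(u * n) // n_units for u in range(n_units)]

    def assign(current):
        # Assignment step: each observation goes to the unit whose prototype
        # is the least dissimilar to it.
        return [min(range(n_units), key=lambda u: D[i][current[u]])
                for i in range(n)]

    for _ in range(n_iter):
        bmu = assign(protos)
        new_protos = []
        for u in range(n_units):
            # Neighbourhood weight between unit u and each observation's unit.
            w = [math.exp(-((u - bmu[i]) ** 2) / (2.0 * radius ** 2))
                 for i in range(n)]
            # Representation step: pick the observation that minimises the
            # neighbourhood-weighted sum of dissimilarities.
            best = min(range(n),
                       key=lambda j: sum(w[i] * D[i][j] for i in range(n)))
            new_protos.append(best)
        if new_protos == protos:
            break
        protos = new_protos
    return protos, assign(protos)


# Two well-separated groups on the real line, seen only through |x - y|.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
D = [[abs(a - b) for b in points] for a in points]
protos, bmu = median_som(D, n_units=2)
```

Only the matrix `D` is ever read, so the same code would run unchanged on strings, graphs or sequences once a dissimilarity between them is available.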
Creating Growth by Connecting Place-Based Development Strategies
In recent years, the European Commission launched three thematic Smart Specialisation platforms to support interregional collaborations and to support European Union regions committed to co-investing in strategic growth areas. The bottom-up component in this process has resulted in a wide variety of industry-scientific partnerships at regional and transnational levels. These networks include regions that are very different in terms of innovation ecosystems but are nevertheless connected through a shared thematic focus, enabling transnational processes of innovation. This paper explains how interregional partnerships build on the efforts and results achieved in national and regional research and innovation strategies for Smart Specialisation and how, as a result, new European innovation ecosystems are emerging. With reference to existing literature and experiences so far, the paper outlines a conceptual framework of how transnational cooperation may strengthen regional place-based development strategies and improve regional innovation capabilities. Key analytical concepts are proximity, knowledge complexity, entrepreneurial discovery processes, stakeholder analysis and cluster emergence. © European Union, 2020. The reuse policy of the European Commission is implemented by the Commission Decision 2011/833/EU of 12 December 2011 on the reuse of Commission documents (OJ L 330, 14.12.2011, p. 39). Except otherwise noted, the reuse of this document is authorised under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence (https://creativecommons.org/licenses/by/4.0/). This means that reuse is allowed provided appropriate credit is given and any changes are indicated. For any use or reproduction of photos or other material that is not owned by the EU, permission must be sought directly from the copyright holders. JRC Technical report JRC12224
Open innovation and the formation of university-industry links in the food manufacturing and technology sector: evidence from the UK
Purpose
Despite typically being regarded as "low-tech," the Food Manufacturing and Technology Sector is increasingly turning to open innovation practices involving collaboration with universities in order to innovate. Given the broad range of activities undertaken by this sector and the fact that it utilises analytical, synthetic, and symbolic knowledge for innovation, it makes an interesting case study on the factors that influence the formation of University-Industry links.
Design/methodology/approach
Using data from 249 collaborative projects that occurred between UK universities and food manufacturing and technology firms, the analysis utilises a logistic regression model based on a "synthetic counterfactual approach" to model the probability that a collaborative link will be established with one university and not others.
Findings
The results suggest that organisational proximity, conceptualised through the presence of prior ties between actors, has the largest influence on the formation of U-I links. In addition, spatial and technological proximity between actors also have a positive influence on link formation. This result suggests that the specificity of knowledge to the food sector is important in the formation of these U-I links.
Research limitations/implications
The results suggest that the open innovation practices of food manufacturing and technology firms are like other sectors, even though their innovation practices are considered to be different. However, the limitations of the paper mean that these findings may be specific to firms in the food manufacturing and technology sector in the UK.
Originality/value
The food sector is under-represented in empirical studies on university collaboration; this paper addresses this and provides new insights into the formation of these links.
Domain-adaptive Message Passing Graph Neural Network
Cross-network node classification (CNNC), which aims to classify nodes in a
label-deficient target network by transferring the knowledge from a source
network with abundant labels, has drawn increasing attention recently. To address
CNNC, we propose a domain-adaptive message passing graph neural network
(DM-GNN), which integrates graph neural network (GNN) with conditional
adversarial domain adaptation. DM-GNN is capable of learning informative
representations for node classification that are also transferrable across
networks. Firstly, a GNN encoder is constructed by dual feature extractors to
separate ego-embedding learning from neighbor-embedding learning so as to
jointly capture commonality and discrimination between connected nodes.
Secondly, a label propagation node classifier is proposed to refine each node's
label prediction by combining its own prediction and its neighbors' prediction.
In addition, a label-aware propagation scheme is devised for the labeled source
network to promote intra-class propagation while avoiding inter-class
propagation, thus yielding label-discriminative source embeddings. Thirdly,
conditional adversarial domain adaptation is performed to take the
neighborhood-refined class-label information into account during adversarial
domain adaptation, so that the class-conditional distributions across networks
can be better matched. Comparisons with eleven state-of-the-art methods
demonstrate the effectiveness of the proposed DM-GNN.
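The second component, refining each node's label prediction by combining its own prediction with its neighbors' predictions, can be illustrated with a small sketch. This is not the DM-GNN implementation: the plain mean over neighbors, the mixing weight `alpha` and the function name `refine_predictions` are assumptions chosen only to show the general idea.

```python
def refine_predictions(probs, neighbors, alpha=0.5):
    """Blend each node's class probabilities with the mean of its neighbors'.

    probs:     list of per-node class-probability lists.
    neighbors: dict mapping a node index to a list of neighbor indices.
    alpha:     weight on the node's own prediction (hypothetical parameter).
    """
    refined = []
    for node, own in enumerate(probs):
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # Isolated node: nothing to propagate, keep its own prediction.
            refined.append(list(own))
            continue
        n_classes = len(own)
        mean_nbr = [sum(probs[j][c] for j in nbrs) / len(nbrs)
                    for c in range(n_classes)]
        refined.append([alpha * own[c] + (1.0 - alpha) * mean_nbr[c]
                        for c in range(n_classes)])
    return refined


# Three mutually connected nodes; node 1 weakly disagrees with its neighbors.
probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
refined = refine_predictions(probs, neighbors)
```

In this toy graph the uncertain node 1 is pulled toward the class its neighbors agree on. A label-aware scheme of the kind the abstract describes for the source network would additionally restrict `neighbors` to same-class nodes wherever labels are known.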
Onset of an outline map to get a hold on the wildwood of clustering methods
The domain of cluster analysis is a meeting point for a very rich
multidisciplinary encounter, with cluster-analytic methods being studied and
developed in discrete mathematics, numerical analysis, statistics, data
analysis and data science, and computer science (including machine learning,
data mining, and knowledge discovery), to name but a few. The other side of the
coin, however, is that the domain suffers from a major accessibility problem as
well as from the fact that it is rife with division across many pretty isolated
islands. As a way out, the present paper offers an outline map for the
clustering domain as a whole, which takes the form of an overarching conceptual
framework and a common language. With this framework we wish to contribute to
structuring the domain, to characterizing methods that have often been
developed and studied in quite different contexts, to identifying links between
them, and to introducing a frame of reference for optimally setting up cluster
analyses in data-analytic practice. Comment: 33 pages, 4 figures
Representation Learning for Attributed Multiplex Heterogeneous Network
Network embedding (or graph embedding) has been widely used in many
real-world applications. However, existing methods mainly focus on networks
with single-typed nodes/edges and cannot scale well to handle large networks.
Many real-world networks consist of billions of nodes and edges of multiple
types, and each node is associated with different attributes. In this paper, we
formalize the problem of embedding learning for the Attributed Multiplex
Heterogeneous Network and propose a unified framework to address this problem.
The framework supports both transductive and inductive learning. We also give
the theoretical analysis of the proposed framework, showing its connection with
previous works and proving its better expressiveness. We conduct systematic
evaluations for the proposed framework on four different genres of challenging
datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results
demonstrate that with the learned embeddings from the proposed framework, we
can achieve statistically significant improvements (e.g., 5.99-28.23% lift by
F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link
prediction. The framework has also been successfully deployed on the
recommendation system of a worldwide leading e-commerce company, Alibaba Group.
Results of the offline A/B tests on product recommendation further confirm the
effectiveness and efficiency of the framework in practice. Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn
Possibilistic classifiers for numerical data
Naive Bayesian Classifiers, which rely on independence hypotheses, together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in case of poor data. In such a situation, possibility distributions may provide a more faithful representation of these data. Naive Possibilistic Classifiers (NPC), based on possibility theory, have been recently proposed as a counterpart of Bayesian classifiers to deal with classification tasks. There are only a few works that treat possibilistic classification, and most existing NPCs deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first one is derived from classical or flexible Bayesian classifiers by applying a probability-possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second one is based on a direct interpretation of data in possibilistic formats that exploit an idea of proximity between data values in different ways, which provides a less constrained representation of them. We show that possibilistic classifiers have a better capability to detect new instances for which the classification is ambiguous than Bayesian classifiers, where probabilities may be poorly estimated and illusorily precise. Moreover, we propose, in this case, a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases.
The experiments reported show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well for data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform others in the other cases. The hybrid possibilistic classification exhibits a good ability for improving accuracy.
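As an illustration of the first kind of classifier, the sketch below applies the standard probability-possibility transformation for a symmetric unimodal density to a Gaussian: the possibility of x is the probability mass lying at least as far from the mean as x. Whether the paper uses exactly this form is an assumption, as are the product-based combination of attributes, the omission of class priors, and the names `gaussian_possibility` and `classify`.

```python
import math


def gaussian_possibility(x, mu, sigma):
    """Possibility degree of x under a Gaussian N(mu, sigma^2).

    Equals P(|X - mu| >= |x - mu|), i.e. erfc(|x - mu| / (sigma * sqrt(2))).
    It is 1 at the mean and decreases toward 0 in the tails.
    """
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2.0))


def classify(x, class_params):
    """Naive possibilistic classification (illustrative sketch).

    class_params maps a label to a list of (mu, sigma) pairs, one per
    attribute. Attributes are combined by product here; using the minimum
    instead would be an equally plausible choice.
    """
    scores = {}
    for label, params in class_params.items():
        score = 1.0
        for value, (mu, sigma) in zip(x, params):
            score *= gaussian_possibility(value, mu, sigma)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical per-class Gaussian parameters for two numerical attributes.
params = {"a": [(1.0, 1.0), (2.0, 1.0)], "b": [(5.0, 1.0), (6.0, 1.0)]}
label, scores = classify([1.0, 2.0], params)
```

Because the possibility degree is a tail probability rather than a density, it dominates the probability of every interval centred on the mean, which is one way to read the "further tolerance" the abstract mentions.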
PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny
A central problem in the bioinformatics of gene regulation is to find the binding sites for regulatory proteins. One of the most promising approaches toward identifying these short and fuzzy sequence patterns is the comparative analysis of orthologous intergenic regions of related species. This analysis is complicated by various factors. First, one needs to take the phylogenetic relationship between the species into account in order to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to evolutionary proximity. Second, one has to deal with the complexities of multiple alignments of orthologous intergenic regions, and one has to consider the possibility that functional sites may occur outside of conserved segments. Here we present a new motif sampling algorithm, PhyloGibbs, that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors (TFs) can be assigned to the multiple sequence alignments. These binding site configurations are scored by a Bayesian probabilistic model that treats aligned sequences by a model for the evolution of binding sites and "background" intergenic DNA. This model takes the phylogenetic relationship between the species in the alignment explicitly into account. The algorithm uses simulated annealing and Monte Carlo Markov-chain sampling to rigorously assign posterior probabilities to all the binding sites that it reports. In tests on synthetic data and real data from five Saccharomyces species our algorithm performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account. Our results also show that, in contrast to the other algorithms, PhyloGibbs can make realistic estimates of the reliability of its predictions.
Our tests suggest that, running on the five-species multiple alignment of a single gene's upstream region, PhyloGibbs on average recovers over 50% of all binding sites in S. cerevisiae at a specificity of about 50%, and 33% of all binding sites at a specificity of about 85%. We also tested PhyloGibbs on collections of multiple alignments of intergenic regions that were recently annotated, based on ChIP-on-chip data, to contain binding sites for the same TF. We compared PhyloGibbs's results with the previous analysis of these data using six other motif-finding algorithms. For 16 of 21 TFs for which all other motif-finding methods failed to find a significant motif, PhyloGibbs did recover a motif that matches the literature consensus. In 11 cases where there was disagreement in the results we compiled lists of known target genes from the literature, and found that running PhyloGibbs on their regulatory regions yielded a binding motif matching the literature consensus in all but one of the cases. Interestingly, these literature gene lists had little overlap with the targets annotated based on the ChIP-on-chip data. The PhyloGibbs code can be downloaded from http://www.biozentrum.unibas.ch/~nimwegen/cgi-bin/phylogibbs.cgi or http://www.imsc.res.in/~rsidd/phylogibbs. The full set of predicted sites from our tests on yeast is available at http://www.swissregulon.unibas.ch