102 research outputs found
Temporal Cross-Media Retrieval with Soft-Smoothing
Multimedia information has strong temporal correlations that shape the way
modalities co-occur over time. In this paper we study the dynamic nature of
multimedia and social-media information, where the temporal dimension emerges
as a strong source of evidence for learning the temporal correlations across
visual and textual modalities. So far, cross-media retrieval models have explored
the correlations between different modalities (e.g. text and image) to learn a
common subspace, in which semantically similar instances lie in the same
neighbourhood. Building on such knowledge, we propose a novel temporal
cross-media neural architecture that departs from standard cross-media
methods by explicitly accounting for the temporal dimension through temporal
subspace learning. The model is softly constrained with temporal and
inter-modality constraints that guide the new subspace learning task by
favouring temporal correlations between semantically similar and temporally
close instances. Experiments on three distinct datasets show that accounting
for time turns out to be important for cross-media retrieval. Namely, the
proposed method outperforms a set of baselines on the task of temporal
cross-media retrieval, demonstrating its effectiveness for performing temporal
subspace learning. Comment: To appear in ACM MM 201
A network-based dynamical ranking system for competitive sports
From the viewpoint of networks, a ranking system for players or teams in
sports is equivalent to a centrality measure for sports networks, whereby a
directed link represents the result of a single game. Previously proposed
network-based ranking systems are derived from static networks, i.e.,
aggregation of the results of games over time. However, the score of a player
(or team) fluctuates over time. Defeating a renowned player at peak
performance is intuitively more rewarding than defeating the same player in
other periods. To account for this factor, we propose a dynamic variant of such
a network-based ranking system and apply it to professional men's tennis data.
We derive a set of linear online update equations for the score of each player.
The proposed ranking system predicts the outcome of future games with
higher accuracy than its static counterparts. Comment: 6 figure
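The idea of an online, time-aware score update can be illustrated with a minimal sketch. This is not the paper's set of update equations, which the abstract does not give. It assumes a simple rule in which each score decays exponentially between games and the winner gains one point plus a fraction of the loser's current score, so that beating a player who is currently strong is worth more. The function name `update_scores` and the parameters `alpha` (weight of the loser's score) and `beta` (decay rate) are hypothetical.

```python
import math
from collections import defaultdict


def update_scores(games, alpha=0.1, beta=0.01):
    """Process (time, winner, loser) records in time order.

    Assumed rule (illustrative only): scores decay as exp(-beta * elapsed)
    since the player's last game, then the winner gains
    1 + alpha * (loser's current score).
    """
    score = defaultdict(float)
    last_seen = {}
    for t, winner, loser in sorted(games):
        # Decay both players' scores up to the time of this game.
        for player in (winner, loser):
            elapsed = t - last_seen.get(player, t)
            score[player] *= math.exp(-beta * elapsed)
            last_seen[player] = t
        # Linear update: the reward grows with the loser's current score.
        score[winner] += 1.0 + alpha * score[loser]
    return dict(score)


# Hypothetical toy data: (time, winner, loser)
toy_games = [(0, "A", "B"), (10, "B", "C"), (20, "A", "B")]
no_decay = update_scores(toy_games, alpha=0.5, beta=0.0)
with_decay = update_scores(toy_games, alpha=0.5, beta=0.01)
```

With `beta=0.0` the rule reduces to a static, aggregated count with a bonus for beating players who have already won games. A positive `beta` makes older results count for less.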
Cross-modal subspace learning with scheduled adaptive margin constraints
This work has been partially funded by the CMU Portugal research project GoLocal Ref. CMUP-ERI/TIC/0046/2014, by the H2020 ICT project COGNITUS with the grant agreement no 687605 and by the FCT project NOVA LINCS Ref. UID/CEC/04516/2019. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.
Cross-modal embeddings between textual and visual modalities aim to organise multimodal instances by their semantic correlations. State-of-the-art approaches use maximum-margin methods, based on the hinge loss, to enforce a constant margin m that separates projections of multimodal instances from different categories. In this paper, we propose a novel scheduled adaptive maximum-margin (SAM) formulation that infers triplet-specific constraints during training, therefore organising instances by adaptively enforcing inter-category and inter-modality correlations. This is supported by a scheduled adaptive margin function that is smoothly activated, replacing a static margin with an adaptively inferred one that reflects triplet-specific semantic correlations while accounting for the incremental learning behaviour of neural networks to drive category cluster formation and enforcement. Experiments on widely used datasets show that our model improves upon state-of-the-art approaches, achieving a relative improvement of up to approximately 12.5% over the second-best method, thus confirming the effectiveness of our scheduled adaptive margin formulation.
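For reference, the constant-margin baseline that the abstract contrasts against can be sketched as a standard triplet hinge loss. This shows only the static-margin form, max(0, m + d(a, p) - d(a, n)); it is not the SAM formulation, whose adaptive margin function the abstract does not specify. The function names, the Euclidean distance, the margin value and the toy 2-D embeddings are all assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Static-margin triplet loss: max(0, margin + d(a, p) - d(a, n)).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))


# Hypothetical 2-D embeddings of one image and two texts.
image = [0.0, 0.0]
matching_text = [0.0, 0.1]
other_text = [0.0, 1.0]

# Well-separated triplet: the constraint is already satisfied.
loss_satisfied = triplet_hinge_loss(image, matching_text, other_text)
# Roles swapped: the constraint is violated and the loss is positive.
loss_violated = triplet_hinge_loss(image, other_text, matching_text)
```

The same margin applies to every triplet here, which is the behaviour an adaptive, triplet-specific margin would replace.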
Diachronic cross-modal embeddings
This work has been partially funded by the CMU Portugal research project GoLocal Ref. CMUP-ERI/TIC/0046/2014, by the H2020 ICT project COGNITUS with the grant agreement no 687605 and by the FCT project NOVA LINCS Ref. UID/CEC/04516/2019. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.
Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time. Under this paradigm, a new embedding is needed that structures visual-textual interactions according to the temporal dimension, thus preserving the data's original temporal organisation. This paper introduces a novel diachronic cross-modal embedding (DCM), where cross-modal correlations are represented in embedding space throughout the temporal dimension, preserving semantic similarity at each instant t. To achieve this, we trained a neural cross-modal architecture under a novel ranking loss strategy that, for each multimodal instance, enforces the temporal alignment of neighbour instances through subspace structuring constraints based on a temporal alignment window. Experimental results show that our DCM embedding successfully organises instances over time. Quantitative experiments confirm that DCM is able to preserve semantic cross-modal correlations at each instant t while also providing better alignment capabilities. Qualitative experiments unveil new ways to browse multimodal content and hint that multimodal understanding tasks can benefit from this new embedding.
Heterogeneous Nuclear Ribonucleoprotein K Is Overexpressed in Acute Myeloid Leukemia and Causes Myeloproliferation in Mice via Altered
Acute myeloid leukemia (AML) is driven by numerous molecular events that contribute to disease progression. Herein, we identify hnRNP K overexpression as a recurrent abnormality in AML that negatively correlates with patient survival. Overexpression of hnRNP K in murine fetal liver cells results in altered self-renewal and differentiation potential. Further, murine transplantation models reveal that hnRNP K overexpression results in myeloproliferation in vivo. Mechanistic studies expose a direct functional relationship between hnRNP K and RUNX1—a master transcriptional regulator of hematopoiesis often dysregulated in leukemia. Molecular analyses show that overexpression of hnRNP K results in an enrichment of an alternatively spliced isoform of RUNX1 lacking exon 4. Our work establishes hnRNP K’s oncogenic potential in influencing myelogenesis through its regulation of RUNX1 splicing and subsequent transcriptional activity
One-Pass Ranking Models for Low-Latency Product Recommendations
Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naïve applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers’ preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equiva
Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional “semantic space.” Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space
Incorporating rich background knowledge for gene named entity classification and recognition
Background: Gene named entity classification and recognition are crucial preliminary steps of text mining in biomedical literature. Machine learning based methods have been used in this area with great success. In most state-of-the-art systems, elaborately designed lexical features, such as words, n-grams, and morphology patterns, have played a central part. However, this type of feature tends to cause extreme sparseness in feature space. As a result, out-of-vocabulary (OOV) terms in the training data are not modeled well due to lack of information.
Results: We propose a general framework for gene named entity representation, called feature coupling generalization (FCG). The basic idea is to generate higher level features using term frequency and co-occurrence information of highly indicative features in a huge amount of unlabeled data. We examine its performance in a named entity classification task, which is designed to remove non-gene entries in a large dictionary derived from online resources. The results show that new features generated by FCG outperform lexical features by 5.97 F-score and 10.85 for OOV terms. Also, in this framework each extension yields significant improvements, and the sparse lexical features can be transformed into a representation that is both lower dimensional and more informative. A forward maximum match method based on the refined dictionary produces an F-score of 86.2 on the BioCreative 2 GM test set. Then we combined the dictionary with a conditional random field (CRF) based gene mention tagger, achieving an F-score of 89.05, which improves the performance of the CRF-based tagger by 4.46 with little impact on the efficiency of the recognition system. A demo of the NER system is available at http://202.118.75.18:8080/bioner.
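The forward maximum match step mentioned above is a standard dictionary lookup technique and can be sketched as follows. This is a generic illustration, not the authors' implementation: the function name, the whitespace tokenisation, the `max_len` cap and the toy dictionary are assumptions, and the abstract does not describe how the refined dictionary is stored or matched.

```python
def forward_maximum_match(tokens, dictionary, max_len=5):
    """Scan left to right, always taking the longest dictionary entry
    that starts at the current token.

    Returns a list of (start, end, phrase) with `end` exclusive.
    """
    matches = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                matches.append((i, i + n, candidate))
                i += n  # Skip past the matched span.
                matched = True
                break
        if not matched:
            i += 1  # No entry starts here; move on one token.
    return matches


# Hypothetical toy dictionary and sentence.
toy_dictionary = {"tumor necrosis factor", "tumor necrosis factor alpha", "p53"}
tokens = "expression of tumor necrosis factor alpha and p53".split()
result = forward_maximum_match(tokens, toy_dictionary)
```

Because the longest entry wins, "tumor necrosis factor alpha" is tagged as a single mention rather than the shorter "tumor necrosis factor".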
Learning valued relations from data
Driven by a large number of potential applications in areas like bioinformatics, information retrieval and social network analysis, the problem setting of inferring relations between pairs of data objects has recently been investigated quite intensively in the machine learning community. To this end, current approaches typically consider datasets containing crisp relations, so that standard classification methods can be adopted. However, relations between objects like similarities and preferences are in many real-world applications often expressed in a graded manner. A general kernel-based framework for learning relations from data is introduced here. It extends existing approaches because both crisp and valued relations are considered, and it unifies existing approaches because different types of valued relations can be modeled, including symmetric and reciprocal relations. This framework establishes in this way important links between recent developments in fuzzy set theory and machine learning. Its usefulness is demonstrated on a case study in document retrieval
Tabular: A Schema-driven Probabilistic Programming Language
We propose a new kind of probabilistic programming language for machine learning. We write programs simply by annotating existing relational schemas with probabilistic model expressions. We describe a detailed design of our language, Tabular, complete with formal semantics and type system. A rich series of examples illustrates the expressiveness of Tabular. We report an implementation, and show evidence of the succinctness of our notation relative to current best practice. Finally, we describe and verify a transformation of Tabular schemas so as to predict missing values in a concrete database. The ability to query for missing values provides a uniform interface to a wide variety of tasks, including classification, clustering, recommendation, and ranking