2,370 research outputs found

    Commonality Preserving Multiple Instance Clustering Based on Diverse Density

    Image-set clustering is the problem of decomposing a given image set into disjoint subsets satisfying specified criteria. For single-vector image representations, a proximity or similarity criterion is widely applied, i.e., proximal or similar images form a cluster. The recent trend in image description, however, is local-feature-based, i.e., an image is described by multiple local features, e.g., SIFT, SURF, and so on. With this description, which criterion should be employed for clustering? As an answer to this question, this paper presents an image-set clustering method based on commonality: images preserving strong commonality (coherent local features) form a cluster. Under this criterion, image variations that do not affect common features are harmless. In the case of face images, hair-style changes and partial occlusions by glasses may not affect the cluster formation. We defined four commonality measures based on Diverse Density, which are used in agglomerative clustering. Through comparative experiments, we confirmed that two of our methods perform better than the other methods examined in the experiments.
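    Since the abstract does not define the four commonality measures, the sketch below is only an illustrative stand-in: it implements the standard noisy-OR Diverse Density over bags of local features, a hypothetical commonality() proxy, and a greedy agglomerative loop that merges the pair of clusters whose union keeps the strongest commonality. The function names, the scale parameter, and the merge rule are all assumptions, not the paper's definitions.

```python
import numpy as np

def diverse_density(t, bags, scale=1.0):
    """Noisy-OR Diverse Density of candidate feature point t, given bags
    of local features (one (n_i, d) array per image). DD is high when t
    lies close to at least one feature in every image."""
    dd = 1.0
    for bag in bags:
        sq_dists = np.sum((bag - t) ** 2, axis=1) / scale**2
        dd *= 1.0 - np.prod(1.0 - np.exp(-sq_dists))  # Pr(t | bag), noisy-OR
    return dd

def commonality(bags, scale=1.0):
    """Illustrative proxy (not one of the paper's four measures): the best
    DD found when every observed feature is tried as the candidate."""
    return max(diverse_density(t, bags, scale) for t in np.vstack(bags))

def agglomerative_cluster(images, n_clusters, scale=1.0):
    """Greedily merge the cluster pair whose union preserves the
    strongest commonality, until n_clusters remain."""
    clusters = [[bag] for bag in images]
    while len(clusters) > n_clusters:
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: commonality(clusters[ab[0]] + clusters[ab[1]], scale),
        )
        clusters[i] += clusters.pop(j)
    return clusters
```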

    Product-Driven Data Mining

    Manifold Data Mining has developed innovative demographic and household spending pattern databases for six-digit postal codes in Canada. Their collection of information consists of both demographic and expenditure variables, expressed through thousands of individually tracked factors. This large collection of information about consumer behaviour is typically referred to as a mine. Although very large in practice, for the purposes of this report the data mine consisted of m individuals and n factors, where m ∼ 2000 and n ∼ 50. Ideally, the first algorithm would identify a few factors in the data mine that differentiate customers in terms of a particular product preference. The second algorithm would then build on this information by looking for patterns in the data mine that identify related areas of consumer spending. To test the algorithms, two case studies were undertaken. The first study involved differentiating BMW and Honda car owners. The algorithms developed were reasonably successful both at finding questions that differentiate these two populations and at identifying common characteristics amongst the groups of respondents. For the second case study it was hoped that the same algorithms could differentiate between consumers of two brands of beer. In this case the first algorithm was not as successful at differentiating between all groups; it showed some distinctions between beer drinkers and non-beer drinkers, but these were not as clearly defined as in the first case study. The second algorithm was then used successfully to further identify spending patterns once this distinction was made. In this second case study a deeper factor analysis could be used to identify a combination of factors for use in the first algorithm.
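    The report leaves the first algorithm unspecified, so the following is a minimal sketch of one plausible approach under stated assumptions: rank the n tracked factors by a standardized mean difference (a Cohen's-d style score) between the two consumer groups. The function name and the toy data are illustrative, not the report's.

```python
import numpy as np

def rank_discriminative_factors(X_a, X_b, top_k=5):
    """Rank factors by how well they separate two consumer groups.

    X_a, X_b : (m_a, n) and (m_b, n) arrays, one row per individual,
               one column per tracked factor."""
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Pooled standard deviation per factor, floored for numerical stability
    pooled_sd = np.sqrt((X_a.var(axis=0) + X_b.var(axis=0)) / 2) + 1e-9
    effect = np.abs(mu_a - mu_b) / pooled_sd
    return np.argsort(effect)[::-1][:top_k]

# Toy data at roughly the scale described in the report: m ~ 2000, n ~ 50
rng = np.random.default_rng(0)
bmw = rng.normal(size=(1000, 50))
honda = rng.normal(size=(1000, 50))
honda[:, 7] += 1.5                   # plant one truly differentiating factor
print(rank_discriminative_factors(bmw, honda))   # factor 7 ranks first
```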

    Preserving Modality Structure Improves Multi-Modal Learning

    Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations. These joint embeddings enable zero-shot cross-modal tasks like retrieval and classification. However, these methods often struggle to generalize well on out-of-domain data as they ignore the semantic structure present in modality-specific embeddings. In this context, we propose a novel Semantic-Structure-Preserving Consistency approach to improve generalizability by preserving the modality-specific relationships in the joint embedding space. To capture modality-specific semantic relationships between samples, we propose to learn multiple anchors and represent the multifaceted relationship between samples with respect to their relationship with these anchors. To assign multiple anchors to each sample, we propose a novel Multi-Assignment Sinkhorn-Knopp algorithm. Our experiments demonstrate that our proposed approach learns semantically meaningful anchors in a self-supervised manner. Furthermore, our evaluation on the MSR-VTT and YouCook2 datasets demonstrates that our proposed multi-anchor assignment based solution achieves state-of-the-art performance and generalizes to both in- and out-of-domain datasets. Code: https://github.com/Swetha5/Multi_Sinkhorn_Knopp. Comment: Accepted at ICCV 2023.
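    The abstract names the Multi-Assignment Sinkhorn-Knopp algorithm without giving its form; the sketch below shows the generic idea such a method builds on — iterative proportional fitting of a sample-by-anchor score matrix so that each sample carries k units of assignment mass while anchors are used uniformly. The parameter names (k, eps, n_iters) and the marginal choices are assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_assignment_sinkhorn(scores, k=3, eps=0.05, n_iters=50):
    """Scale exp(scores / eps) so every sample is softly assigned to k
    anchors in total and every anchor receives equal total mass.

    scores : (n_samples, n_anchors) similarity logits."""
    n, m = scores.shape
    Q = np.exp(scores / eps)
    row_target = np.full(n, float(k))      # k anchors' worth of mass per sample
    col_target = np.full(m, n * k / m)     # balanced usage across anchors
    for _ in range(n_iters):
        Q *= (row_target / Q.sum(axis=1))[:, None]   # match row marginals
        Q *= (col_target / Q.sum(axis=0))[None, :]   # match column marginals
    return Q

# Example: soft multi-anchor assignments for 8 samples and 4 anchors
rng = np.random.default_rng(0)
Q = multi_assignment_sinkhorn(rng.normal(size=(8, 4)), k=2)
print(Q.sum(axis=1))   # each row sums to ~2: two anchors' worth of mass
```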

    R&D collaboration networks in the European Framework Programmes: Data processing, network construction and selected results

    We describe the construction of a large and novel data set on R&D collaboration networks in the first five EU Framework Programmes (FPs), examine key features and provide economic interpretations for our findings. The data set is based on publicly available raw data that presents numerous challenges. We critically examine the different problems and detail how we have dealt with them. We describe how we construct networks from the processed data. The resulting networks display properties typical for large complex networks, including scale-free degree distributions and the small-world property. The former indicates the presence of network hubs, which we identify. Theoretical work shows the latter to be beneficial for knowledge creation and diffusion. Structural features are remarkably similar across FPs, indicating similar network formation mechanisms despite changes in governance rules. Several findings point towards the existence of a stable core of interlinked actors since the early FPs, with integration increasing over time. This core consists mainly of universities and research organisations. The paper concludes with an agenda for future research.
    Keywords: R&D collaboration, EU Framework Programmes, complex networks, small world effect, knowledge creation, knowledge diffusion, European Research Area
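    As a toy illustration of the network construction step (the real input is the processed FP participation data described in the paper), the sketch below links organisations that share at least one project and computes the kinds of properties the abstract examines. The project lists and organisation names are hypothetical.

```python
import networkx as nx

# Hypothetical stand-in for FP records: one set of participants per project
projects = [
    {"uni_A", "uni_B", "firm_X"},
    {"uni_A", "research_org_C"},
    {"firm_X", "uni_B", "research_org_C", "firm_Y"},
]

# Organisations are nodes; an edge links any two joint participants.
G = nx.Graph()
for members in projects:
    members = sorted(members)
    for i, u in enumerate(members):
        for v in members[i + 1:]:
            G.add_edge(u, v)

# Hubs show up in the degree sequence; a high clustering coefficient plus a
# short average path length indicates the small-world property.
print(sorted(dict(G.degree()).values(), reverse=True))
print("avg clustering:", nx.average_clustering(G))
print("avg path length:", nx.average_shortest_path_length(G))
```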

    Anomaly Detection in Sequential Data: A Deep Learning-Based Approach

    Anomaly detection has been researched in various domains, with applications in intrusion detection, fraud detection, system health management, and bio-informatics. Conventional anomaly detection methods analyze each data instance independently (univariate or multivariate) and ignore the sequential characteristics of the data. Some anomalies only emerge when individual data instances are grouped into sequences, so the conventional approach of analyzing instances independently cannot detect them. Currently: (1) deep learning-based algorithms are widely used for anomaly detection, but significant computational overhead is incurred during training because the batch size and learning rate are held constant across epochs; (2) the threshold used to decide whether an event is normal or malicious is often static, which can drastically increase the false alarm rate if the threshold is set low, or decrease the true alarm rate if it is set remarkably high; (3) real-life data is messy, and its features cannot be learned by training just one algorithm, so several one-class algorithms must be trained and their outputs ensembled, where prediction accuracy can be increased by properly weighting each algorithm's output. By extending the state-of-the-art in learning-based algorithms, this dissertation provides the following solutions: (i) to address (1), we propose a hybrid, dynamic batch size and learning rate tuning algorithm that reduces the overall training time of the neural network; (ii) as a solution for (2), we present an adaptive thresholding algorithm that reduces high false alarm rates; (iii) to overcome (3), we propose a multilevel hybrid ensemble anomaly detection framework that increases the anomaly detection rate on high-dimensional datasets.
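    Of the three contributions, adaptive thresholding is the easiest to make concrete. The sketch below is an illustrative version, not the dissertation's exact algorithm: the cutoff for an anomaly score tracks the rolling mean and standard deviation of recent scores, so it adapts to drift instead of staying static. The window size, the multiplier k, and the warm-up rule are assumptions.

```python
import numpy as np

def adaptive_threshold(scores, window=100, k=3.0):
    """Flag score[t] as anomalous when it exceeds the rolling mean of the
    previous `window` scores by k rolling standard deviations."""
    scores = np.asarray(scores, dtype=float)
    flags = np.zeros(len(scores), dtype=bool)
    for t in range(len(scores)):
        history = scores[max(0, t - window):t]
        if len(history) < 10:            # warm-up: not enough history yet
            continue
        mu, sigma = history.mean(), history.std() + 1e-9
        flags[t] = scores[t] > mu + k * sigma
    return flags

# Example: reconstruction errors that drift upward, with one true spike;
# a static threshold would either miss the spike or flag the whole drift.
rng = np.random.default_rng(1)
errs = rng.normal(1.0, 0.1, 500) + np.linspace(0, 1, 500)  # slow drift
errs[400] += 2.0                                           # injected anomaly
print(np.flatnonzero(adaptive_threshold(errs)))            # roughly [400]
```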