7 research outputs found

A matrix factorization framework for jointly analyzing multiple nonnegative data

Author: AI Gilbert
C Stabilini
F Köckerling
GH Van Ramshorst
I Kyle-Leinhase
L Serdén
M Trias
Publication venue: Omnipress
Publication date: 01/01/2011

Methods based on nonnegative matrix factorization provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this paper, we propose a novel joint matrix factorization framework that can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show that, for both applications, retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining, wherever one needs to exploit mutual and individual knowledge present across multiple data sources.

Archivio istituzionale della Ricerca - Bocconi

Crossref

espace@Curtin

Noisy multi-label semi-supervised dimensionality reduction

Author: Bianchi Filippo Maria
Jenssen Robert
Mikalsen Karl Øyvind
Soguero-Ruiz Cristina
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Noisy labeled data are a rich source of information that is often easily accessible and cheap to obtain, but label noise can have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on the challenge posed by noisy labels in non-standard settings. These include situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised and multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and the unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, and a real-world case study demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms. Comment: 38 pages

arXiv.org e-Print Archive

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Harnessing Teamwork in Networks: Prediction, Optimization, and Explanation

Publication date: 01/01/2018

Teams are increasingly indispensable to achievements in any organization. Despite organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the social, cognitive, and information levels in relation to team performance and network dynamics. The goal of this dissertation is to create new instruments to predict, optimize, and explain teams' performance in the context of composite networks (i.e., social-cognitive-information networks). Understanding the dynamic mechanisms that drive the success of high-performing teams can provide key insights into building the best teams and hence lift the productivity and profitability of organizations. For this purpose, novel predictive models have been developed to forecast the long-term performance of teams (point prediction) as well as the pathway to impact (trajectory prediction). A joint predictive model that explores the relationship between team-level and individual-level performance has also been proposed. For an existing team, it is often desirable to optimize its performance by expanding the team with a new member who has certain expertise, or by finding a new candidate to replace an existing under-performing member. I have developed graph-kernel-based performance optimization algorithms that consider both structural matching and skill matching to address these enhancement scenarios. I have also worked towards real-time team optimization by leveraging reinforcement learning techniques. With the increased complexity of the machine learning models for predicting and optimizing teams, it is critical to acquire a deeper understanding of model behavior.
For this purpose, I have investigated explainable prediction, which provides the explanation behind a performance prediction, and explainable optimization, which gives reasons why the model's recommendations are good candidates for certain enhancement scenarios. Dissertation/Thesis: Doctoral Dissertation, Computer Science 201

ASU Digital Repository

Machine learning for improving the quality of citizen science data

Author: Yu Jun
Publication venue: 'Oregon State University'

Citizen science is a paradigm in which volunteers from the general public participate in scientific studies, often by performing data collection. This paradigm is especially useful if the scope of the study is too broad to be covered by a limited number of trained scientists. Although citizen scientists can contribute large quantities of data, data quality is often a concern due to variability in the skills of volunteers. In my thesis, I investigate applying machine learning techniques to improve the quality of data submitted to citizen science projects. The context of my work is eBird, one of the largest citizen science projects in existence. In the eBird project, citizen scientists act as a large global network of human sensors, recording observations of bird species and submitting these observations to a centralized database, where they are used for ecological research such as species distribution modeling and reserve design. Machine learning can be used to improve data quality by modeling an observer's skill level, developing an automated data verification model, and discovering groups of misidentified species.

ScholarsArchive@OSU

A shared-subspace learning framework for multi-label classification

Author: Argyriou A.
Ji S.
Jieping Ye
Kumar S.
Lei Tang
Li Y.-X.
McCallum A.
Roth V.
Schölkopf S.
Shipeng Yu
Shuiwang Ji
Sun L.
Tang L.
Yang Y.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref