8 research outputs found
Multi-label learning with emerging new labels
In a multi-label learning task, an object possesses multiple concepts where each concept is represented by a class label. Previous studies on multi-label learning have focused on a fixed set of class labels, i.e., the class label set of test data is the same as that in the training set. In many applications, however, the environment is dynamic and new concepts may emerge in a data stream. In order to maintain a good predictive performance in this environment, a multi-label learning method must have the ability to detect and classify instances with emerging new labels. To this end, we propose a new approach called Multi-label learning with Emerging New Labels (MuENL). It has three functions: classify instances on currently known labels, detect the emergence of a new label, and construct a new classifier for each new label that works collaboratively with the classifier for known labels. In addition, we show that MuENL can be easily extended to handle sparse high dimensional data streams by simply reducing the original dimensionality, and then applying MuENL on the reduced dimensional space. Our empirical evaluation shows the effectiveness of MuENL on several benchmark datasets and MuENLHD on the sparse high dimensional Weibo dataset
Multi-label Classification via Adaptive Resonance Theory-based Clustering
This paper proposes a multi-label classification algorithm capable of
continual learning by applying an Adaptive Resonance Theory (ART)-based
clustering algorithm and the Bayesian approach for label probability
computation. The ART-based clustering algorithm adaptively and continually
generates prototype nodes corresponding to given data, and the generated nodes
are used as classifiers. The label probability computation independently counts
the number of label appearances for each class and calculates the Bayesian
probabilities. Thus, the label probability computation can cope with an
increase in the number of labels. Experimental results with synthetic and
real-world multi-label datasets show that the proposed algorithm has
competitive classification performance to other well-known algorithms while
realizing continual learning
Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials
CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania,There is a pressing need to exploit recent advances in natural language processing technologies, in
particular language models and deep learning approaches, to enable improved retrieval, classification
and ultimately access to information contained in multiple, heterogeneous types of documents. This is
particularly true for the field of biomedicine and clinical research, where medical experts and scientists
need to carry out complex search queries against a variety of document collections, including literature,
patents, clinical trials or other kind of content like EHRs. Indexing documents with structured controlled
vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling
sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain
and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems
to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task
results on medical semantic indexing in Spanish (BioASQ/ CLEF 2021 Challenge). MESINESP was carried
out in direct collaboration with literature content databases and medical indexing experts using the DeCS
vocabulary, a similar resource as MeSH terms. Seven participating teams used advanced technologies
including extreme multilabel classification and deep language models to solve this challenge which can
be viewed as a multi-label classification problem. MESINESP resources, we have released a Gold Standard
collection of 243,000 documents with a total of 2179 manual annotations divided in train, development
and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre
training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three
independent experts using a specially developed indexing interface called ASIT. Additionally, we have
published a collection of large-scale automatic semantic annotations based on NER systems of these
documents with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and
clinical procedures (415,000). In addition to a summary of the used technologies by the teams, this paperS
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, there are tremendous
efforts on how to harvest the strong learning capability of deep learning to
better capture the label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systemic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications.Comment: Accepted to TPAMI 202
Integrating Machine Learning Paradigms for Predictive Maintenance in the Fourth Industrial Revolution era
In the last decade, manufacturing companies have been facing two significant challenges. First, digitalization imposes adopting Industry 4.0 technologies and allows creating smart, connected, self-aware, and self-predictive factories. Second, the attention on sustainability imposes to evaluate and reduce the impact of the implemented solutions from economic and social points of view. In manufacturing companies, the maintenance of physical assets assumes a critical role. Increasing the reliability and the availability of production systems leads to the minimization of systems’ downtimes; In addition, the proper system functioning avoids production wastes and potentially catastrophic accidents. Digitalization and new ICT technologies have assumed a relevant role in maintenance strategies. They allow assessing the health condition of machinery at any point in time. Moreover, they allow predicting the future behavior of machinery so that maintenance interventions can be planned, and the useful life of components can be exploited until the time instant before their fault.
This dissertation provides insights on Predictive Maintenance goals and tools in Industry 4.0 and proposes a novel data acquisition, processing, sharing, and storage framework that addresses typical issues machine producers and users encounter. The research elaborates on two research questions that narrow down the potential approaches to data acquisition, processing, and analysis for fault diagnostics in evolving environments. The research activity is developed according to a research framework, where the research questions are addressed by research levers that are explored according to research topics. Each topic requires a specific set of methods and approaches; however, the overarching methodological approach presented in this dissertation includes three fundamental aspects: the maximization of the quality level of input data, the use of Machine Learning methods for data analysis, and the use of case studies deriving from both controlled environments (laboratory) and real-world instances
Learning and Leveraging Structured Knowledge from User-Generated Social Media Data
Knowledge has long been a crucial element in Artificial Intelligence (AI), which can be traced back to knowledge-based systems, or expert systems, in the 1960s. Knowledge provides contexts to facilitate machine understanding and improves the explainability and performance of many semantic-based applications. The acquisition of knowledge is, however, a complex step, normally requiring much effort and time from domain experts. In machine learning as one key domain of AI, the learning and leveraging of structured knowledge, such as ontologies and knowledge graphs, have become popular in recent years with the advent of massive user-generated social media data. The main hypothesis in this thesis is therefore that a substantial amount of useful knowledge can be derived from user-generated social media data. A popular, common type of social media data is social tagging data, accumulated from users' tagging in social media platforms. Social tagging data exhibit unstructured characteristics, including noisiness, flatness, sparsity, incompleteness, which prevent their efficient knowledge discovery and usage. The aim of this thesis is thus to learn useful structured knowledge from social media data regarding these unstructured characteristics. Several research questions have then been formulated related to the hypothesis and the research challenges. A knowledge-centred view has been considered throughout this thesis: knowledge bridges the gap between massive user-generated data to semantic-based applications. The study first reviews concepts related to structured knowledge, then focuses on two main parts, learning structured knowledge and leveraging structured knowledge from social tagging data. To learn structured knowledge, a machine learning system is proposed to predict subsumption relations from social tags. The main idea is to learn to predict accurate relations with features, generated with probabilistic topic modelling and founded on a formal set of assumptions on deriving subsumption relations. Tag concept hierarchies can then be organised to enrich existing Knowledge Bases (KBs), such as DBpedia and ACM Computing Classification Systems. The study presents relation-level evaluation, ontology-level evaluation, and the novel, Knowledge Base Enrichment based evaluation, and shows that the proposed approach can generate high quality and meaningful hierarchies to enrich existing KBs. To leverage structured knowledge of tags, the research focuses on the task of automated social annotation and propose a knowledge-enhanced deep learning model. Semantic-based loss regularisation has been proposed to enhance the deep learning model with the similarity and subsumption relations between tags. Besides, a novel, guided attention mechanism, has been proposed to mimic the users' behaviour of reading the title before digesting the content for annotation. The integrated model, Joint Multi-label Attention Network (JMAN), significantly outperformed the state-of-the-art, popular baseline methods, with consistent performance gain of the semantic-based loss regularisers on several deep learning models, on four real-world datasets. With the careful treatment of the unstructured characteristics and with the novel probabilistic and neural network based approaches, useful knowledge can be learned from user-generated social media data and leveraged to support semantic-based applications. This validates the hypothesis of the research and addresses the research questions. Future studies are considered to explore methods to efficiently learn and leverage other various types of structured knowledge and to extend current approaches to other user-generated data