Light-weight Deep Extreme Multilabel Classification
Extreme multi-label (XML) classification refers to the task of supervised
multi-label learning that involves a large number of labels. Hence, scalability
of the classifier with increasing label dimension is an important
consideration. In this paper, we develop a method called LightDXML which
modifies the recently developed deep learning based XML framework by using
label embeddings instead of feature embeddings for negative sampling and
iterating cyclically through three major phases: (1) proxy training of label
embeddings, (2) shortlisting of labels for negative sampling, and (3) final
classifier training using the negative samples. Consequently, LightDXML also
removes the requirement of a re-ranker module, thereby leading to further
savings on time and memory requirements. The proposed method achieves the best
of both worlds: while the training time, model size and prediction times are on
par or better compared to the tree-based methods, it attains much better
prediction accuracy that is on par with the deep learning based methods.
Moreover, the proposed approach achieves the best tail-label prediction
accuracy over most state-of-the-art XML methods on some of the large
datasets. Accepted in IJCNN 2023; partial funding from MAPG grant and
IIIT Seed grant at IIIT, Hyderabad, India. Code:
https://github.com/misterpawan/LightDXML
Comment: 9 pages, 2 figures, 5 table
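The shortlisting phase mentioned in the abstract above (phase 2) can be illustrated with a minimal sketch. This is not the LightDXML implementation: the function name `shortlist_negatives`, the use of a centroid of positive label embeddings as the query, and cosine similarity as the ranking score are all assumptions made for illustration. The idea shown is only that labels whose embeddings lie close to a sample's positive labels make hard negatives for the final classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shortlist_negatives(label_embeddings, positive_labels, k):
    """Return up to k non-positive label indices closest to the positives.

    Hypothetical helper: the query is the centroid of the positive label
    embeddings, and candidates are ranked by cosine similarity to it.
    """
    positives = set(positive_labels)
    if not positives:
        raise ValueError("at least one positive label is required")
    dim = len(label_embeddings[0])
    centroid = [
        sum(label_embeddings[p][d] for p in positives) / len(positives)
        for d in range(dim)
    ]
    candidates = [
        (cosine(centroid, emb), idx)
        for idx, emb in enumerate(label_embeddings)
        if idx not in positives
    ]
    # Highest similarity first; ties broken by the smaller label index.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in candidates[:k]]
```

With four toy label embeddings `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]` and label 0 as the only positive, `shortlist_negatives(..., [0], 2)` returns `[1, 2]`, the two labels nearest to the positive one. At extreme scale an approximate nearest-neighbour index would replace this exhaustive scan.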
Federated Learning with Imbalanced and Agglomerated Data Distribution for Medical Image Classification
Federated learning (FL), training deep models from decentralized data without
privacy leakage, has drawn great attention recently. Two common issues in FL,
namely data heterogeneity from the local perspective and class imbalance from
the global perspective, have limited FL's performance. These two coupled
problems are under-explored, and the few existing studies may not be sufficiently
realistic to model data distributions in practical scenarios (e.g., medical
scenarios). One common observation is that the overall class distribution
across clients is imbalanced (e.g. common vs. rare diseases) and data tend to
be agglomerated to those more advanced clients (i.e., the data agglomeration
effect), which cannot be modeled by existing settings. Inspired by real medical
imaging datasets, we identify and formulate a new and more realistic data
distribution, denoted as the L2 distribution, where the global class distribution is
highly imbalanced and data distributions across clients are imbalanced but
form a certain degree of data agglomeration. To pursue effective FL under
this distribution, we propose a novel privacy-preserving framework named FedIIC
that calibrates deep models to alleviate bias caused by imbalanced training. To
calibrate the feature extractor part, intra-client contrastive learning with a
modified similarity measure and inter-client contrastive learning guided by
shared global prototypes are introduced to produce a uniform embedding
distribution of all classes across clients. To calibrate the classification
heads, a softmax cross entropy loss with difficulty-aware logit adjustment is
constructed to ensure balanced decision boundaries of all classes. Experimental
results on publicly-available datasets demonstrate the superior performance of
FedIIC in dealing with both the proposed realistic modeling and the existing
modeling of the two coupled problems.
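The abstract above mentions a softmax cross-entropy loss with logit adjustment for calibrating the classification heads. The sketch below shows the simpler, well-known prior-based logit adjustment, in which each logit is shifted by `tau * log(class prior)` before the softmax. It is a stand-in rather than FedIIC's difficulty-aware variant, whose exact form is not given in the abstract; the function name, the `class_counts` argument, and the `tau` parameter are assumptions for illustration.

```python
import math


def logit_adjusted_cross_entropy(logits, target, class_counts, tau=1.0):
    """Softmax cross-entropy on logits shifted by tau * log(class prior).

    Assumed inputs: `logits` is one sample's raw scores, `target` is the
    true class index, and `class_counts` holds positive training counts
    per class. Rare classes receive a larger loss for the same raw logits,
    which pushes the decision boundary away from them during training.
    """
    total = sum(class_counts)
    adjusted = [
        z + tau * math.log(count / total)
        for z, count in zip(logits, class_counts)
    ]
    # Numerically stable log-sum-exp.
    peak = max(adjusted)
    log_sum_exp = peak + math.log(sum(math.exp(a - peak) for a in adjusted))
    return log_sum_exp - adjusted[target]
```

With balanced counts the adjustment cancels and the function reduces to the ordinary cross-entropy. With counts `[90, 10]` and equal logits, the loss for the rare class is `log(10)`, against `log(2)` without adjustment.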
A Survey on Extreme Multi-label Learning
Multi-label learning has attracted significant attention from both academia
and industry in recent decades. Although existing multi-label learning
algorithms have achieved good performance in various tasks, they implicitly assume
that the target label space is not huge, which can be restrictive for
real-world scenarios. Moreover, it is infeasible to directly adapt them to
extremely large label spaces because of the compute and memory overhead.
Therefore, eXtreme Multi-label Learning (XML) is becoming an important task and
many effective approaches have been proposed. To fully understand XML, we conduct a
survey study in this paper. We first clarify a formal definition for XML from
the perspective of supervised learning. Then, based on different model
architectures and challenges of the problem, we provide a thorough discussion
of the advantages and disadvantages of each category of methods. For the
benefit of conducting empirical studies, we collect abundant resources
regarding XML, including code implementations and useful tools. Lastly, we
propose possible research directions in XML, such as new evaluation metrics,
the tail label problem, and weakly supervised XML.
Comment: A preliminary versio
Review of Extreme Multilabel Classification
Extreme multilabel classification, or XML, is an active area of interest in
machine learning. Compared to traditional multilabel classification, here the
number of labels is extremely large, hence the name extreme multilabel
classification. Classical one-versus-all classification won't scale in
this case due to the large number of labels; the same is true for any other
classifiers. Embedding of labels as well as features into a smaller label space
is an essential first step. Other issues include the existence of head
and tail labels, where tail labels are labels that occur in a relatively small
number of the given samples. The existence of tail labels creates issues during
embedding. This area has invited the application of a wide range of approaches,
including bit compression motivated by compressed sensing, tree-based
embeddings, deep learning based latent space embedding including using
attention weights, linear algebra based embeddings such as SVD, clustering,
and hashing, to name a few. The community has come up with a useful set of metrics
to identify correctly the prediction for head or tail labels.
Comment: 46 pages, 13 figure
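The abstract above does not name the metrics it refers to. Two that are commonly reported in the XML literature are precision@k and propensity-scored precision@k, sketched here in their unnormalized form. The function names and the `propensities` argument (one probability per label of that label being observed when relevant, smaller for tail labels) are assumptions for illustration; how propensities are estimated is outside this sketch.

```python
def top_k(scores, k):
    """Indices of the k highest scores; ties broken by the smaller index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def precision_at_k(scores, relevant, k):
    """Fraction of the top-k predicted labels that are relevant."""
    return sum(1 for i in top_k(scores, k) if i in relevant) / k


def psp_at_k(scores, relevant, propensities, k):
    """Propensity-scored precision@k (unnormalized).

    Each correct top-k label contributes 1 / propensity, so a correct
    tail label (low propensity) is rewarded more than a correct head label.
    """
    hits = (1.0 / propensities[i] for i in top_k(scores, k) if i in relevant)
    return sum(hits) / k
```

For example, with relevant labels `{0, 3}` and propensities `[0.5, 0.5, 0.1, 0.1]`, a model that ranks head label 0 in its top 2 and a model that ranks tail label 3 in its top 2 both score 0.5 on precision@2, but the second scores higher on propensity-scored precision@2.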
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, there are tremendous
efforts on how to harvest the strong learning capability of deep learning to
better capture the label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systemic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications.
Comment: Accepted to TPAMI 202
STRATEGIES FOR SMALLHOLDERS IN DEVELOPING COUNTRIES: COMMERCIALISATION, DIVERSIFICATION AND EXIT
This paper proposes a strategic framework for policies to assist smallholders in developing countries. It describes the inevitable features of structural change in the agricultural and rural economy, the associated pressures that these changes place on smallholders, and the consequent need for policies to facilitate rather than impede adjustment. A key premise of the framework is that, for the majority of smallholders, the long-term (i.e., inter-generational) future lies outside the sector. Hence, long-term policies need to make a distinction between those who potentially have a competitive future in the sector and those who do not. In either case, many of the necessary policies will not be agriculture-specific, so it is important that agricultural policies are framed in a broader economy-wide framework. In addition, a clear distinction needs to be made between short-term policies to reduce poverty and food insecurity and long-term policies to stimulate development. This is because there are intertemporal trade-offs (as well as complementarities) between policies that are likely to be effective in the short run and those promising most impact over the long term. The paper discusses the role of different agricultural and non-agricultural policies in providing the appropriate policy mix in countries at different stages of development.
smallholders, rural development, agricultural policy, structural change, Agricultural and Food Policy, Community/Rural/Urban Development, International Development, O20, Q18, R23
Geo Data Science for Tourism
This reprint describes the recent challenges in tourism seen from the point of view of data science. Thanks to the use of the most popular Data Science concepts, you can easily recognise trends and patterns in tourism, detect the impact of tourism on the environment, and predict future trends in tourism. This reprint starts by describing how to analyse data related to the past, then it moves on to detecting behaviours in the present, and, finally, it describes some techniques to predict future trends. By the end of the reprint, you will be able to use data science to help tourism businesses make better use of data and improve their decision making and operations.