1,940 research outputs found

    Light-weight Deep Extreme Multilabel Classification

    Extreme multi-label (XML) classification refers to the task of supervised multi-label learning that involves a large number of labels. Hence, scalability of the classifier with increasing label dimension is an important consideration. In this paper, we develop a method called LightDXML which modifies the recently developed deep learning based XML framework by using label embeddings instead of feature embeddings for negative sampling and iterating cyclically through three major phases: (1) proxy training of label embeddings, (2) shortlisting of labels for negative sampling, and (3) final classifier training using the negative samples. Consequently, LightDXML also removes the requirement of a re-ranker module, thereby leading to further savings in time and memory requirements. The proposed method achieves the best of both worlds: while the training time, model size and prediction time are on par with or better than those of the tree-based methods, it attains much better prediction accuracy that is on par with the deep learning based methods. Moreover, the proposed approach achieves the best tail-label prediction accuracy over most state-of-the-art XML methods on some of the large datasets. Accepted at IJCNN 2023; partial funding from MAPG grant and IIIT Seed grant at IIIT, Hyderabad, India. Code: https://github.com/misterpawan/LightDXML
    Comment: 9 pages, 2 figures, 5 tables
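    The shortlisting phase mentioned in the abstract above can be illustrated with a minimal sketch. This is not LightDXML's actual procedure, which the abstract does not spell out; it assumes that negatives are chosen as the non-positive labels whose embeddings have the highest cosine similarity to a document embedding. The function names, the toy labels, and the embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def shortlist_negatives(doc_embedding, label_embeddings, positive_labels, k):
    """Return the k non-positive labels most similar to the document.

    Assumption: "hard" negatives are the labels nearest to the document
    in the shared embedding space. Ties are broken by label name.
    """
    scored = [
        (cosine(doc_embedding, emb), label)
        for label, emb in label_embeddings.items()
        if label not in positive_labels
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:k]]

# Hypothetical toy data: four labels embedded in two dimensions.
label_embeddings = {
    "cats": [1.0, 0.0],
    "kittens": [0.9, 0.1],
    "dogs": [0.6, 0.8],
    "finance": [-1.0, 0.0],
}
doc = [1.0, 0.05]  # a document whose only true label is "cats"
hard_negatives = shortlist_negatives(doc, label_embeddings, {"cats"}, k=2)
print(hard_negatives)
```

    In this toy setup the final classifier would be trained on the positive label plus the two shortlisted negatives rather than on all labels, which is where the time and memory savings of negative sampling come from.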

    Federated Learning with Imbalanced and Agglomerated Data Distribution for Medical Image Classification

    Federated learning (FL), training deep models from decentralized data without privacy leakage, has drawn great attention recently. Two common issues in FL, namely data heterogeneity from the local perspective and class imbalance from the global perspective, have limited FL's performance. These two coupled problems are under-explored, and the few existing studies may not be sufficiently realistic to model data distributions in practical scenarios (e.g., medical scenarios). One common observation is that the overall class distribution across clients is imbalanced (e.g., common vs. rare diseases) and data tend to be agglomerated in the more advanced clients (i.e., the data agglomeration effect), which cannot be modeled by existing settings. Inspired by real medical imaging datasets, we identify and formulate a new and more realistic data distribution denoted as L2 distribution, where the global class distribution is highly imbalanced and data distributions across clients are imbalanced but form a certain degree of data agglomeration. To pursue effective FL under this distribution, we propose a novel privacy-preserving framework named FedIIC that calibrates deep models to alleviate bias caused by imbalanced training. To calibrate the feature extractor part, intra-client contrastive learning with a modified similarity measure and inter-client contrastive learning guided by shared global prototypes are introduced to produce a uniform embedding distribution of all classes across clients. To calibrate the classification heads, a softmax cross entropy loss with difficulty-aware logit adjustment is constructed to ensure balanced decision boundaries of all classes. Experimental results on publicly available datasets demonstrate the superior performance of FedIIC in dealing with both the proposed realistic modeling and the existing modeling of the two coupled problems.
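    To make the idea of logit adjustment concrete, here is a minimal sketch of the standard class-prior variant, in which tau * log(prior) is added to each logit before the softmax cross entropy. This is a swapped-in, simpler technique: the difficulty-aware adjustment used by FedIIC is not described in the abstract and is not reproduced here. The function name, the tau parameter, and the class counts are assumptions for illustration.

```python
import math

def logit_adjusted_cross_entropy(logits, target, class_counts, tau=1.0):
    """Softmax cross entropy with class-prior logit adjustment.

    Each logit z_c is shifted by tau * log(prior_c), where prior_c is the
    empirical class frequency. With tau = 0 this is plain cross entropy.
    """
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total) for z, c in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum_exp - adjusted[target]

# Hypothetical two-class problem: a common disease and a rare one.
counts = [900, 100]
logits = [0.0, 0.0]  # the model is undecided between the two classes
loss_common = logit_adjusted_cross_entropy(logits, 0, counts)
loss_rare = logit_adjusted_cross_entropy(logits, 1, counts)
print(loss_common, loss_rare)
```

    With equal logits, the rare class incurs the larger training loss, so the model has to give it a larger raw-logit margin during training, which pushes the decision boundary away from the rare class.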

    A Survey on Extreme Multi-label Learning

    Multi-label learning has attracted significant attention from both academia and industry in recent decades. Although existing multi-label learning algorithms have achieved good performance in various tasks, they implicitly assume that the size of the target label space is not huge, which can be restrictive for real-world scenarios. Moreover, it is infeasible to directly adapt them to an extremely large label space because of the compute and memory overhead. Therefore, eXtreme Multi-label Learning (XML) is becoming an important task and many effective approaches have been proposed. To fully understand XML, we conduct a survey study in this paper. We first clarify a formal definition for XML from the perspective of supervised learning. Then, based on different model architectures and challenges of the problem, we provide a thorough discussion of the advantages and disadvantages of each category of methods. For the benefit of conducting empirical studies, we collect abundant resources regarding XML, including code implementations and useful tools. Lastly, we propose possible research directions in XML, such as new evaluation metrics, the tail label problem, and weakly supervised XML.
    Comment: A preliminary versio

    Review of Extreme Multilabel Classification

    Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, here the number of labels is extremely large, hence the name extreme multilabel classification. Classical one-versus-all classification won't scale in this case due to the large number of labels; the same is true for any other classifier. Embedding of labels as well as features into a smaller label space is an essential first step. Moreover, other issues include the existence of head and tail labels, where tail labels are labels which exist in a relatively small number of the given samples. The existence of tail labels creates issues during embedding. This area has invited the application of a wide range of approaches, ranging from bit compression motivated by compressed sensing, tree-based embeddings, deep learning based latent space embedding including using attention weights, linear algebra based embeddings such as SVD, clustering, and hashing, to name a few. The community has come up with a useful set of metrics to correctly identify the prediction for head or tail labels.
    Comment: 46 pages, 13 figures

    The Emerging Trends of Multi-Label Learning

    Exabytes of data are generated daily by humans, leading to the growing need for new efforts in dealing with the grand challenges for multi-label learning brought by big data. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks with an extremely large number of classes or labels, and utilizing massive data with limited supervision to build a multi-label classification model has become valuable for practical applications. Besides these, there are tremendous efforts on how to harvest the strong learning capability of deep learning to better capture the label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, it is noted that there has been a lack of systematic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data. It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications.
    Comment: Accepted to TPAMI 202

    STRATEGIES FOR SMALLHOLDERS IN DEVELOPING COUNTRIES: COMMERCIALISATION, DIVERSIFICATION AND EXIT

    This paper proposes a strategic framework for policies to assist smallholders in developing countries. It describes the inevitable features of structural change in the agricultural and rural economy, the associated pressures that these changes place on smallholders, and the consequent need for policies to facilitate rather than impede adjustment. A key premise of the framework is that, for the majority of smallholders, the long term (i.e. inter-generational) future lies outside the sector. Hence, long-term policies need to make a distinction between those who potentially have a competitive future in the sector and those who do not. In either case, many of the necessary policies will not be agriculture-specific, so it is important that agricultural policies are framed in a broader economy-wide framework. In addition, a clear distinction needs to be made between short-term policies to reduce poverty and food insecurity and long-term policies to stimulate development. This is because there are intertemporal trade-offs (as well as complementarities) between policies that are likely to be effective in the short-run, and those promising most impact over the long-term. The paper discusses the role of different agricultural and non-agricultural policies in providing the appropriate policy mix in countries at different stages of development.
    Keywords: smallholders, rural development, agricultural policy, structural change, Agricultural and Food Policy, Community/Rural/Urban Development, International Development, O20, Q18, R23

    Geo Data Science for Tourism

    This reprint describes the recent challenges in tourism seen from the point of view of data science. Thanks to the use of the most popular Data Science concepts, you can easily recognise trends and patterns in tourism, detect the impact of tourism on the environment, and predict future trends in tourism. This reprint starts by describing how to analyse data related to the past, then it moves on to detecting behaviours in the present, and, finally, it describes some techniques to predict future trends. By the end of the reprint, you will be able to use data science to help tourism businesses make better use of data and improve their decision making and operations.