25 research outputs found

    Survey on highly imbalanced multi-class data

    Machine learning technology has a massive impact on society because it offers solutions to many complicated problems like classification, clustering analysis, and predictions, especially during the COVID-19 pandemic. Data distribution in machine learning has been an essential aspect in providing unbiased solutions. From the earliest literature published on highly imbalanced data until recently, machine learning research has focused mostly on binary classification data problems. Research on highly imbalanced multi-class data is still greatly unexplored when the need for better analysis and predictions in handling Big Data is required. This study focuses on reviews related to the models or techniques in handling highly imbalanced multi-class data, along with their strengths and weaknesses and related domains. Furthermore, the paper uses a statistical method to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through analysis of related literature; (2) analyze the previous and current methods of handling highly imbalanced multi-class data; (3) construct a framework of highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset analysis will also be performed and adapted to the current methods or techniques in machine learning, followed by discussions on open challenges and the future direction of highly imbalanced multi-class data. Finally, for highly imbalanced multi-class data, this paper presents a novel framework. We hope this research can provide insights on the potential development of better methods or techniques to handle and manipulate highly imbalanced multi-class data.

    Multidimensional opinion mining from social data

    The popularity and importance of social media are increasing as people use it for various types of social interaction across multiple channels. This thesis focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm, and irony, from user-generated content represented across multiple social media platforms and in various media formats, like textual, visual, and audio. Mining people’s social opinions from social sources, such as social media platforms and newswire comment sections, is a valuable business asset that can be utilised in many ways and in multiple domains, such as Politics, Finance, and Government. The main objective of this research is to investigate how a multidimensional approach to Social Opinion Mining affects fine-grained opinion search and summarisation at an aspect-based level and whether such a multidimensional approach outperforms single dimension approaches in the context of an extrinsic human evaluation conducted in a real-world context: the Malta Government Budget, where five social opinion dimensions are taken into consideration, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. This human evaluation determines whether the multidimensional opinion summarisation results provide added value to potential end-users, such as policy-makers and decision-takers, thereby providing a nuanced voice to the general public on their social opinions on topics of national importance. Results obtained indicate that a more fine-grained aspect-based opinion summary based on the combined dimensions of subjectivity, sentiment polarity, emotion, and sarcasm or irony is more informative and more useful than one based on sentiment polarity only.
This research contributes towards the advancement of intelligent search and information retrieval from social data and impacts entities utilising Social Opinion Mining results towards effective policy formulation, policy-making, decision-making, and decision-taking at a strategic level.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)


    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    The Multilingual Brain – Implications for the Future


    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.

    Code switching, language mixing and fused lects : language alternation phenomena in multilingual Mauritius

    Focusing on a series of multiparty recordings carried out between the months of October and March 2012 and drawing on a theoretical framework based on work of linguists such as Auer (1999), Backus (2005), Bakker (2000), Maschler (2000) and Matras (2000a and 2000b), this thesis traces the evolution of a continuum of language alternation phenomena, ranging from simple code-switching to more complex forms of 'language alloying' (Alvarez-Càccamo 1998) such as mixed codes and fused lects in multilingual Mauritius. Following Auer (2001), the different conversational loci of code-switching are identified. Particular emphasis has been placed upon, amongst others, the conversational locus of playfulness where, for instance, participants' spontaneous lapses into song and dance sequences, as they draw inspiration from Bollywood pop songs and creatively embed segments in Hindustani within a predominantly Kreol matrix, are noted. Furthermore, in line with Auer (1999), Backus (2005) and Muysken (2000), emerging forms of language mixing such as changes in the way possessive marking is carried in Kreol and instances of semantic shift in Bhojpuri/Hindustani words like nasha and daan have been highlighted and their pragmatic significance explained with specific reference to the Mauritian context. Finally, in the fused lect stage, specific attention has been given to one key feature, namely phonological blending, which has resulted in the coinage of the discourse marker ashe and its eventual use in the process of discourse marker switching. In the light of the above findings, this thesis firstly critiques the strengths and weaknesses of the notion of the code switching (CS) continuum (Auer 1999) itself by revealing the difficulties encountered, at the empirical level, in assigning the correct label to the different types of language alternation phenomena evidenced in this thesis.
In the second instance, it considers the impact of such shifts along the language alternation continuum upon language policy and planning in contemporary Mauritius and advocates for a move away from colonial language policies such as the 1957 Education Act in favour of updated ones that are responsive to the language practices of speakers. Linguistics and Modern Languages. D. Litt. et Phil. (Linguistics)

    RFID Technology in Intelligent Tracking Systems in Construction Waste Logistics Using Optimisation Techniques

    Construction waste disposal is an urgent issue for protecting our environment. This paper proposes a waste management system and illustrates the work process using plasterboard waste as an example, which creates a hazardous gas when landfilled with household waste, and for which the recycling rate is less than 10% in the UK. The proposed system integrates RFID technology, Rule-Based Reasoning, Ant Colony optimization and knowledge technology for auditing and tracking plasterboard waste, guiding the operation staff, arranging vehicles, and schedule planning, and also provides evidence to verify its disposal. It relies on RFID equipment for collecting logistical data and uses digital imaging equipment to give further evidence; the reasoning core in the third layer is responsible for generating schedules and route plans and guidance, and the last layer delivers the result to inform users. The paper first introduces the current plasterboard disposal situation and addresses the logistical problem that is now the main barrier to a higher recycling rate, followed by discussion of the proposed system in terms of both system level structure and process structure. Finally, an example scenario will be given to illustrate the system’s utilization.