
    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These experiments formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.
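
    As a toy illustration of the kind of statistical term weighting referred to above, the sketch below ranks candidate expansion terms drawn from a small set of retrieved documents with a simple tf-idf style score. The tokenisation, the weighting formula and the example collection statistics are illustrative assumptions, not the scheme actually used in CiQuest.

```python
# Minimal sketch (not the CiQuest implementation): score candidate query
# expansion terms from a set of retrieved documents with a tf-idf style weight.
import math
from collections import Counter

def candidate_expansion_terms(retrieved_docs, collection_size, doc_freq, top_n=10):
    """Rank terms occurring in the retrieved set by term frequency * idf."""
    tf = Counter()
    for doc in retrieved_docs:
        tf.update(doc.lower().split())          # naive whitespace tokenisation
    scored = {
        term: freq * math.log((collection_size + 1) / (doc_freq.get(term, 1) + 1))
        for term, freq in tf.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

# Hypothetical usage: doc_freq would come from collection statistics (e.g. TREC).
docs = ["query expansion improves retrieval", "interactive query support tools"]
print(candidate_expansion_terms(docs, collection_size=1000,
                                doc_freq={"query": 120, "expansion": 40}))
```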

    Lexicon-based sentiment analysis in texts using Formal Concept Analysis

    In this paper, we present a novel approach for sentiment analysis that uses Formal Concept Analysis (FCA) to create dictionaries for classification. Unlike other methods that rely on pre-defined lexicons, our approach allows for the creation of customised dictionaries that are tailored to the specific data and tasks. By using a dataset of tweets categorised into positive and negative polarity, we show that our approach achieves a better performance than other standard dictionaries. This research is partially supported by the State Agency of Research (AEI), the Spanish Ministry of Science, Innovation, and Universities (MCIU), the European Social Fund (FEDER), the Junta de Andalucía (JA), and the Universidad de Málaga (UMA) through the FPU19/01467 (MCIU) internship and the research projects with reference PGC2018-095869-B-I00, TIN2017-89023-P, PID2021-127870OB-I00 (MCIU/AEI/FEDER, UE) and UMA18-FEDERJA-001 (JA/UMA/FEDER, UE). Funding for open access charge: Universidad de Málaga / CBU
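
    To make the FCA machinery behind such lexicon construction concrete, the sketch below builds a tiny formal context of tweets and terms and enumerates its formal concepts by brute force. The toy context is an assumption, and the paper's actual dictionary-building and classification steps are not reproduced.

```python
# Minimal sketch, not the authors' method: enumerate the formal concepts of a
# toy tweet/term context. Each concept pairs a set of tweets (extent) with the
# set of terms they all share (intent).
from itertools import combinations

# Formal context: objects (tweets) -> attributes (terms they contain).
context = {
    "t1": {"love", "great"},
    "t2": {"great", "fast"},
    "t3": {"hate", "slow"},
    "t4": {"slow", "broken"},
}

def extent(attrs):
    """Objects that have every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set.union(*context.values())

# Brute-force enumeration of concepts (fine for a toy context).
attributes = set().union(*context.values())
concepts = set()
for r in range(len(attributes) + 1):
    for combo in combinations(sorted(attributes), r):
        e = extent(set(combo))
        concepts.add((frozenset(e), frozenset(intent(e))))

for e, i in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(e), sorted(i))
```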

    Noise Reduction In Web Data: A Learning Approach Based On Dynamic User Interests

    One of the prominent challenges web users encounter is the abundance of extraneous material in web content, which impedes the efficient retrieval of relevant information aligned with their evolving interests. In this context, noise is commonly defined as any extraneous data that does not contribute to the intended analysis or study objectives. This study analyses primary webpage content and proposes noise reduction techniques for online data, with the primary emphasis on reducing noise in the content itself and in the way data is arranged and organised on the web; some data within a dataset may not be universally applicable or appropriate, so content related to the user's specific interests is retained while extraneous information is minimised. Handling noisy online data and allocating resources to user requests in this way ensures not just a decrease in noise levels: there is an observed correlation between the interest level indicated in a web user profile and a reduction in the loss of valuable information, which in turn enhances the quality of the online user profile. The proposed Web Data Learning (NWDL) tool/algorithm is able to learn dynamic user interests and remove noise from web data while taking dynamic user behaviour into account; the proposal also considers the use of noise in web user profiles to enhance data privacy. To ascertain the efficacy of the proposed approach, an experimental design is presented and its results are compared against the algorithms currently employed for reducing noisy online data. The experimental findings indicate that the proposed approach examines the dynamic evolution of user interest before removing extraneous data, and that it contributes significantly to enhancing the quality of an online user profile by reducing content volume, while noting that the elimination of noise risks removing beneficial information.

    Intelligent data mining using artificial neural networks and genetic algorithms: techniques and applications

    Data Mining (DM) refers to the analysis of observational datasets to find relationships and to summarize the data in ways that are both understandable and useful. Many DM techniques exist. Compared with other DM techniques, Intelligent Systems (ISs) based approaches, which include Artificial Neural Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free optimization methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. They provide flexible information processing capability for handling real-life situations. This thesis is concerned with the ideas behind design, implementation, testing and application of a novel ISs based DM technique. The unique contribution of this thesis is in the implementation of a hybrid IS DM technique (Genetic Neural Mathematical Method, GNMM) for solving novel practical problems, the detailed description of this technique, and the illustrations of several applications solved by this novel technique. GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-Layer Perceptron (MLP) modelling, and (3) mathematical programming based rule extraction. In the first step, GAs are used to evolve an optimal set of MLP inputs. An adaptive method based on the average fitness of successive generations is used to adjust the mutation rate, and hence the exploration/exploitation balance. In addition, GNMM uses the elite group and appearance percentage to minimize the randomness associated with GAs. In the second step, MLP modelling serves as the core DM engine in performing classification/prediction tasks. An Independent Component Analysis (ICA) based weight initialization algorithm is used to determine optimal weights before the commencement of training algorithms. The Levenberg-Marquardt (LM) algorithm is used to achieve a second-order speedup compared to conventional Back-Propagation (BP) training. In the third step, mathematical programming based rule extraction is not only used to identify the premises of multivariate polynomial rules, but also to explore features from the extracted rules based on data samples associated with each rule. Therefore, the methodology can provide regression rules and features not only in the polyhedrons with data instances, but also in the polyhedrons without data instances. A total of six datasets from environmental and medical disciplines were used as case study applications. These datasets involve the prediction of longitudinal dispersion coefficient, classification of electrocorticography (ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data Fusion (MDF), and diabetes classification (denoted by Data I through to Data VI). GNMM was applied to all six datasets to explore its effectiveness, but the emphasis was different for different datasets. For example, the emphasis of Data I and II was to give a detailed illustration of how GNMM works; Data III and IV aimed to show how to deal with difficult classification problems; the aim of Data V was to illustrate the averaging effect of GNMM; and finally Data VI was concerned with GA parameter selection and benchmarking GNMM against other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System (ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and Cartesian Genetic Programming (CGP). In addition, datasets obtained from published works (i.e. Data II & III) or public domains (i.e. Data VI), where previous results were present in the literature, were also used to benchmark GNMM’s effectiveness. As a closely integrated system, GNMM has the merit that it needs little human interaction. With some predefined parameters, such as GA’s crossover probability and the shape of ANNs’ activation functions, GNMM is able to process raw data until human-interpretable rules are extracted. This is an important feature in practice, as users of a DM system often have little or no need to fully understand its internal components. Through the case study applications, it has been shown that the GA-based variable selection stage is capable of filtering out irrelevant and noisy variables, improving the accuracy of the model, making the ANN structure less complex and easier to understand, and reducing the computational complexity and memory requirements. Furthermore, rule extraction ensures that the MLP training results are easily understandable and transferable.
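
    A compressed sketch of the first GNMM step only (GA-based input variable selection) is given below. scikit-learn's MLPClassifier stands in for the thesis's LM-trained MLP, a public dataset stands in for the case-study data, and the population size, number of generations and the fixed mutation probability are illustrative simplifications (GNMM adapts the mutation rate from average fitness).

```python
# Minimal sketch, assumptions throughout: a binary-chromosome GA that selects
# MLP input variables by cross-validated accuracy, loosely in the spirit of
# GNMM's first step.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of an MLP using only the selected inputs."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(8, n_features))          # binary chromosomes
for _ in range(5):                                       # a few generations
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection of parents.
    idx = [max(rng.choice(len(pop), 2), key=lambda i: scores[i]) for _ in range(len(pop))]
    parents = pop[idx]
    # One-point crossover between consecutive parents.
    cut = int(rng.integers(1, n_features))
    children = np.array([np.concatenate([parents[i, :cut], parents[(i + 1) % len(pop), cut:]])
                         for i in range(len(pop))])
    # Bit-flip mutation (fixed rate here; GNMM adapts it from average fitness).
    flip = rng.random(children.shape) < 0.02
    pop = np.where(flip, 1 - children, children)

best = pop[int(np.argmax([fitness(ind) for ind in pop]))]
print("selected input variables:", np.flatnonzero(best))
```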

    Quantification of uncertainty of geometallurgical variables for mine planning optimisation

    Interest in geometallurgy has increased significantly over the past 15 years or so because of the benefits it brings to mine planning and operation. Its use and integration into design, planning and operation is becoming increasingly critical, especially in the context of declining ore grades and increasing mining and processing costs. This thesis, comprising four papers, offers methodologies and methods to quantify geometallurgical uncertainty and enrich the block model with geometallurgical variables, which contribute to improved optimisation of mining operations. This enhanced block model is termed a geometallurgical block model. Bootstrapped non-linear regression models by projection pursuit were built to predict grindability indices and recovery, and to quantify model uncertainty. These models are useful for populating the geometallurgical block model with response attributes. New multi-objective optimisation formulations for block caving mining were formulated and solved by a meta-heuristic solver, focussing on maximising the project revenue while minimising several risk measures. A novel clustering method, which is able to use both continuous and categorical attributes and incorporate expert knowledge, was also developed for geometallurgical domaining, which characterises the deposit according to its metallurgical response. The concept of geometallurgical dilution was formulated and used for optimising production scheduling in an open-pit case study. Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Civil, Environmental and Mining Engineering, 201
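
    The thesis builds bootstrapped projection pursuit regression models; the sketch below shows only the generic bootstrap idea for attaching an uncertainty interval to a predicted block attribute such as a grindability index. GradientBoostingRegressor and the synthetic data are stand-ins, not the thesis's method or data.

```python
# Minimal sketch, not the thesis workflow: bootstrap resampling to quantify
# prediction uncertainty of a non-linear regression model for one block.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))                        # stand-ins for geological covariates
y = 10 * np.sin(3 * X[:, 0]) + 5 * X[:, 1] + rng.normal(0, 0.5, 200)

x_block = np.array([[0.4, 0.7, 0.1]])                 # one block to populate
preds = []
for _ in range(100):                                  # bootstrap resamples
    idx = rng.integers(0, len(X), len(X))             # sample rows with replacement
    model = GradientBoostingRegressor(random_state=0).fit(X[idx], y[idx])
    preds.append(model.predict(x_block)[0])

lo, hi = np.percentile(preds, [5, 95])
print(f"predicted response: {np.mean(preds):.2f}, 90% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```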

    Slave to the Algorithm? Why a ‘Right to an Explanation’ Is Probably Not the Remedy You Are Looking For

    Algorithms, particularly machine learning (ML) algorithms, are increasingly important to individuals’ lives, but have caused a range of concerns revolving mainly around unfairness, discrimination and opacity. Transparency in the form of a “right to an explanation” has emerged as a compellingly attractive remedy since it intuitively promises to open the algorithmic “black box” to promote challenge, redress, and hopefully heightened accountability. Amidst the general furore over algorithmic bias we describe, any remedy in a storm has looked attractive. However, we argue that a right to an explanation in the EU General Data Protection Regulation (GDPR) is unlikely to present a complete remedy to algorithmic harms, particularly in some of the core “algorithmic war stories” that have shaped recent attitudes in this domain. Firstly, the law is restrictive, unclear, or even paradoxical concerning when any explanation-related right can be triggered. Secondly, even navigating this, the legal conception of explanations as “meaningful information about the logic of processing” may not be provided by the kind of ML “explanations” computer scientists have developed, partially in response. ML explanations are restricted both by the type of explanation sought, the dimensionality of the domain and the type of user seeking an explanation. However, “subject-centric explanations” (SCEs), focussing on particular regions of a model around a query, show promise for interactive exploration, as do explanation systems based on learning a model from outside rather than taking it apart (pedagogical versus decompositional explanations) in dodging developers’ worries of intellectual property or trade secrets disclosure. Based on our analysis, we fear that the search for a “right to an explanation” in the GDPR may be at best distracting, and at worst nurture a new kind of “transparency fallacy.” But all is not lost. We argue that other parts of the GDPR related (i) to the right to erasure (“right to be forgotten”) and the right to data portability; and (ii) to privacy by design, Data Protection Impact Assessments and certification and privacy seals, may have the seeds we can use to make algorithms more responsible, explicable, and human-centered.

    Exploring embedding vectors for emotion detection

    Textual data is nowadays being generated in vast volumes. With the proliferation of social media and the prevalence of smartphones, short texts have become a prevalent form of information, such as news headlines, tweets and text advertisements. Given the huge volume of short texts available, effective and efficient models to detect the emotions in short texts have become highly desirable and, in some cases, fundamental to a range of applications that require emotion understanding of textual content, such as human-computer interaction, marketing, e-learning and health. Emotion detection from text has been an important task in Natural Language Processing (NLP) for many years. Many approaches have been based on emotional words or lexicons in order to detect emotions. While word embedding vectors like Word2Vec have been successfully employed in many NLP approaches, the word mover’s distance (WMD) is a recently introduced method to calculate the distance between two documents based on the embedded words. This thesis investigates the ability to detect or classify emotions in sentences using word vectorization and distance measures. Our results confirm the novelty of using Word2Vec and WMD in predicting the emotions in short text. We propose a new methodology based on identifying “idealised” vectors that capture the essence of an emotion; we define these vectors as having the minimal distance (using some metric function) between a vector and the embeddings of the text that contains the relevant emotion (e.g. a tweet, a sentence). We look for these vectors through searching the space of word embeddings using the covariance matrix adaptation evolution strategy (CMA-ES). Our method produces state-of-the-art results, surpassing classic supervised learning methods.
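
    The sketch below illustrates the search idea only, not the thesis code: CMA-ES (via the `cma` package, an assumed dependency) looks for an "idealised" vector minimising its mean distance to the embeddings of texts carrying one emotion. Random vectors stand in for Word2Vec sentence embeddings, and squared Euclidean distance stands in for whatever metric the thesis actually uses.

```python
# Minimal sketch: search for an idealised emotion vector with CMA-ES.
import numpy as np
import cma  # pip install cma

rng = np.random.default_rng(0)
dim = 20
# Stand-ins for Word2Vec embeddings of texts labelled with one emotion (e.g. joy).
joy_embeddings = rng.normal(loc=0.5, scale=0.3, size=(50, dim))

def mean_sq_distance(v):
    """Mean squared Euclidean distance from candidate v to the labelled embeddings."""
    return float(np.mean(np.sum((joy_embeddings - v) ** 2, axis=1)))

es = cma.CMAEvolutionStrategy(np.zeros(dim), 0.5, {"verbose": -9, "maxiter": 300})
while not es.stop():
    candidates = es.ask()                               # sample candidate vectors
    es.tell(candidates, [mean_sq_distance(c) for c in candidates])

ideal_vector = es.result.xbest                          # best "idealised" vector found
print("distance of idealised vector:", mean_sq_distance(ideal_vector))
```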

    Enhancing Promotional Strategy Mapping Using the K-Means Clustering Algorithm to Raise Sales

    To enhance sales, organizations must better align their promotional tactics and promote their goods in locations where there is demand for them. Facilitating the delivery of goods makes it easier for clients to carry out purchase and sales transactions, and a corporation that allocates its goods strategically is able to expand its operations. Prospective clients often have more choices at their disposal than there are enterprises operating within the same sector, so a diverse range of promotional media is used to enhance the sales of products and services. Optimizing promotional strategies is the first and most critical stage in presenting items to clients, as it directly impacts the benefits the firm will obtain. So far, however, the promotional method has not had an effect on the sales process. The objective of this research was to use the K-Means clustering algorithm in a data mining procedure to optimize the categorization of customer data. CRISP-DM is used for understanding and preparing the data, constructing models, evaluating them, and deploying them, and is employed specifically for the construction of clusters. K-Means, a non-hierarchical clustering technique, divides data into groups according to how similar they are. The resulting program facilitates the mapping of appropriate locations for promotional purposes, and the study results may serve as a foundation for decision-making to optimize promotional strategies using the generated clusters.
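
    A minimal sketch in the spirit of the described procedure, not the paper's implementation: K-Means groups toy customer records into segments whose profiles could then guide where to target promotion. The two features (purchase count and transaction value), the synthetic data and the choice of k=3 are assumptions.

```python
# Minimal sketch: K-Means clustering of toy customer data for promotion mapping.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Toy customer data: [purchases per month, average transaction value].
customers = np.vstack([
    rng.normal([2, 20], [1, 5], size=(40, 2)),    # low-volume segment
    rng.normal([10, 35], [2, 8], size=(40, 2)),   # mid segment
    rng.normal([25, 80], [4, 15], size=(40, 2)),  # high-value segment
])

X = StandardScaler().fit_transform(customers)     # scale features before clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for label in range(3):
    members = customers[kmeans.labels_ == label]
    print(f"cluster {label}: {len(members)} customers, "
          f"mean purchases {members[:, 0].mean():.1f}, "
          f"mean value {members[:, 1].mean():.1f}")
```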