Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. Results of phases 1 and 2 informed the
design of the test system and user interface, which was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion, the project dissemination programme and future work are
outlined.
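The statistical term-weighting idea can be sketched as follows. This is a generic tf-idf illustration for ranking candidate expansion terms from a retrieved document set, not the project's actual weighting scheme; the function names and toy documents are made up:

```python
import math
from collections import Counter

def tfidf_terms(retrieved_docs, top_n=5):
    """Rank candidate expansion terms from a retrieved document set by a
    simple tf-idf weight (a hedged stand-in for the statistical
    term-weighting methods the project evaluated)."""
    n_docs = len(retrieved_docs)
    tokenised = [doc.lower().split() for doc in retrieved_docs]
    df = Counter()                      # document frequency per term
    for tokens in tokenised:
        df.update(set(tokens))
    tf = Counter()                      # collection-wide term frequency
    for tokens in tokenised:
        tf.update(tokens)
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

docs = [
    "query expansion improves retrieval precision",
    "concept hierarchies support interactive query expansion",
    "searchers browse concept hierarchies to select terms",
]
print(tfidf_terms(docs, top_n=3))
```

Terms appearing in every retrieved document score zero (idf is log 1), which is one reason such weighting surfaces discriminative rather than merely frequent terms for expansion.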
Lexicon-based sentiment analysis in texts using Formal Concept Analysis
In this paper, we present a novel approach for sentiment analysis that uses Formal Concept Analysis (FCA) to create dictionaries for classification. Unlike other methods that rely on pre-defined lexicons, our approach allows for the creation of customised dictionaries that are tailored to the specific data and tasks. By using a dataset of tweets categorised into positive and negative polarity, we show that our approach achieves better performance than other standard dictionaries. This research is partially supported by the State Agency of Research (AEI), the Spanish Ministry of Science, Innovation, and Universities (MCIU), the European Social Fund (FEDER), the Junta de Andalucía (JA), and the Universidad de Málaga (UMA) through the FPU19/01467 (MCIU) internship and the research projects with reference PGC2018-095869-B-I00, TIN2017-89023-P, PID2021-127870OB-I00 (MCIU/AEI/FEDER, UE) and UMA18-FEDERJA-001 (JA/UMA/FEDER, UE). Funding for open access charge: Universidad de Málaga / CBU.
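A minimal sketch of the Formal Concept Analysis machinery underlying such dictionaries, on a toy tweet-word context. The closure-based enumeration below is illustrative (and exponential in general), not the paper's algorithm; the context data is invented:

```python
from itertools import combinations

# Toy formal context: tweets (objects) x words (attributes).
context = {
    "t1": {"great", "love"},
    "t2": {"great", "happy"},
    "t3": {"awful", "hate"},
}

def intent(objs):
    """Attributes shared by every object in objs (the derivation operator)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set()

def extent(attrs):
    """Objects possessing every attribute in attrs (the dual operator)."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts():
    """Enumerate all formal concepts (extent, intent) by closing every
    subset of objects -- fine for toy contexts, exponential in general."""
    objs = list(context)
    seen = set()
    for r in range(len(objs) + 1):
        for combo in combinations(objs, r):
            b = intent(set(combo))          # shared attributes
            a = frozenset(extent(b))        # closure back to objects
            seen.add((a, frozenset(b)))
    return seen

for ext, itt in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

Each concept pairs a maximal set of tweets with the exact word set they share; grouping positive- and negative-polarity concepts is one way such a lattice could seed a customised sentiment dictionary.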
Noise Reduction In Web Data: A Learning Approach Based On Dynamic User Interests
One of the prominent challenges internet users encounter is the abundance of extraneous material in web content, which impedes the efficient retrieval of information aligned with their evolving interests. In research, noise is commonly defined as any extraneous data that does not contribute to the intended analysis or study objectives. This study analyses web pages and proposes noise-reduction techniques for web data, with emphasis on noise in both the content and the arrangement of data on a page. Not all data in a web dataset is relevant to a given user: the primary content of a page pertains to the user's specific interests, while extraneous information should be minimised. Acquiring noisy web data and allocating resources to user requests against the interest level recorded in a web user profile reduces not just noise but also the loss of valuable information, and thereby enhances the quality of the profile. The proposed Noise Web Data Learning (NWDL) tool/algorithm learns dynamic user interests and removes noise data in the context of dynamic user behaviour. To ascertain the efficacy of the proposed study, an experimental design is presented and the results are compared against algorithms currently employed for reducing noisy web data. The experimental findings indicate that the proposed study, by examining the dynamic evolution of user interest before removing extraneous data, makes a significant contribution to enhancing the quality of a web user profile, since indiscriminate elimination of noise risks removing beneficial information.
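An illustrative stand-in for interest-based noise filtering (not the NWDL algorithm itself, whose details the abstract does not give): score each page block against a bag-of-words user profile and drop low-similarity blocks. The threshold and example data are assumptions:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_noise(blocks, profile_terms, threshold=0.1):
    """Keep page blocks whose similarity to the user-interest profile
    exceeds the threshold; the rest is treated as noise."""
    profile = Counter(profile_terms)
    return [b for b in blocks
            if cosine(Counter(b.lower().split()), profile) > threshold]

blocks = [
    "latest football transfer news and match results",
    "subscribe to our newsletter win prizes",
    "football injury update ahead of the match",
]
print(filter_noise(blocks, ["football", "match", "news"]))
```

A dynamic profile would update `profile_terms` as the user's interests evolve, which is the behaviour the study argues must be examined before any block is discarded.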
Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications
Data Mining (DM) refers to the analysis of observational datasets to find
relationships and to summarize the data in ways that are both understandable
and useful. Many DM techniques exist. Compared with other DM techniques,
Intelligent Systems (ISs) based approaches, which include Artificial Neural
Networks (ANNs), fuzzy set theory, approximate reasoning, and derivative-free
optimization methods such as Genetic Algorithms (GAs), are tolerant of
imprecision, uncertainty, partial truth, and approximation. They provide
flexible information processing capability for handling real-life situations. This
thesis is concerned with the ideas behind design, implementation, testing and
application of a novel ISs based DM technique. The unique contribution of this
thesis is in the implementation of a hybrid IS DM technique (Genetic Neural
Mathematical Method, GNMM) for solving novel practical problems, the
detailed description of this technique, and the illustrations of several
applications solved by this novel technique.
GNMM consists of three steps: (1) GA-based input variable selection, (2) Multi-
Layer Perceptron (MLP) modelling, and (3) mathematical programming based
rule extraction. In the first step, GAs are used to evolve an optimal set of MLP
inputs. An adaptive method based on the average fitness of successive
generations is used to adjust the mutation rate, and hence the
exploration/exploitation balance. In addition, GNMM uses the elite group and
appearance percentage to minimize the randomness associated with GAs. In
the second step, MLP modelling serves as the core DM engine in performing
classification/prediction tasks. An Independent Component Analysis (ICA)
based weight initialization algorithm is used to determine optimal weights
before the commencement of training algorithms. The Levenberg-Marquardt
(LM) algorithm is used to achieve a second-order speedup compared to
conventional Back-Propagation (BP) training. In the third step, mathematical
programming based rule extraction is not only used to identify the premises of
multivariate polynomial rules, but also to explore features from the extracted
rules based on data samples associated with each rule. Therefore, the
methodology can provide regression rules and features not only in the
polyhedrons with data instances, but also in the polyhedrons without data
instances.
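The Levenberg-Marquardt step at the heart of stage two can be sketched as follows. This is a generic illustration on a toy linear least-squares problem (the data and model are made up), not the thesis's MLP training code:

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt update: w <- w - (J^T J + mu I)^{-1} J^T e,
    blending Gauss-Newton (small mu) with gradient descent (large mu)."""
    n = len(w)
    return w - np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ e)

# Toy linear model y = X @ w; for a linear model the Jacobian of the
# residuals is simply X, so an LM step with small mu lands near the
# least-squares solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w_true = np.array([2.0, -1.0])
y = X @ w_true
w = np.zeros(2)
for _ in range(5):
    e = X @ w - y            # residual vector
    w = lm_step(w, X, e, mu=1e-3)
print(w)                     # approaches [2, -1]
```

The use of curvature information in `J.T @ J` is what gives LM its second-order speedup over plain back-propagation's first-order gradient steps.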
A total of six datasets from environmental and medical disciplines were used
as case study applications. These datasets involve the prediction of
longitudinal dispersion coefficient, classification of electrocorticography
(ECoG)/Electroencephalogram (EEG) data, eye bacteria Multisensor Data
Fusion (MDF), and diabetes classification (denoted by Data I through to Data VI). GNMM was applied to all six datasets to explore its effectiveness,
but the emphasis is different for different datasets. For example, the emphasis
of Data I and II was to give a detailed illustration of how GNMM works; Data III
and IV aimed to show how to deal with difficult classification problems; the
aim of Data V was to illustrate the averaging effect of GNMM; and finally Data
VI was concerned with the GA parameter selection and benchmarking GNMM
with other IS DM techniques such as Adaptive Neuro-Fuzzy Inference System
(ANFIS), Evolving Fuzzy Neural Network (EFuNN), Fuzzy ARTMAP, and
Cartesian Genetic Programming (CGP). In addition, datasets obtained from
published works (i.e. Data II & III) or public domains (i.e. Data VI) where
previous results were present in the literature were also used to benchmark
GNMM’s effectiveness.
As a closely integrated system GNMM has the merit that it needs little human
interaction. With some predefined parameters, such as GA’s crossover
probability and the shape of ANNs’ activation functions, GNMM is able to
process raw data until some human-interpretable rules are extracted. This is
an important feature in terms of practice as quite often users of a DM system
have little or no need to fully understand the internal components of such a
system. Through case study applications, it has been shown that the GA-based
variable selection stage is capable of: filtering out irrelevant and noisy
variables, improving the accuracy of the model; making the ANN structure less
complex and easier to understand; and reducing the computational complexity
and memory requirements. Furthermore, rule extraction ensures that the MLP
training results are easily understandable and transferable.
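As an illustration of the adaptive mutation idea behind the GA-based variable-selection stage, the sketch below raises the mutation rate when average fitness stagnates between successive generations and lowers it when fitness improves. The update factor and bounds are assumptions, not the thesis's exact scheme:

```python
import random

def adapt_mutation_rate(rate, prev_avg_fitness, curr_avg_fitness,
                        factor=1.5, lo=0.001, hi=0.25):
    """Illustrative adaptive rule: if average fitness improves across
    generations, shrink the mutation rate (exploit); if it stagnates or
    falls, grow it (explore). Factor and bounds are hypothetical."""
    if curr_avg_fitness > prev_avg_fitness:
        rate /= factor          # improving: exploit
    else:
        rate *= factor          # stagnating: explore
    return min(max(rate, lo), hi)

def mutate(bitmask, rate):
    """Flip each bit of a variable-selection mask with probability `rate`."""
    return [b ^ (random.random() < rate) for b in bitmask]

rate = 0.05
rate = adapt_mutation_rate(rate, prev_avg_fitness=0.62, curr_avg_fitness=0.60)
child = mutate([1, 0, 1, 1, 0, 0, 1, 0], rate)
print(rate, child)
```

Each bit of the mask marks one candidate MLP input as selected or discarded, which is how a GA chromosome can encode the variable-selection problem.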
Quantification of uncertainty of geometallurgical variables for mine planning optimisation
Interest in geometallurgy has increased significantly over the past 15 years or
so because of the benefits it brings to mine planning and operation. Its use
and integration into design, planning and operation is becoming increasingly
critical especially in the context of declining ore grades and increasing mining
and processing costs.
This thesis, comprising four papers, offers methodologies and methods to
quantify geometallurgical uncertainty and enrich the block model with geometallurgical
variables, which contribute to improved optimisation of mining
operations. This enhanced block model is termed a geometallurgical block
model.
Bootstrapped non-linear regression models by projection pursuit were built
to predict grindability indices and recovery, and quantify model uncertainty.
These models are useful for populating the geometallurgical block model with
response attributes. New multi-objective optimisation formulations for block
caving mining were formulated and solved by a meta-heuristics solver focussing
on maximising the project revenue and, at the same time, minimising
several risk measures. A novel clustering method, which is able to use
both continuous and categorical attributes and incorporate expert knowledge,
was also developed for geometallurgical domaining which characterises the
deposit according to its metallurgical response. The concept of geometallurgical
dilution was formulated and used for optimising production scheduling in
an open-pit case study.
Thesis (Ph.D.) (Research by Publication) -- University of Adelaide, School of Civil, Environmental and Mining Engineering, 201
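The bootstrapped-regression idea for quantifying model uncertainty can be sketched as follows, with ordinary least squares standing in for the thesis's projection-pursuit models and synthetic data in place of geometallurgical samples:

```python
import numpy as np

def bootstrap_predictions(X, y, x_new, n_boot=500, seed=0):
    """Refit a simple least-squares model on resampled data and collect
    predictions at x_new, giving an empirical distribution from which
    prediction uncertainty can be quantified. (Linear regression is an
    illustrative stand-in for projection-pursuit regression.)"""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample rows with replacement
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        preds[i] = x_new @ w
    return preds

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, 50)
preds = bootstrap_predictions(X, y, x_new=np.array([1.0, 5.0]))
lo, hi = np.percentile(preds, [2.5, 97.5])   # 95% uncertainty interval
print(round(preds.mean(), 2), (round(lo, 2), round(hi, 2)))
```

Attaching such an interval to each block's predicted grindability or recovery is what turns a conventional block model into one that carries geometallurgical uncertainty.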
Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For
Algorithms, particularly machine learning (ML) algorithms, are increasingly important to individuals’ lives, but have caused a range of concerns revolving mainly around unfairness, discrimination and opacity. Transparency in the form of a “right to an explanation” has emerged as a compellingly attractive remedy since it intuitively promises to open the algorithmic “black box” to promote challenge, redress, and hopefully heightened accountability. Amidst the general furore over algorithmic bias we describe, any remedy in a storm has looked attractive. However, we argue that a right to an explanation in the EU General Data Protection Regulation (GDPR) is unlikely to present a complete remedy to algorithmic harms, particularly in some of the core “algorithmic war stories” that have shaped recent attitudes in this domain. Firstly, the law is restrictive, unclear, or even paradoxical concerning when any explanation-related right can be triggered. Secondly, even navigating this, the legal conception of explanations as “meaningful information about the logic of processing” may not be provided by the kind of ML “explanations” computer scientists have developed, partially in response. ML explanations are restricted both by the type of explanation sought, the dimensionality of the domain and the type of user seeking an explanation. However, “subject-centric explanations” (SCEs), focussing on particular regions of a model around a query, show promise for interactive exploration, as do explanation systems based on learning a model from outside rather than taking it apart (pedagogical versus decompositional explanations) in dodging developers’ worries of intellectual property or trade secrets disclosure. Based on our analysis, we fear that the search for a “right to an explanation” in the GDPR may be at best distracting, and at worst nurture a new kind of “transparency fallacy.” But all is not lost.
We argue that other parts of the GDPR related (i) to the right to erasure (“right to be forgotten”) and the right to data portability; and (ii) to privacy by design, Data Protection Impact Assessments and certification and privacy seals, may have the seeds we can use to make algorithms more responsible, explicable, and human-centered.
Exploring embedding vectors for emotion detection
Textual data nowadays is being generated in vast volumes. With the proliferation of social media and the prevalence of smartphones, short texts have become a prevalent form of information such as news headlines, tweets and text advertisements. Given the huge volume of short texts available, effective and efficient models to detect the emotions from short texts become highly desirable and in some cases fundamental to a range of applications that require emotion understanding of textual content, such as human computer interaction, marketing, e-learning and health.
Emotion detection from text has been an important task in Natural Language Processing (NLP) for many years. Many approaches have been based on emotional words or lexicons in order to detect emotions. While word embedding vectors like Word2Vec have been successfully employed in many NLP approaches, the word mover’s distance (WMD) is a recently introduced method for calculating the distance between two documents based on the embedded words. This thesis investigates the ability to detect or classify emotions in sentences using word vectorisation and distance measures. Our results confirm the effectiveness of using Word2Vec and WMD in predicting the emotions in short text.
We propose a new methodology based on identifying “idealised” vectors that capture the essence of an emotion; we define these vectors as having the minimal distance (using some metric function) between a vector and the embeddings of the text that contains the relevant emotion (e.g. a tweet, a sentence). We look for these vectors through searching the space of word embeddings using the covariance matrix adaptation evolution strategy (CMA-ES). Our method produces state of the art results, surpassing classic supervised learning methods.
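A minimal sketch of searching for such an “idealised” vector: minimise the mean distance from a candidate vector to a set of embeddings, here with a toy (1+1) evolution strategy as a lightweight stand-in for CMA-ES and made-up three-dimensional embeddings:

```python
import numpy as np

def idealised_vector(embeddings, n_iter=300, seed=0):
    """Search for an 'idealised' emotion vector minimising the mean
    Euclidean distance to a set of embeddings, using a toy (1+1)
    evolution strategy (a simple stand-in for CMA-ES)."""
    rng = np.random.default_rng(seed)
    dim = embeddings.shape[1]
    best = rng.normal(size=dim)
    best_cost = np.linalg.norm(embeddings - best, axis=1).mean()
    step = 1.0
    for _ in range(n_iter):
        cand = best + step * rng.normal(size=dim)
        cost = np.linalg.norm(embeddings - cand, axis=1).mean()
        if cost < best_cost:
            best, best_cost = cand, cost
            step *= 1.1          # success: widen the search
        else:
            step *= 0.97         # failure: shrink the search
    return best, best_cost

# Toy embeddings clustered around a "joy" direction (made-up data).
rng = np.random.default_rng(1)
emb = rng.normal(loc=[2.0, -1.0, 0.5], scale=0.2, size=(40, 3))
vec, cost = idealised_vector(emb)
print(np.round(vec, 1))          # lands near the cluster centre
```

A new sentence would then be scored by the distance between its embedding and each emotion's idealised vector, assigning the nearest emotion; CMA-ES plays the same role as this toy strategy but adapts a full covariance matrix rather than a single step size.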
Enhancing Promotional Strategy Mapping Using the K-Means Clustering Algorithm to Raise Sales
To enhance sales, organizations must improve the alignment of their promotional tactics. Enterprises can promote their goods in locations where there is demand for them, and facilitating the delivery of those goods makes it easier for clients to carry out their purchase and sales transactions. A corporation's ability to strategically allocate its goods enables it to expand its operations, as prospective clients have a greater array of choices at their disposal than the total number of enterprises operating within the same sector. This is accomplished by using a diverse range of promotional media to enhance the sales of products and services. Optimizing promotional strategies is the first and critical stage in presenting items to clients, as it directly impacts the benefits the firm will obtain; so far, however, the sales process has not been affected by the promotional method. The objective of this research was to use the K-Means clustering algorithm in a data mining procedure to optimize the categorization of customer data. CRISP-DM is used for comprehending and preparing the data, constructing models, evaluating them, and deploying them; here, the CRISP-DM method is employed specifically for the construction of clusters. K-Means is a non-hierarchical clustering technique that divides data into groups according to how similar the items are. The resulting tool facilitates the determination of appropriate location mapping for promotional purposes, and the study results may serve as a foundation for decision-making to maximize promotional techniques using the generated clusters.
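A minimal sketch of the K-Means step at the core of this procedure (Lloyd's algorithm), on hypothetical customer records; the feature choice is invented, and a production system would more likely use a library implementation such as scikit-learn's `KMeans`:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm): assign each point to its nearest
    centroid, then move each centroid to the mean of its points, repeating
    until the centroids stop moving."""
    centroids = X[:k].astype(float).copy()      # naive initialisation
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customer records: (distance to outlet in km, monthly purchases).
customers = np.array([[1.0, 20.0], [1.5, 18.0], [2.0, 22.0],   # nearby, frequent
                      [9.0, 3.0], [10.0, 5.0], [11.0, 2.0]])   # distant, infrequent
labels, centroids = kmeans(customers, k=2)
print(labels)
```

Each cluster's centroid summarises one customer segment and its location, which is the mapping the promotional strategy would then target.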