410 research outputs found
Computing with Granular Words
Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may be related to other types of uncertainty. For instance, when different users search for "cheap hotel" in a search engine, they may need distinct pieces of relevant hidden information, such as shopping, transportation, or weather. Therefore, this research work focuses on studying granular words and developing new algorithms to process them, in order to deal with uncertainty globally. To precisely describe the granular words, a new structure called the Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several techniques are developed to support computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam filtering rate than the conventional Naive Bayesian and SVM methods; computing with granular words also generates better recommendation results, based on users' assessments, when applied to a search engine
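The GIHT-Bayesian algorithm itself is not spelled out in this abstract, but the conventional Naive Bayesian baseline it is compared against can be sketched in a few lines. The toy corpus and function names below are illustrative assumptions, not taken from the work:

```python
import math
from collections import Counter

def train_naive_bayes(spam_docs, ham_docs):
    """Train a multinomial Naive Bayes spam filter with Laplace smoothing."""
    spam_counts = Counter(w for d in spam_docs for w in d.split())
    ham_counts = Counter(w for d in ham_docs for w in d.split())
    vocab = set(spam_counts) | set(ham_counts)

    def log_likelihood(counts):
        total = sum(counts.values())
        return {w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab}

    prior_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))
    return (math.log(prior_spam), log_likelihood(spam_counts),
            math.log(1 - prior_spam), log_likelihood(ham_counts))

def is_spam(model, doc):
    """Compare log-posterior scores of the spam and ham classes."""
    lp_s, ll_s, lp_h, ll_h = model
    score_s = lp_s + sum(ll_s.get(w, 0.0) for w in doc.split())
    score_h = lp_h + sum(ll_h.get(w, 0.0) for w in doc.split())
    return score_s > score_h

model = train_naive_bayes(
    ["cheap pills buy now", "win free money"],
    ["meeting at noon", "project report attached"],
)
print(is_spam(model, "buy cheap pills now"))  # True for this toy corpus
```

The thesis's contribution is precisely what this baseline lacks: the GIHT structure groups related terms into granular words before scoring, instead of treating each token independently.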
Incorporating complex domain knowledge into a recommender system in the healthcare sector
In contrast to other domains, recommender systems in the health sector may benefit particularly from the incorporation of medical domain knowledge, as it enables meaningful and personalised recommendations. With recent advances in the area of representation learning enabling the hierarchical embedding of health knowledge into the hyperbolic Poincaré space, this thesis proposes a recommender system for patient-doctor matchmaking based on patients' individual health profiles and consultation histories. In doing so, a dataset from a private healthcare provider is enriched with Poincaré embeddings of the ICD-9 codes. The proposed model outperforms its conventional counterpart in terms of recommendation accuracy
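The Poincaré embeddings mentioned above live in the hyperbolic Poincaré ball, where the geodesic distance between two embedded codes has a closed form. A minimal sketch of that distance function follows; the coordinates in the test are made-up toy points, not actual ICD-9 embeddings:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball model:

        d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))

    Both points must lie strictly inside the unit ball (||x|| < 1).
    """
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)
```

The appeal for hierarchies like ICD-9 is that distances blow up near the ball's boundary, so a ball of fixed radius can hold exponentially many well-separated leaf codes while general categories sit near the origin.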
Large-scale image collection cleansing, summarization and exploration
A perennially interesting topic in the research field of large-scale image collection organization is how to effectively and efficiently conduct the tasks of image cleansing, summarization and exploration. The primary objective of such an image organization system is to enhance the user's exploration experience through redundancy removal and summarization operations on a large-scale image collection. An ideal system should discover and utilize the visual correlation among the images, reduce the redundancy in the collection, organize and visualize its structure, and facilitate exploration and knowledge discovery.
In this dissertation, a novel system is developed for exploiting and navigating large-scale image collection. Our system consists of the following key components: (a) junk image filtering by incorporating bilingual search results; (b) near duplicate image detection by using a coarse-to-fine framework; (c) concept network generation and visualization; (d) image collection summarization via dictionary learning for sparse representation; and (e) a multimedia practice of graffiti image retrieval and exploration.
For junk image filtering, bilingual image search results, which are retrieved for the same keyword-based query, are integrated to automatically identify the clusters for the junk images and the clusters for the relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. The duplicate pairs are detected with both global features (a partition-based color histogram) and local features (CPAM and a SIFT bag-of-words model). The duplicates are detected and removed from the data collection to facilitate further exploration and visual correlation analysis. After junk image filtering and duplicate removal, the visual concepts are further organized and visualized by the proposed concept network. An automatic algorithm is developed to generate this visual concept network, which characterizes the visual correlation between image concept pairs. Multiple kernels are combined, and a kernel canonical correlation analysis algorithm is used to characterize the diverse visual similarity contexts between the image concepts. The FishEye visualization technique is implemented to facilitate the navigation of image concepts through our image concept network. To better assist the exploration of a large-scale data collection, we design an efficient summarization algorithm to extract representative exemplars. For this collection summarization task, a sparse dictionary (a small set of the most representative images) is learned to represent all the images in the given set; that is, the sparse dictionary is treated as the summary for the given image set. The simulated annealing algorithm is adopted to learn this sparse dictionary (image summary) by minimizing an explicit optimization function.
In order to handle large-scale image collections, we have evaluated both the accuracy of the proposed algorithms and their computational efficiency. For each of the above tasks, we have conducted experiments on multiple publicly available image collections, such as ImageNet, NUS-WIDE, and LabelMe. We have observed very promising results compared to existing frameworks. The computational performance is also satisfactory for large-scale image collection applications. The original intention in designing such a large-scale image collection exploration and organization system is to better serve the tasks of information retrieval and knowledge discovery. For this purpose, we apply the proposed system to a graffiti retrieval and exploration application and have received positive feedback
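As a rough illustration of the coarse stage of the near-duplicate detection described above, the global color-histogram comparison can be sketched as histogram intersection. The threshold and function names are illustrative assumptions; in the actual pipeline, candidates passing this cheap gate would still be verified with the local features (CPAM, SIFT bag-of-words):

```python
def histogram_intersection(h1, h2):
    """Similarity of two L1-normalized color histograms: the sum of
    bin-wise minima, which is 1.0 for identical histograms and 0.0
    for histograms with no overlapping mass."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def is_near_duplicate(h1, h2, threshold=0.9):
    """Coarse duplicate test on global color histograms.  The 0.9
    threshold is an assumed example value, not one from the thesis."""
    return histogram_intersection(h1, h2) >= threshold
```

The coarse-to-fine design matters for scale: the histogram test is linear in the number of bins, so it can prune the vast majority of image pairs before the much costlier local-feature matching runs.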
From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web
A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on the future opportunities and new paradigms for exploring and interacting with Web search results
Email fraud classifier using machine learning
Bachelor's thesis (Treballs Finals de Grau) in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2020, Supervisor: Jordi José Bazán. [en] Email is one of the most common methods of communication nowadays. Programs known as malware detectors are essential to assist and protect users from the agents that are usually responsible for cyberattacks. This paper focuses on using machine learning algorithms to detect possible email attacks by analyzing datasets of whitelists and blacklists. This document also covers other methods that attempt to solve this problem
Sentiment Analysis of the Public Towards the Kanjuruhan Tragedy with the Support Vector Machine Method
A tragedy occurred in the Indonesian football world during the Arema vs. Persebaya match on October 1, 2022, affecting approximately 714 victims, including 131 fatalities and 583 injuries. The tragedy is believed to have been caused by tear gas in the spectator stands and the closure of exits at the Kanjuruhan stadium. This event sparked a diverse range of public responses on social media, which can be analyzed through sentiment analysis. In this study, we employed the Support Vector Machine (SVM) algorithm, known for its speed and accuracy in text classification, to process and analyze tweets from October 1 to 31, 2022, as well as YouTube comments related to the Kanjuruhan tragedy from October 1 to November 20, 2022. Among the different SVM kernels, the RBF kernel exhibited the highest accuracy, precision, recall, and F1 scores, reaching 76.40%, 75.74%, 76.40%, and 75.18% respectively, when predicting data with three labels. Furthermore, the RBF kernel showed the best performance for data with two labels, achieving the highest accuracy, precision, recall, and F1-score, which increased to 81.54%, 81.56%, 81.54%, and 81.56%, respectively
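The RBF kernel that performed best in this study measures similarity between two feature vectors as a Gaussian of their distance. A minimal sketch, with an assumed default gamma:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    Equals 1.0 when x == y and decays toward 0 as the vectors move
    apart, so it acts as a smooth local similarity measure."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

In practice one would typically use a library implementation such as scikit-learn's `SVC(kernel="rbf")` on TF-IDF vectors of the tweets rather than hand-rolling the classifier; the sketch only shows what the kernel itself computes.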
Document Clustering as an approach to template extraction
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. A great part of customer support is done via the exchange of emails. As the number of emails exchanged daily constantly increases, companies need to find approaches to keep this process efficient. One common strategy is the use of template emails as answers. These answer templates are usually discovered by a human agent through repeated use of the same answer. In this work, we use a clustering approach to find these answer templates. Several clustering algorithms are investigated, with a focus on the k-means methodology, as well as other clustering components such as similarity measures and pre-processing steps. As we are dealing with text data, several text representation methods are also compared. Due to the peculiarities of the provided data, we are able to design methodologies to ensure the feasibility of this task and develop strategies to extract the answer templates from the clustering results
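One simple way to turn a k-means cluster into an answer template, consistent with the approach described above though not necessarily the dissertation's exact method, is to pick the email whose vector lies closest to the cluster centroid. A sketch on toy vectors (in practice these would be TF-IDF or similar text representations):

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_to_centroid(vectors):
    """Index of the vector closest (squared Euclidean distance) to the
    cluster centroid; the corresponding email is a natural candidate
    for the cluster's answer template."""
    c = centroid(vectors)
    sq_dist = lambda v: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))
```

Choosing a real email rather than the centroid itself matters here: the centroid is an average of TF-IDF weights and does not correspond to any readable text, whereas the nearest member is an actual answer an agent once wrote.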
Data Mining Application for Healthcare Sector: Predictive Analysis of Heart Attacks
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Cardiovascular diseases are the leading cause of death in the world, with heart disease being the deadliest, affecting more than 75% of individuals living in low- and middle-income countries. Considering all the consequences, first for the individual's health, but also for the health system and the cost of healthcare (for instance, treatments and medication), specifically for cardiovascular disease treatment, the provision of quality services through preventive medicine has become extremely important; its focus is identifying disease risk and then taking the right action at the first early signs. Therefore, by resorting to Data Mining (DM) and its techniques, it is possible to uncover patterns and relationships among the objects in healthcare data, giving the potential to use the data more efficiently, to produce business intelligence, and to extract knowledge that will be crucial for future answers about possible diseases and treatments for patients. Nowadays, DM is already applied in medical information systems for clinical purposes such as diagnosis and treatment; by making use of predictive models, such systems can diagnose certain groups of diseases, in this case heart attacks.
The focus of this project is to apply machine learning techniques to develop a predictive model based on a real dataset, in order to detect, through the analysis of patient data, whether a person is at risk of a heart attack. In the end, the best model is found by comparing the different algorithms used, assessing their results, and selecting the one with the best measures.
The correct identification of early signs of cardiovascular problems through the analysis of patient data can lead to the prevention of heart attacks, to the consequent reduction of the complications and secondary effects that the disease may bring, and, most importantly, to a decrease in the number of deaths in the future. Making use of Data Mining and analytics in healthcare allows the analysis of high volumes of data, the development of new predictive models, and the understanding of the factors and variables that most influence and contribute to this disease, and to which people should pay attention. Hence, this practical approach is an example of how predictive analytics can have an important impact in the healthcare sector: by collecting patient data, models learn from it so that in the future they can predict new, unknown cases of heart attacks with better accuracy. In this way, it contributes to the creation of new models, the tracking of patients' health data, the improvement of medical decisions, more efficient and faster responses, and the wellbeing of the population, which can be improved if diseases like this can be predicted and avoided. To conclude, this project aims to present and show how Data Mining techniques are applied in healthcare and medicine, and how they contribute to better knowledge of cardiovascular diseases and to the support of important decisions that will influence the patient's quality of life
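Selecting "the one with the best measures", as described above, amounts to computing evaluation metrics per model and taking the maximum. A minimal sketch with made-up model names and predictions; the choice of recall as the default metric is an assumption on the grounds that missing an at-risk patient is usually the costlier error, not something stated in the abstract:

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision and recall for a binary heart-attack
    classifier (1 = at risk, 0 = not at risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": acc, "precision": precision, "recall": recall}

def best_model(y_true, predictions_by_model, metric="recall"):
    """Name of the model scoring highest on the chosen metric."""
    return max(predictions_by_model,
               key=lambda name: evaluate(y_true, predictions_by_model[name])[metric])
```

The metric choice is the real design decision: a model tuned for raw accuracy on an imbalanced clinical dataset can look strong while quietly missing most of the at-risk patients.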
- …