
    Computing with Granular Words

    Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may be related to other types of uncertainty. For instance, when different users search for ‘cheap hotel’ in a search engine, they may need distinct pieces of relevant hidden information, such as shopping, transportation, or weather. Therefore, this research work focuses on studying granular words and developing new algorithms that process them to deal with uncertainty globally. To precisely describe granular words, a new structure called the Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed for computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam-filtering rate than the conventional Naive Bayes and SVM methods; computing with granular words also generates better recommendation results, as judged by users’ assessments, when applied to a search engine.
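    As a point of reference for the comparison above, the conventional Naive Bayes baseline can be sketched in a few lines of scikit-learn. The toy corpus and labels below are purely illustrative, and the GIHT-Bayesian algorithm itself is specific to this work and is not reproduced here.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Toy corpus; a real evaluation would use a labeled spam dataset.
        emails = ["cheap hotel deals click now", "meeting agenda for Monday",
                  "win a free prize today", "quarterly report attached"]
        labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

        # Bag-of-words counts feeding a multinomial Naive Bayes classifier.
        model = make_pipeline(CountVectorizer(), MultinomialNB())
        model.fit(emails, labels)
        print(model.predict(["free hotel prize now"]))  # -> [1]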

    Incorporating complex domain knowledge into a recommender system in the healthcare sector

    In contrast to other domains, recommender systems in the healthcare sector may benefit particularly from the incorporation of medical domain knowledge, as it enables meaningful and personalised recommendations. With recent advances in representation learning enabling the hierarchical embedding of health knowledge into the hyperbolic Poincaré space, this thesis proposes a recommender system for patient-doctor matchmaking based on patients’ individual health profiles and consultation history. In doing so, a dataset from a private healthcare provider is enriched with Poincaré embeddings of the ICD-9 codes. The proposed model outperforms its conventional counterpart in terms of recommendation accuracy.
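    A minimal sketch of how such hierarchical embeddings can be trained with gensim's PoincareModel is shown below; the ICD-9 parent-child pairs and the hyperparameters are illustrative placeholders, not the setup used in the thesis.

        from gensim.models.poincare import PoincareModel

        # Hypothetical parent-child pairs drawn from the ICD-9 hierarchy.
        relations = [
            ("410", "ischemic heart disease"),
            ("414", "ischemic heart disease"),
            ("ischemic heart disease", "diseases of the circulatory system"),
        ]

        # Embed the hierarchy into the hyperbolic Poincare ball.
        model = PoincareModel(relations, size=10, negative=2)
        model.train(epochs=50)

        # The learned vectors can then enrich a patient's profile features.
        vector = model.kv["410"]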

    Large-scale image collection cleansing, summarization and exploration

    A perennially interesting topic in the field of large-scale image collection organization is how to effectively and efficiently conduct the tasks of image cleansing, summarization, and exploration. The primary objective of such an image organization system is to enhance the user's exploration experience through redundancy removal and summarization operations on a large-scale image collection. An ideal system discovers and utilizes the visual correlation among the images, reduces the redundancy in the collection, organizes and visualizes its structure, and facilitates exploration and knowledge discovery. In this dissertation, a novel system is developed for exploiting and navigating large-scale image collections. The system consists of the following key components: (a) junk image filtering by incorporating bilingual search results; (b) near-duplicate image detection using a coarse-to-fine framework; (c) concept network generation and visualization; (d) image collection summarization via dictionary learning for sparse representation; and (e) a multimedia application of graffiti image retrieval and exploration.
    For junk image filtering, bilingual image search results, retrieved for the same keyword-based query, are integrated to automatically identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and SIFT bag-of-words models). The duplicates are removed from the collection to facilitate further exploration and visual correlation analysis.
    After junk image filtering and duplicate removal, the visual concepts are organized and visualized by the proposed concept network. An automatic algorithm is developed to generate this visual concept network, which characterizes the visual correlation between image concept pairs. Multiple kernels are combined, and a kernel canonical correlation analysis algorithm is used to characterize the diverse visual similarity contexts between the image concepts. The FishEye visualization technique is implemented to facilitate navigation of image concepts through the concept network.
    To better assist the exploration of large-scale collections, an efficient summarization algorithm is designed to extract representative exemplars. For this summarization task, a sparse dictionary (a small set of the most representative images) is learned to represent all the images in a given set; this sparse dictionary is treated as the summary of the image set. A simulated annealing algorithm is adopted to learn the sparse dictionary (image summary) by minimizing an explicit optimization function.
    To handle large-scale image collections, both the accuracy of the proposed algorithms and their computational efficiency are evaluated. For each of the above tasks, experiments were conducted on multiple publicly available image collections, such as ImageNet, NUS-WIDE, and LabelMe, with very promising results compared to existing frameworks. The computational performance is also satisfactory for large-scale applications. The original intention in designing such a system is to better serve the tasks of information retrieval and knowledge discovery; to that end, the proposed system is applied to a graffiti retrieval and exploration application and has received positive feedback.
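    The coarse stage of such a duplicate detector can be illustrated with a global color histogram comparison in OpenCV. This is a simplified stand-in for the partition-based histogram described above, with an arbitrary threshold; pairs passing it would go on to the fine CPAM/SIFT stage.

        import cv2

        def color_histogram(path, bins=8):
            """Global BGR color histogram of an image (coarse feature)."""
            img = cv2.imread(path)
            hist = cv2.calcHist([img], [0, 1, 2], None,
                                [bins, bins, bins], [0, 256] * 3)
            return cv2.normalize(hist, hist).flatten()

        def likely_duplicates(path_a, path_b, threshold=0.98):
            """Flag a pair as candidate duplicates for the fine stage."""
            ha, hb = color_histogram(path_a), color_histogram(path_b)
            return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL) > threshold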

    From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web

    A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search, in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on future opportunities and new paradigms for exploring and interacting with Web search results.

    Email fraud classifier using machine learning

    Final Degree Project in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2020, Advisor: Jordi José Bazán.
    Email is one of the most common methods of communication nowadays. Malware-detection programs are essential for assisting and protecting users from the agents usually responsible for cyberattacks. This paper focuses on using machine learning algorithms to detect possible email attacks by analyzing datasets of whitelists and blacklists. The document also covers other methods that attempt to solve this problem.
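    One common way to combine list-based data with a learned classifier is a rule-then-model cascade, sketched here with a hypothetical blacklist and toy training data rather than the project's actual datasets.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical blacklist of known-bad sender domains.
        BLACKLIST = {"phish.example", "spam.example"}

        def classify_email(sender_domain, body, model):
            # Rule stage: blacklisted senders are flagged outright.
            if sender_domain in BLACKLIST:
                return "fraud"
            # Model stage: otherwise defer to the trained classifier.
            return "fraud" if model.predict([body])[0] == 1 else "legitimate"

        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        model.fit(["urgent: verify your account now", "lunch tomorrow?"],
                  [1, 0])  # 1 = fraud, 0 = legitimate
        print(classify_email("mail.example", "verify your account", model))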

    Sentiment Analysis of the Public Towards the Kanjuruhan Tragedy with the Support Vector Machine Method

    A tragedy occurred in the Indonesian football world during the Arema vs. Persebaya match on October 1, 2022, resulting in approximately 714 casualties: 131 fatalities and 583 injuries. The tragedy is believed to have been caused by tear gas in the spectator stands and the closure of exits at the Kanjuruhan stadium. The event sparked a diverse range of public responses on social media, which can be analyzed through sentiment analysis. In this study, we employed the Support Vector Machine (SVM) algorithm, known for its speed and accuracy in text classification, to process and analyze tweets from October 1 to 31, 2022, as well as YouTube comments related to the Kanjuruhan tragedy from October 1 to November 20, 2022. Among the different SVM kernels, the RBF kernel exhibited the highest accuracy, precision, recall, and F1 scores, reaching 76.40%, 75.74%, 76.40%, and 75.18%, respectively, when predicting data with three labels. The RBF kernel also performed best on data with two labels, where accuracy, precision, recall, and F1 score increased to 81.54%, 81.56%, 81.54%, and 81.56%, respectively.
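    The core of such a pipeline, text features fed to an SVM with an RBF kernel, can be sketched as follows. The English toy examples stand in for the Indonesian tweets and comments, and the TF-IDF features and weighted score averaging are assumptions about the preprocessing, not details confirmed by the study.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics import f1_score
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC

        # Toy stand-ins for the labeled tweets and YouTube comments.
        texts = ["justice for the victims", "grateful for the volunteers",
                 "this tragedy is heartbreaking", "a well-handled response"]
        labels = [0, 1, 0, 1]  # 0 = negative, 1 = positive

        clf = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf"))
        clf.fit(texts, labels)
        print(f1_score(labels, clf.predict(texts), average="weighted"))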

    Document Clustering as an approach to template extraction

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    A great part of customer support is done via the exchange of emails. As the number of emails exchanged daily is constantly increasing, companies need to find approaches to keep this process efficient. One common strategy is the use of template emails as answers. These answer templates are usually discovered by a human agent through repeated use of the same answer. In this work, we use a clustering approach to find these answer templates. Several clustering algorithms are researched, with a focus on the k-means methodology, as well as other clustering components such as similarity measures and pre-processing steps. Since we are dealing with text data, several text representation methods are also compared. Owing to the peculiarities of the provided data, we design methodologies to ensure the feasibility of this task and develop strategies to extract the answer templates from the clustering results.
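    A minimal version of this idea, clustering TF-IDF vectors of past answers with k-means and taking the answer closest to each centroid as a template candidate, might look as follows. The toy answers are invented, and k is fixed by hand rather than chosen by the model-selection steps studied in the work.

        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics import pairwise_distances_argmin_min

        answers = ["Please reset your password using the link below.",
                   "To reset your password, click the link below.",
                   "Your refund has been processed.",
                   "We have issued your refund to your account."]

        X = TfidfVectorizer().fit_transform(answers)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

        # The past answer nearest each centroid is a template candidate.
        closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
        for idx in closest:
            print(answers[idx])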

    Data Mining Application for Healthcare Sector: Predictive Analysis of Heart Attacks

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    Cardiovascular diseases are the leading cause of death in the world, with heart disease the deadliest among them; more than 75% of these deaths occur in low- and middle-income countries. Considering the consequences, first for the individual's health but also for the health system and the cost of healthcare (for instance, treatments and medication), it has become extremely important to provide quality services through preventive medicine, whose focus is identifying disease risk and then taking the right action at the first early signs. By resorting to Data Mining (DM) and its techniques, it is possible to uncover patterns and relationships among the objects in healthcare data, to use that data more efficiently, and to produce business intelligence and extract knowledge that will be crucial for future decisions about possible diseases and treatments. Nowadays, DM is already applied in medical information systems for clinical purposes such as diagnosis and treatment; predictive models can diagnose certain groups of diseases, in this case heart attacks.
    The focus of this project is to apply machine learning techniques to develop a predictive model based on a real dataset, in order to detect, through the analysis of a patient's data, whether that person is at risk of a heart attack. At the end, the best model is found by comparing the different algorithms used, assessing their results, and selecting the one with the best measures. The correct identification of early signs of cardiovascular problems through the analysis of patient data can lead to the prevention of heart attacks, to a consequent reduction in the complications and secondary effects the disease may bring, and, most importantly, to a decrease in the number of deaths in the future.
    Using Data Mining and analytics in healthcare allows the analysis of high volumes of data, the development of new predictive models, and an understanding of the factors and variables that contribute most to this disease and to which people should pay attention. Hence, this practical approach is an example of how predictive analytics can have an important impact in the healthcare sector: by collecting patient data, models learn from it so that in the future they can predict new, unseen cases of heart attacks with better accuracy. In this way, it contributes to the creation of new models, the tracking of patients' health data, the improvement of medical decisions, efficient and faster responses, and the wellbeing of a population whose diseases may be predicted and avoided. To conclude, this project aims to show how Data Mining techniques are applied in healthcare and medicine, and how they contribute to better knowledge of cardiovascular diseases and to the support of important decisions that influence patients' quality of life.
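    As an illustration of the kind of baseline compared in such a project, the sketch below trains a logistic regression on a hypothetical heart dataset; the file name 'heart.csv' and its 'target' column are assumptions, not the project's actual data.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        # Hypothetical dataset: clinical features plus a binary target.
        df = pd.read_csv("heart.csv")
        X, y = df.drop(columns="target"), df["target"]

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y)

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        print(accuracy_score(y_test, model.predict(X_test)))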