125 research outputs found

    On the Selection of Meaningful Association Rules

    Get PDF

    A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews

    Get PDF
    Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews. In light of this issue, we propose a multi-modal approach for mining fine-grained opinions from video reviews that is able to determine the aspects of the item under review that are being discussed and the sentiment orientation towards them. Our approach works at the sentence level without the need for time annotations and uses features derived from the audio, video and language transcriptions of its contents. We evaluate our approach on two datasets and show that leveraging the video and audio modalities consistently provides increased performance over text-only baselines, providing evidence these extra modalities are key in better understanding video reviews.Comment: Second Grand Challenge and Workshop on Multimodal Language ACL 202

    FIN-DM: finantsteenuste andmekaeve protsessi mudel

    Get PDF
    Andmekaeve hõlmab reeglite kogumit, protsesse ja algoritme, mis võimaldavad ettevõtetel iga päev kogutud andmetest rakendatavaid teadmisi ammutades suurendada tulusid, vähendada kulusid, optimeerida tooteid ja kliendisuhteid ning saavutada teisi eesmärke. Andmekaeves ja -analüütikas on vaja hästi määratletud metoodikat ja protsesse. Saadaval on mitu andmekaeve ja -analüütika standardset protsessimudelit. Kõige märkimisväärsem ja laialdaselt kasutusele võetud standardmudel on CRISP-DM. Tegu on tegevusalast sõltumatu protsessimudeliga, mida kohandatakse sageli sektorite erinõuetega. CRISP-DMi tegevusalast lähtuvaid kohandusi on pakutud mitmes valdkonnas, kaasa arvatud meditsiini-, haridus-, tööstus-, tarkvaraarendus- ja logistikavaldkonnas. Seni pole aga mudelit kohandatud finantsteenuste sektoris, millel on omad valdkonnapõhised erinõuded. Doktoritöös käsitletakse seda lünka finantsteenuste sektoripõhise andmekaeveprotsessi (FIN-DM) kavandamise, arendamise ja hindamise kaudu. Samuti uuritakse, kuidas kasutatakse andmekaeve standardprotsesse eri tegevussektorites ja finantsteenustes. Uurimise käigus tuvastati mitu tavapärase raamistiku kohandamise stsenaariumit. Lisaks ilmnes, et need meetodid ei keskendu piisavalt sellele, kuidas muuta andmekaevemudelid tarkvaratoodeteks, mida saab integreerida organisatsioonide IT-arhitektuuri ja äriprotsessi. Peamised finantsteenuste valdkonnas tuvastatud kohandamisstsenaariumid olid seotud andmekaeve tehnoloogiakesksete (skaleeritavus), ärikesksete (tegutsemisvõime) ja inimkesksete (diskrimineeriva mõju leevendus) aspektidega. Seejärel korraldati tegelikus finantsteenuste organisatsioonis juhtumiuuring, mis paljastas 18 tajutavat puudujääki CRISP- DMi protsessis. Uuringu andmete ja tulemuste abil esitatakse doktoritöös finantsvaldkonnale kohandatud CRISP-DM nimega FIN-DM ehk finantssektori andmekaeve protsess (Financial Industry Process for Data Mining). FIN-DM laiendab CRISP-DMi nii, et see toetab privaatsust säilitavat andmekaevet, ohjab tehisintellekti eetilisi ohte, täidab riskijuhtimisnõudeid ja hõlmab kvaliteedi tagamist kui osa andmekaeve elutsüklisData mining is a set of rules, processes, and algorithms that allow companies to increase revenues, reduce costs, optimize products and customer relationships, and achieve other business goals, by extracting actionable insights from the data they collect on a day-to-day basis. Data mining and analytics projects require well-defined methodology and processes. Several standard process models for conducting data mining and analytics projects are available. Among them, the most notable and widely adopted standard model is CRISP-DM. It is industry-agnostic and often is adapted to meet sector-specific requirements. Industry- specific adaptations of CRISP-DM have been proposed across several domains, including healthcare, education, industrial and software engineering, logistics, etc. However, until now, there is no existing adaptation of CRISP-DM for the financial services industry, which has its own set of domain-specific requirements. This PhD Thesis addresses this gap by designing, developing, and evaluating a sector-specific data mining process for financial services (FIN-DM). The PhD thesis investigates how standard data mining processes are used across various industry sectors and in financial services. The examination identified number of adaptations scenarios of traditional frameworks. It also suggested that these approaches do not pay sufficient attention to turning data mining models into software products integrated into the organizations' IT architectures and business processes. In the financial services domain, the main discovered adaptation scenarios concerned technology-centric aspects (scalability), business-centric aspects (actionability), and human-centric aspects (mitigating discriminatory effects) of data mining. Next, an examination by means of a case study in the actual financial services organization revealed 18 perceived gaps in the CRISP-DM process. Using the data and results from these studies, the PhD thesis outlines an adaptation of CRISP-DM for the financial sector, named the Financial Industry Process for Data Mining (FIN-DM). FIN-DM extends CRISP-DM to support privacy-compliant data mining, to tackle AI ethics risks, to fulfill risk management requirements, and to embed quality assurance as part of the data mining life-cyclehttps://www.ester.ee/record=b547227

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    Full text link
    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period of 2007-2022 were collected from the Scopus Database then pre-processed. Following this, the VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, which offers a deeper understanding of the current state and an insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure

    Mobile app recommendations using deep learning and big data

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research e CRMRecommender systems were first introduced to solve information overload problems in enterprises. Over the last decades, recommender systems have found applications in several major websites related to e-commerce, music and video streaming, travel and movie sites, social media and mobile app stores. Several methods have been proposed over the years to build recommender systems. The most popular approaches are based on collaborative filtering techniques, which leverage the similarities between consumer tastes. But the current state of the art in recommender systems is deep-learning methods, which can leverage not only item consumption data but also content, context, and user attributes. Mobile app stores generate data with Big Data properties from app consumption data, behavioral, geographic, demographic, social network and user-generated content data, which includes reviews, comments and search queries. In this dissertation, we propose a deep-learning architecture for recommender systems in mobile app stores that leverage most of these data sources. We analyze three issues related to the impact of the data sources, the impact of embedding layer pretraining and the efficiency of using Kernel methods to improve app scoring at a Big Data scale. An experiment is conducted on a Portuguese Android app store. Results suggest that models can be improved by combining structured and unstructured data. The results also suggest that embedding layer pretraining is essential to obtain good results. Some evidence is provided showing that Kernel-based methods might not be efficient when deployed in Big Data contexts

    Associative pattern mining for supervised learning

    Get PDF
    The Internet era has revolutionized computational sciences and automated data collection techniques, made large amounts of previously inaccessible data available and, consequently, broadened the scope of exploratory computing research. As a result, data mining, which is still an emerging field of research, has gained importance because of its ability to analyze and discover previously unknown, hidden, and useful knowledge from these large amounts of data. One aspect of data mining, known as frequent pattern mining, has recently gained importance due to its ability to find associative relationships among the parts of data, thereby aiding a type of supervised learning known as associative learning . The purpose of this dissertation is two-fold: to develop and demonstrate supervised associative learning in non-temporal data for multi-class classification and to develop a new frequent pattern mining algorithm for time varying (temporal) data which alleviates the current issues in analyzing this data for knowledge discovery. In order to use associative relationships for classification, we have to algorithmically learn their discriminatory power. While it is well known that multiple sets of features work better for classification, we claim that the isomorphic relationships among the features work even better and, therefore, can be used as higher order features. To validate this claim, we exploit these relationships as input features for classification instead of using the underlying raw features. The next part of this dissertation focuses on building a new classifier using associative relationships as a basis for the multi-class classification problem. Most of the existing associative classifiers represent the instances from a class in a row-based format wherein one row represents features of one instance and extract association rules from the entire dataset. The rules formed in this way are known as class constrained rules, as they have class labels on the right side of the rules. We argue that this class constrained representation schema lacks important information that is necessary for multi-class classification. Further, most existing works use either the intraclass or inter-class importance of the association rules, both of which sets of techniques offer empirical benefits. We hypothesize that both intra-class and inter-class variations are important for fast and accurate multi-class classification. We also present a novel weighted association rule-based classification mechanism that uses frequent relationships among raw features from an instance as the basis for classifying the instance into one of the many classes. The relationships are weighted according to both their intra-class and inter-class importance. The final part of this dissertation concentrates on mining time varying data. This problem is known as inter-transaction association rule mining in the data-mining field. Most of the existing work transforms the time varying data into a static format and then use multiple scans over the new data to extract patterns. We present a unique index-based algorithmic framework for inter-transaction association rule mining. Our proposed technique requires only one scan of the original database. Further, the proposed technique can also provide the location information of each extracted pattern. We use mathematical induction to prove that the new representation scheme captures all underlying frequent relationships

    Data mining techniques for complex application domains

    Get PDF
    The emergence of advanced communication techniques has increased availability of large collection of data in electronic form in a number of application domains including healthcare, e- business, and e-learning. Everyday a large amount of records are stored electronically. However, finding useful information from such a large data collection is a challenging issue. Data mining technology aims automatically extracting hidden knowledge from large data repositories exploiting sophisticated algorithms. The hidden knowledge in the electronic data may be potentially utilized to facilitate the procedures, productivity, and reliability of several application domains. The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: Healthcare data analysis and Textual data analysis. The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining to support domain experts in making decisions has been investigated. Both research activities are focused on adopting widely exploratory data mining techniques to textual data analysis, which require overcoming intrinsic limitations for traditional algorithms for handling textual documents efficiently and effectively
    corecore