
30 research outputs found


Detecting Anomalously Similar Entities in Unlabeled Data

Author: Friedland Lisa D.
Publication venue: ScholarWorks@UMass Amherst
Publication date: 14/11/2016
Field of study

In this work, the goal is to detect closely linked entities within a data set. The entities of interest have a tie causing them to be similar, such as a shared origin or a channel of influence. Given a collection of people or other entities with their attributes or behavior, we identify unusually similar pairs, and we pose the question: Are these two people linked, or can their similarity be explained by chance? Computing similarities is a core operation in many domains, but two constraints differentiate our version of the problem. First, the score assigned to a pair should account for the probability of a coincidental match. Second, no training data is provided; we must learn about the system from the unlabeled data and make reasonable assumptions about the linked pairs. This problem has applications to social network analysis, where it can be valuable to identify implicit relationships among people from indicators of coordinated activity. It also arises in situations where we must decide whether two similar observations correspond to two different entities or to the same entity observed twice. This dissertation explores how to assess such ties and, in particular, how the similarity scores should depend not only on the two entities in question but also on properties of the entire data set. We develop scoring functions that incorporate both the similarity and rarity of a pair. Then, using these functions, we investigate the statistical power of a data set to reveal (or conceal) such pairs. In the dissertation, we develop generative models of linked pairs and independent entities and use them to derive scoring functions for pairs in three different domains: people with job histories, Gaussian-distributed points in Euclidean space, and people (or entities) in a bipartite affiliation graph. For the first, we present a case study in fraud detection that highlights the potential, as well as the complexities, of using these methods to address real-world problems.
In the latter two domains, we develop an inference framework to estimate whether two entities were more likely generated independently or as a pair. In these settings, we analyze how the scoring function works in terms of similarity and rarity, how well it can detect pairs as a function of the data set, and how it differs from existing similarity functions when applied to real data.

ScholarWorks@UMass Amherst
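To make the "similarity plus rarity" idea in the Friedland abstract concrete for the bipartite affiliation setting, here is a minimal sketch assuming the simplest possible model, in which entities hold affiliations independently. Each shared affiliation contributes -log(p_a), where p_a is the fraction of entities holding it, so common affiliations add little and rare ones add a lot. This is an illustrative stand-in, not the generative model or scoring function developed in the dissertation; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def affiliation_frequencies(entities):
    """Fraction of entities that hold each affiliation (p_a)."""
    n = len(entities)
    counts = Counter(a for affs in entities.values() for a in set(affs))
    return {a: c / n for a, c in counts.items()}


def rarity_weighted_score(affs_x, affs_y, freq):
    """Sum -log(p_a) over shared affiliations: rare overlaps weigh more."""
    shared = set(affs_x) & set(affs_y)
    return sum(-math.log(freq[a]) for a in shared)


def rank_pairs(entities):
    """Score every pair of entities and return them, most suspicious first."""
    freq = affiliation_frequencies(entities)
    scored = [
        ((x, y), rarity_weighted_score(entities[x], entities[y], freq))
        for x, y in combinations(sorted(entities), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

An affiliation held by everyone contributes -log(1) = 0, so sharing it is no evidence of a tie. The sketch scores all pairs exhaustively and ignores how many affiliations each entity has; a proper model of independent versus paired generation would need to account for both.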

Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

Author: Kummer Robert
Publication venue
Publication date: 01/01/2013
Field of study

The subject of the dissertation is an information alignment experiment (ALAP) involving two cultural heritage information systems: the Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks, such as business decision making or even catastrophe management. It is beyond doubt that information available in digital form can offer users new ways of interaction. In the humanities and cultural heritage communities, too, more and more information is being published online. However, in many situations the way this information has been made publicly available disrupts the research process because it is heterogeneous and distributed. Integrated information will therefore be a key factor in successful research, and the need for information alignment is widely recognized. ALAP attempts to integrate information from Perseus and Arachne not only at the schema level but also through entity resolution. To that end, the technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is rooted mainly in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of required entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped identify the challenges and highlighted the opportunities that arise when attempting to align cultural heritage information systems.

Kölner UniversitätsPublikationsServer
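The ALAP pipeline in the Kummer abstract normalizes extracted records, reduces the number of comparisons with a clustering step, and then matches within clusters. The sketch below illustrates that reduce-then-match pattern with a deliberately simple substitute: records are blocked on the first three characters of a normalized name (a stand-in for ALAP's clustering algorithm, which the abstract does not name), and only records sharing a block are compared with `difflib.SequenceMatcher`. The record layout, the `id`/`name` fields, and the 0.85 threshold are hypothetical.

```python
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def block_key(name):
    """Cheap blocking key: records are only compared if their keys agree."""
    return normalize(name)[:3]


def match(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b, score) for pairs in the same block above threshold."""
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[block_key(rec["name"])].append(rec)
    matches = []
    for rec_a in records_a:
        norm_a = normalize(rec_a["name"])
        for rec_b in blocks.get(block_key(rec_a["name"]), []):
            score = SequenceMatcher(None, norm_a, normalize(rec_b["name"])).ratio()
            if score >= threshold:
                matches.append((rec_a["id"], rec_b["id"], round(score, 3)))
    return matches
```

Blocking trades recall for speed: two records describing the same object whose normalized names start differently will never be compared, which is why real systems often use several blocking keys or a proper clustering step.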

Concepts and Methods from Artificial Intelligence in Modern Information Systems – Contributions to Data-driven Decision-making and Business Processes

Author: Schiller Alexander
Publication venue
Publication date: 15/01/2020
Field of study

Today, organizations are facing a variety of challenging, technology-driven developments, three of the most notable being the surge in uncertain data, the emergence of unstructured data, and a complex, dynamically changing environment. These developments require organizations to transform in order to stay competitive. Artificial Intelligence, with its fields of decision-making under uncertainty, natural language processing, and planning, offers valuable concepts and methods for addressing these developments. The dissertation at hand uses and extends these contributions in three focal points to address research gaps in the existing literature and to provide concrete concepts and methods that support organizations in transforming and improving data-driven decision-making, business processes, and business process management. In particular, the focal points are the assessment of data quality, the analysis of textual data, and the automated planning of process models. Regarding data quality assessment, probability-based approaches for measuring consistency and identifying duplicates, as well as requirements for data quality metrics, are proposed. With respect to the analysis of textual data, the dissertation proposes a topic modeling procedure for gaining knowledge from CVs and a model based on sentiment analysis for explaining ratings in customer reviews. Regarding the automated planning of process models, concepts and algorithms are provided for the automated construction of parallelizations in process models, the automated adaptation of process models, and the automated construction of multi-actor process models.

University of Regensburg Publication Server

Business intelligence on Scalable architectures

Author: Sidló Csaba István
Publication venue
Publication date: 01/01/2011
Field of study

ELTE Digital Institutional Repository (EDIT)

Enrichment of Wind Turbine Health History for Condition-Based Maintenance

Author: Cox Roger
Publication venue
Publication date: 01/01/2022
Field of study

This research develops a methodology for and shows the benefit of linking records of wind turbine maintenance. It analyses commercially sensitive real-world maintenance records with the aim of improving the productivity of offshore wind farms. The novel achievements of this research are that it applies multi-feature record linkage techniques to maintenance data, that it applies statistical techniques for the interval estimation of a binomial proportion to record linkage techniques and that it estimates the distribution of the coverage error of statistical techniques for the interval estimation of a binomial proportion. The main contribution of this research is a process for the enrichment of offshore wind turbine health history. The economic productivity of a wind farm depends on the price of electricity and on the suitability of the weather, both of which are beyond the control of a maintenance team, but also on the cost of operating the wind farm, on the cost of maintaining the wind turbines and on how much of the wind farm’s potential production of electricity is lost to outages. Improvements in maintenance scheduling, in condition-based maintenance, in troubleshooting and in the measurement of maintenance effectiveness all require knowledge of the health history of the plant. To this end, this thesis presents new techniques for linking together existing records of offshore wind turbine health history. Multi-feature record linkage techniques are used to link records of maintenance data together. Both the quality of record linkage and the uncertainty of that quality are assessed. The quality of record linkage was measured by comparing the generated set of linked records to a gold standard set of linked records identified in collaboration with offshore wind turbine maintenance experts. The process for the enrichment of offshore wind turbine health history developed in this research requires a vector of weights and thresholds. 
The agreement and disagreement weights for each feature indicate the importance of the feature to the quality of record linkage. This research uses differential evolution to globally optimise this vector of weights and thresholds. There is inevitably some uncertainty associated with the measurement of the quality of record linkage, and consequently with the optimum values for the weights and thresholds; this research not only measures the quality of record linkage but also identifies robust techniques for the estimation of its uncertainty.

Durham e-Theses
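The Cox thesis scores candidate pairs of maintenance records with per-feature agreement and disagreement weights plus thresholds, and attaches interval estimates to the measured linkage quality. A minimal sketch in the classic Fellegi–Sunter style shows how such weights and two thresholds turn feature comparisons into link, review, or non-link decisions; here the weights are derived, for concreteness, as log2 likelihood ratios from hypothetical m- and u-probabilities. The Wilson score interval is one common method for interval estimation of a binomial proportion, such as precision measured against a gold standard. The feature names, probabilities, and thresholds are invented for illustration; the thesis optimises its weights and thresholds with differential evolution, which is not reproduced here, and the abstract does not say which interval techniques it found robust.

```python
import math

# Hypothetical per-feature probabilities (illustration only):
# m = P(feature agrees | records truly linked), u = P(feature agrees | not linked).
FEATURES = {
    "turbine_id": (0.95, 0.05),
    "date_within_7d": (0.80, 0.10),
    "fault_code": (0.70, 0.20),
}


def weights(m, u):
    """Agreement and disagreement weights as log2 likelihood ratios."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def score(agreement):
    """Add each feature's agreement weight if it agrees, else its disagreement weight."""
    total = 0.0
    for name, (m, u) in FEATURES.items():
        w_agree, w_disagree = weights(m, u)
        total += w_agree if agreement[name] else w_disagree
    return total


def classify(total, lower=0.0, upper=5.0):
    """Two thresholds split pairs into link, clerical review, and non-link."""
    if total >= upper:
        return "link"
    if total <= lower:
        return "non-link"
    return "review"


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For example, if 90 of 100 generated links appear in the gold standard, `wilson_interval(90, 100)` gives roughly (0.826, 0.945), which conveys how uncertain the 0.90 precision estimate is at that sample size.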

Data-stream driven Fuzzy-granular approaches for system maintenance

Author: Decker de Sousa Leticia <1981>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 16/06/2022
Field of study

Intelligent systems are now embedded throughout society, supporting synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems, and poor software quality can undermine any attempt at improvement. In addition, data-driven machine learning algorithms are the basis of human-centered applications, and their interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline for supporting automatic, lifelong system operation. Because most software records its internal events in logs, log analysis is one way to keep systems operating. Logs are big data arriving in high-volume streams, and they are unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods that provide maintenance solutions for anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses. LP provides a deeper semantic interpretation of anomalous occurrences. The solutions evolve over time and are general-purpose, making them highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed for the automatic parsing of system logs. All the methods use recursive mechanisms to create, update, merge, and delete information granules according to the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences.
Regarding AD accuracy, FBeM achieved (85.64 ± 3.69)%, eGNN reached (96.17 ± 0.78)%, eGFC obtained (92.48 ± 1.21)%, and eLP reached (96.05 ± 1.04)%. Besides being competitive, eLP also generates a log grammar and offers a higher level of model interpretability.

AMS Tesi di Dottorato
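The Decker de Sousa abstract does not give eLP's rules for creating, updating, merging, and deleting granules. As a rough, hypothetical illustration of the general idea of an evolving parser that builds structure as a log stream arrives, the sketch below keeps one "granule" per template: each incoming line either updates the most similar template of the same length (turning differing tokens into a `<*>` wildcard) or creates a new one. Merge and delete steps are omitted and there is no fuzzy membership, so this is not FBeM, eGNN, eGFC, or eLP; the class names and the 0.5 threshold are assumptions.

```python
WILDCARD = "<*>"


class Granule:
    """A log template: a token sequence whose variable positions are wildcards."""

    def __init__(self, tokens):
        self.template = list(tokens)
        self.count = 1

    def similarity(self, tokens):
        """Fraction of positions where the message agrees with the template."""
        same = sum(1 for t, s in zip(self.template, tokens) if t == s or t == WILDCARD)
        return same / len(tokens)

    def absorb(self, tokens):
        """Update step: generalise every differing position into a wildcard."""
        self.template = [t if t == s else WILDCARD for t, s in zip(self.template, tokens)]
        self.count += 1


class StreamParser:
    """Processes one log line at a time, evolving its set of granules."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.granules = []

    def process(self, line):
        tokens = line.split()
        if not tokens:
            return None  # blank lines carry no pattern
        candidates = [g for g in self.granules if len(g.template) == len(tokens)]
        best = max(candidates, key=lambda g: g.similarity(tokens), default=None)
        if best is not None and best.similarity(tokens) >= self.threshold:
            best.absorb(tokens)
            return best
        granule = Granule(tokens)  # create step: an unseen pattern starts a new granule
        self.granules.append(granule)
        return granule
```

Feeding it `"Connection from 10.0.0.1 closed"` and then `"Connection from 10.0.0.7 closed"` yields a single template, `Connection from <*> closed`, while a structurally different line such as `"Disk sda1 full"` starts a second granule.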

A Study on User Behavior in Online Games (온라인 게임에서 유저의 행태에 관한 연구)

Author: 안대환
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2018
Field of study

Thesis (Ph.D.), Seoul National University Graduate School: College of Business Administration, Department of Business Administration, February 2018. Advisor: 유병준. This dissertation consists of two essays on user behavior in online games. In the first essay, I identified multi-botting cheaters and measured their impact using basic database information such as user ID, playtime, and item purchase records. I addressed the data availability issue and proposed a method for companies with limited data and resources. I also avoided the large-scale transaction processing and complex development that are fairly common in existing cheating detection methods. To identify cheaters, we used Dynamic Time Warping (DTW) and Jaro–Winkler distance (JWD). I also measured the effects of using hacking tools with a difference-in-differences (DID) design. The analysis yields some counter-intuitive results. Cheaters make up only a tiny share of users, about 0.25%, yet they account for approximately 12% of revenue. Furthermore, once users start using hacking tools, their playtime and purchases immediately increase by 102% and 79%, respectively. Additional analysis shows that the positive effects of hacking tools are not merely short-term. A Granger causality test also reveals that cheating users' activity does not affect other users' purchase or playtime trends. In the second essay, I propose a churn prediction methodology for mobile casual games that serves two major purposes. First, it reduces the cost of data preparation, which is growing in importance in the big-data environment. Second, it achieves performance comparable to that of state-of-the-art algorithms. As a result, we greatly lower the cost of the data preparation process by using the sequence structure of the log data as it is.
In addition, our CNN-LSTM sequence classification model outperforms the models of previous studies.
Essay 1. Is Cheating Always Bad? A Study of Cheating Identification and Measurement of the Effect
  1. Introduction
  2. Literature Review
  3. Data
  4. Hypotheses
  5. Methodology
     5.1 Cheating Identification
     5.2 Measurement of Cheating Tool Usage Effect
  6. Result
     6.1 Cheating Identification
     6.2 Measurement of Cheating Tool Usage Effect
  7. Additional Analysis
     7.1 Lifespan of Cheating Users
     7.2 Granger Causality Test
  8. Discussion and Conclusion
  9. References
Essay 2. Churn Prediction in Mobile Casual Game: A Deep Sequence Classification Approach
  1. Introduction
  2. Definition of Churn
  3. Related Works
  4. Data
  5. Methodology
     5.1 Data Preparation
     5.2 Prediction Model
  6. Result and Discussion
  7. References

SNU Open Repository and Archive
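The online-games dissertation names DTW and Jaro–Winkler distance as its cheater-identification tools but does not describe how they are applied. One plausible reading, which is an assumption on our part: DTW compares per-account activity sequences (such as daily playtime) that may be shifted in time, and Jaro–Winkler compares account IDs, on the assumption that accounts run by one multi-botter may have similar names. The sketch below implements both measures in their textbook forms (absolute difference as the DTW local cost; Winkler's prefix scale p = 0.1 over at most four characters); the function names are our own.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or diagonal match
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # characters count as matching only within the search window
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear in a different order
    seq1 = [c for c, flag in zip(s1, matched1) if flag]
    seq2 = [c for c, flag in zip(s2, matched2) if flag]
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1, s2):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because warping absorbs the repeated value, and the classic pair `"MARTHA"`/`"MARHTA"` has a Jaro–Winkler similarity of about 0.961.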

Health of an aging America: issues on data for policy analysis

Author
Publication venue
Publication date
Field of study

The papers in this report served as background for a study conducted by the Panel on Statistics for an Aging Population of the Committee on National Statistics, focusing on the data needed over the next decade for health policy analysis for an aging America. Includes bibliographies. 198

CDC Stacks

Data-driven corrective maintenance: MR root cause analysis from machine logs

Author: Shahrestani A.
Publication venue
Publication date: 31/01/2020
Field of study

Pure OAI Repository

Artificial Intelligence and Cognitive Computing

Author
Publication venue: MDPI AG
Publication date: 11/01/2022
Field of study

Artificial intelligence (AI) is a subject garnering increasing attention in both academia and industry today. The understanding is that AI-enhanced methods and techniques create a variety of opportunities for improving basic and advanced business functions, including production processes, logistics, financial management, and others. As this collection demonstrates, AI-enhanced tools and methods tend to offer more precise results in fields such as engineering, financial accounting, tourism, air-pollution management, and many more. The objective of this collection is to bring these topics together to offer the reader a useful primer on how AI-enhanced tools and applications can be of use in today's world. Against the often fearful, skeptical, and emotion-laden debates on AI and its added value, this volume promotes a positive perspective on AI and its impact on society. AI is part of a broader ecosystem of sophisticated tools, techniques, and technologies, and it is therefore not immune to developments in that ecosystem. It is thus imperative that inter- and multidisciplinary research on AI and its ecosystem be encouraged. This collection contributes to that effort.

Directory of Open Access Books (DOAB)