
    Probabilistic Human-Robot Information Fusion

    This thesis is concerned with combining the perceptual abilities of mobile robots and human operators to execute tasks cooperatively. It is generally agreed that a synergy of human and robotic skills offers an opportunity to enhance the capabilities of today’s robotic systems, while also increasing their robustness and reliability. Systems which incorporate both human and robotic information sources have the potential to build complex world models, essential for both automated and human decision making. In this work, humans and robots are regarded as equal team members who interact and communicate on a peer-to-peer basis. Human-robot communication is addressed using probabilistic representations common in robotics. While communication can in general be bidirectional, this work focuses primarily on human-to-robot information flow. More specifically, the approach advocated in this thesis is to let robots fuse their sensor observations with observations obtained from human operators. While robotic perception is well-suited for lower-level world descriptions such as geometric properties, humans are able to contribute perceptual information on higher abstraction levels. Human input is translated into the machine representation via Human Sensor Models. A common mathematical framework for humans and robots reinforces the notion of true peer-to-peer interaction. Human-robot information fusion is demonstrated in two application domains: (1) scalable information gathering, and (2) cooperative decision making. Scalable information gathering is experimentally demonstrated on a system comprised of a ground vehicle, an unmanned air vehicle, and two human operators in a natural environment. Information from humans and robots was fused in a fully decentralised manner to build a shared environment representation on multiple abstraction levels. Results are presented in the form of information exchange patterns, qualitatively demonstrating the benefits of human-robot information fusion. The second application domain adds decision making to the human-robot task. Rational decisions are made based on the robots’ current beliefs, which are generated by fusing human and robotic observations. Since humans are considered a valuable resource in this context, operators are only queried for input when the expected benefit of an observation exceeds the cost of obtaining it. The system can be seen as adjusting its autonomy at run-time based on the uncertainty in the robots’ beliefs. A navigation task is used to demonstrate the adjustable autonomy system experimentally. Results from two experiments are reported: a quantitative evaluation of human-robot team effectiveness, and a user study to compare the system to classical teleoperation. Results show the superiority of the system with respect to performance, operator workload, and usability.
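
    The abstract does not spell out the exact decision rule, but the query policy it describes (ask the operator only when the expected benefit of an observation exceeds its cost) can be illustrated with a minimal sketch. The Gaussian belief, the linear-Gaussian human sensor model, and the cost and value constants below are all illustrative assumptions, not the thesis's actual models.

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy of a 1-D Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def should_query_human(belief_var, human_obs_var, query_cost, info_value=1.0):
    """Query the operator only if the expected benefit of a human
    observation exceeds the cost of obtaining it (all values hypothetical).

    Under a linear-Gaussian human sensor model, fusing an observation of
    variance `human_obs_var` with the current belief yields a posterior
    whose variance follows the product-of-Gaussians rule; the expected
    benefit is taken as the resulting entropy reduction, scaled by a
    value-of-information weight.
    """
    posterior_var = 1.0 / (1.0 / belief_var + 1.0 / human_obs_var)
    expected_benefit = info_value * (
        gaussian_entropy(belief_var) - gaussian_entropy(posterior_var))
    return expected_benefit > query_cost

# Uncertain belief -> ask the operator; confident belief -> stay autonomous.
print(should_query_human(belief_var=4.0, human_obs_var=1.0, query_cost=0.5))  # True
print(should_query_human(belief_var=0.2, human_obs_var=1.0, query_cost=0.5))  # False
```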

    Information overload in structured data

    Information overload refers to the difficulty of making decisions caused by too much information. In this dissertation, we address the information overload problem in two separate structured domains, namely graphs and text. Graph kernels have been proposed as an efficient and theoretically sound approach to computing graph similarity. They decompose graphs into certain sub-structures, such as subtrees or subgraphs. However, existing graph kernels suffer from a few drawbacks. First, the dimension of the feature space associated with the kernel often grows exponentially as the complexity of sub-structures increases. One immediate consequence of this behavior is that small, non-informative sub-structures occur more frequently and cause information overload. Second, as the number of features increases, we encounter sparsity: only a few informative sub-structures will co-occur in multiple graphs. In the first part of this dissertation, we propose to tackle the above problems by exploiting the dependency relationships among sub-structures. First, we propose a novel framework that learns latent representations of sub-structures by leveraging recent advancements in deep learning. Second, we propose a general smoothing framework that takes structural similarity into account, inspired by state-of-the-art smoothing techniques used in natural language processing. Both proposed frameworks are applicable to popular graph kernel families and achieve significant performance improvements over state-of-the-art graph kernels. In the second part of this dissertation, we tackle information overload in text. We first focus on a popular social news aggregation website, Reddit, and design a submodular recommender system that tailors a personalized frontpage for individual users. Second, we propose a novel submodular framework to summarize videos where both transcripts and comments are available. Third, we demonstrate how to apply filtering techniques to select a small subset of informative features from virtual machine logs in order to predict resource usage.
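
    As a concrete example of the sub-structures such kernels count, here is a minimal Weisfeiler-Lehman subtree kernel, one of the popular graph-kernel families the proposed frameworks apply to. The graph encoding and the toy graphs are purely illustrative.

```python
from collections import Counter

def wl_features(adj, labels, iterations=2):
    """Weisfeiler-Lehman subtree features for one graph.

    adj: dict node -> list of neighbour nodes
    labels: dict node -> initial node label
    Returns a Counter of compressed subtree labels; each key is one
    sub-structure ("feature") of the kind graph kernels count.
    """
    feats = Counter(labels.values())
    current = dict(labels)
    for _ in range(iterations):
        new = {}
        for v in adj:
            # Relabel each node by its own label plus the sorted
            # multiset of its neighbours' labels (a compressed subtree).
            new[v] = (current[v], tuple(sorted(current[u] for u in adj[v])))
        current = new
        feats.update(current.values())
    return feats

def wl_kernel(g1, g2):
    """Kernel value = dot product of the two feature histograms."""
    f1, f2 = wl_features(*g1), wl_features(*g2)
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())

# Two tiny labelled graphs: a path A-B-A and a triangle A-B-A.
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: 'A', 1: 'B', 2: 'A'})
tri  = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: 'A', 1: 'B', 2: 'A'})
print(wl_kernel(path, tri))
```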

    The Democratization of News - Analysis and Behavior Modeling of Users in the Context of Online News Consumption

    The invention of the Internet paved the way for the democratization of information. The fact that news became more accessible to the general public held important political promises, such as reaching previously uninformed and therefore often inactive citizens, who could now keep up with daily political events and become politically engaged themselves. While many politicians and journalists were content with this development for a decade, the situation changed with the rise of online social networks (OSNs). These OSNs are now nearly ubiquitous: 67% of Americans obtain at least part of their news via social media. This trend has further lowered the cost of publishing content. What initially looked like a positive development has since become a serious problem for democracies. Instead of a nearly boundless amount of easily accessible information making us wiser, the sheer volume of content becomes a burden. A balanced selection of news gives way to a flood of posts and topics filtered by the user's digital social environment, which fosters political polarization and ideological segregation. Moreover, more than half of OSN users no longer trust the news they read (54% are worried about fake news). Consistent with this picture, studies report that OSN users are more heavily exposed to the populism of far-left and far-right political actors than people without access to social media. To mitigate the negative effects of this development, my work contributes to the understanding of the problem and conducts foundational research in behavior modeling. Finally, we address the danger of Internet users being influenced by social bots and present a solution based on behavior modeling. To better understand the news consumption of German-speaking users in OSNs, we analyzed their behavior on Twitter and compared reactions to controversial (in part anti-constitutional) and non-controversial content. In addition, we investigated the existence of echo chambers and similar phenomena. Regarding user behavior, we concentrated on networks that permit more complex user behavior, and developed probabilistic behavior-modeling solutions for clustering and for time-series segmentation. Beyond the contributions to understanding the problem, we developed solutions for detecting automated accounts. Such bots play an important role in the early phase of the spread of fake news. Our expert model, based on current deep-learning methods, identifies automated accounts by their behavior. In sum, this work raises awareness of this negative development, conducts foundational research in behavior modeling, and addresses the danger of influence by social bots with a behavior-modeling-based solution.
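
    The thesis's expert model is deep-learning based and is not reproduced here; the sketch below only illustrates the general idea of probabilistic behavior modeling, clustering accounts by behavioral features with a Gaussian mixture. All feature names and numbers are invented for the illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical per-account behavioural features (none of these come from
# the thesis itself): mean seconds between posts, fraction of retweets,
# and fraction of posts published at night.
rng = np.random.default_rng(0)
humans = rng.normal([3600.0, 0.3, 0.1], [900.0, 0.1, 0.05], size=(200, 3))
bots   = rng.normal([60.0, 0.9, 0.5], [30.0, 0.05, 0.1], size=(50, 3))
X = np.vstack([humans, bots])

# Probabilistic clustering: a two-component Gaussian mixture separates
# the two behavioural regimes without any labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)
print("cluster sizes:", np.bincount(clusters))
```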

    Financial distress and bankruptcy prediction using accounting, market and macroeconomic variables

    This thesis investigates the information content of different types of variables in the field of financial distress/default prediction. Specifically, the thesis tests empirically, for the first time, the utility of combining accounting data, market-based variables and macroeconomic indicators to explain corporate credit risk. Models for listed companies in the United Kingdom are developed for the prediction of financial distress and corporate failure. The models use a combination of accounting data, stock market information, proxies for changes in the macroeconomic environment, and industry controls. Furthermore, novel finance-based and technical definitions of firm distress and failure are introduced as outcome variables. The thesis produces binary and polytomous models with enhanced predictive accuracy, practical value, and macro-dependent dynamics that are relevant for stress testing. The results unambiguously show the advantages, in terms of predictive accuracy and timeliness, of combining these types of variables. Unlike previous research that employed discrete-choice, non-linear regression methodologies, this thesis provides new evidence on the effects of the different types of variables on the probability of falling into each of the individual outcomes (e.g., financial distress, corporate failure). The analysis of graphical representations of changes in predicted probabilities, a first in the field of risk modelling, offers new insights into the behaviour of the vectors of predicted probabilities following a given change in the magnitude of a specific covariate. Additionally, and in line with the main area of study, the thesis provides historical evidence on the types of variables and the information-sharing mechanisms employed by American and British investors and financial institutions to assess the riskiness of individuals, businesses and fixed-income instruments before the emergence of modern institutions such as credit rating agencies and prior to the development of complex statistical models, thus filling a crucial gap in the credit risk literature.
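
    A polytomous (multinomial logit) model of the kind described can be sketched as follows; the covariates, simulated data, and coefficients are hypothetical stand-ins for the accounting, market and macroeconomic variables the thesis combines.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical covariates of the three kinds combined in the thesis:
# accounting (e.g. earnings/total assets), market-based (e.g. excess
# return), and macroeconomic (e.g. interest-rate change). Simulated data.
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.normal(0.05, 0.1, n),   # accounting ratio
    rng.normal(0.0, 0.2, n),    # market-based variable
    rng.normal(0.0, 0.01, n),   # macroeconomic indicator
])
# Polytomous outcome: 0 = healthy, 1 = financially distressed, 2 = failed.
logits = X @ np.array([[-8, 4, 6], [-2, 1, 2], [-10, 5, 20]])
y = np.array([rng.choice(3, p=np.exp(l) / np.exp(l).sum()) for l in logits])

model = LogisticRegression(max_iter=1000).fit(X, y)
# Predicted probability vector over the three outcomes for a new firm,
# analogous to the predicted-probability analysis described above.
print(model.predict_proba([[0.02, -0.3, 0.005]]))
```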

    ALGORITHMS FOR CONSTRAINT-BASED LEARNING OF BAYESIAN NETWORK STRUCTURES WITH LARGE NUMBERS OF VARIABLES

    Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowledge. They can be constructed by an expert, learned from data, or by a combination of the two. A popular approach to learning the structure of a BN is the constraint-based search (CBS) approach, with the PC algorithm being a prominent example. In recent years, we have been experiencing a data deluge. We have access to more data, big and small, than ever before. The exponential nature of BN algorithms, however, hinders large-scale analysis. Developments in parallel and distributed computing have made the computational power required for large-scale data processing widely available, yielding opportunities for developing parallel and distributed algorithms for BN learning and inference. In this dissertation, (1) I propose two MapReduce versions of the PC algorithm, aimed at solving an increasingly common case: data is not necessarily massive in the number of records, but more and more so in the number of variables. (2) When the number of data records is small, the PC algorithm experiences problems in independence testing. Empirically, I explore a contradiction in the literature on how to resolve the case of having insufficient data when testing the independence of two variables: declare independence or dependence. (3) When BNs learned from data become complex in terms of graph density, they may require more parameters than we can feasibly store. I propose and evaluate five approaches to pruning a BN structure to guarantee that it will be tractable for storage and inference. I follow this up by proposing three approaches to improving the classification accuracy of a BN by modifying its structure.
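
    For reference, the skeleton phase of the sequential PC algorithm that such MapReduce versions would parallelize looks roughly like this; the Fisher-z partial-correlation test below is a standard choice for continuous data, though not necessarily the test used in the dissertation.

```python
import itertools
import numpy as np
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.05):
    """Fisher-z test of conditional independence X_i ⟂ X_j | X_cond via
    partial correlation (a standard choice for Gaussian data)."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha  # True => independent

def pc_skeleton(data, alpha=0.05):
    """Skeleton phase of the PC algorithm: start from a complete graph and
    delete edge i-j whenever some conditioning set drawn from the current
    neighbours renders i and j independent."""
    p = data.shape[1]
    adj = {v: set(range(p)) - {v} for v in range(p)}
    size = 0
    while any(len(adj[v]) - 1 >= size for v in adj):
        for i, j in list(itertools.combinations(range(p), 2)):
            if j not in adj[i]:
                continue
            for cond in itertools.combinations(adj[i] - {j}, size):
                if ci_test(data, i, j, cond, alpha):
                    adj[i].discard(j); adj[j].discard(i)
                    break
        size += 1
    return adj

# Chain X0 -> X1 -> X2: the X0-X2 edge should vanish given {X1}.
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + rng.normal(size=5000)
x2 = x1 + rng.normal(size=5000)
print(pc_skeleton(np.column_stack([x0, x1, x2])))
```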

    A system for large-scale image and video retrieval on everyday scenes

    There has been a growing amount of multimedia data generated on the web today in terms of size and diversity. This has made accurate content retrieval from these large and complex collections of data a challenging problem. Motivated by the need for systems that enable scalable and efficient search, we propose QIK (Querying Images Using Contextual Knowledge). QIK leverages advances in deep learning (DL) and natural language processing (NLP) for scene understanding to enable large-scale multimedia retrieval on everyday scenes with common objects. The system consists of three major components: Indexer, Query Processor, and Video Processor. Given an image, the Indexer performs probabilistic image understanding (PIU). The PIU generated consists of the most probable captions, parsed and represented as tree structures using NLP techniques, together with detected objects. The PIUs are stored and indexed in a database system. For a query image, the Query Processor generates the most probable caption and parses it into the corresponding tree structure. An optimized tree-pattern query is then constructed and executed on the database to retrieve a set of candidate images. The candidate images are ranked using the tree-edit-distance metric computed on the tree structures. Given a video, the Video Processor extracts a sequence of key scenes that are posed to the Query Processor to retrieve a set of candidate scenes. The candidate scene parse trees corresponding to a video are extracted and ranked based on the number of matching scenes. We evaluated the performance of our system on large-scale image and video retrieval tasks on datasets containing everyday scenes and observed that our system outperforms state-of-the-art techniques in terms of mean average precision.
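
    The ranking step can be illustrated with a toy example. QIK builds its trees from captions with NLP tooling; below, tiny parse-like trees are hand-built and candidates are ordered by the Zhang-Shasha tree edit distance, here via the third-party zss package, which the system itself does not necessarily use.

```python
# Illustrative only: install the tree-edit-distance library with `pip install zss`.
from zss import Node, simple_distance

def caption_tree(subject, verb, obj):
    """A toy parse tree: (S (NP subject) (VP verb (NP object)))."""
    return (Node("S")
            .addkid(Node("NP").addkid(Node(subject)))
            .addkid(Node("VP").addkid(Node(verb))
                              .addkid(Node("NP").addkid(Node(obj)))))

query = caption_tree("dog", "chases", "ball")
candidates = {
    "img1": caption_tree("dog", "chases", "frisbee"),
    "img2": caption_tree("cat", "sits", "mat"),
}
# Rank candidate images by tree edit distance to the query caption's tree,
# mirroring the ranking step described above.
ranking = sorted(candidates, key=lambda k: simple_distance(query, candidates[k]))
print(ranking)  # img1 ranks first: its tree differs by a single leaf label
```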

    Big data-driven multimodal traffic management: trends and challenges


    Business Analytics for Non-profit Marketing and Online Advertising

    Business analytics is facing formidable challenges in the Internet era. Data collected from business websites often contain hundreds of millions of records; the goal of analysis frequently involves predicting rare events; and substantial noise in the form of errors or unstructured text cannot be interpreted automatically. It is thus necessary to identify pertinent techniques or new methods to tackle these difficulties. Learning-to-rank, an emerging approach in information retrieval research, has attracted our attention for its superiority in handling noisy data with rare events. In this dissertation, we introduce this technique to the marketing science community, apply it to predict customers’ responses to donation solicitations by the American Red Cross, and show that it outperforms traditional regression methods. We adapt the original learning-to-rank algorithm to better serve the needs of business applications relevant to such solicitations. The proposed algorithm is effective and efficient in predicting potential donors. Namely, through the adapted learning-to-rank algorithm, we are able to identify the most important 20% of potential donors, who provide 80% of the actual donations. The latter half of the dissertation is dedicated to the application of business analytics to online advertising. The goal is to model visitors’ click-through probability on advertising video clips at a hedonic video website. We build a hierarchical linear model with latent variables and show its superiority in comparison to two other benchmark models. This research helps online business managers derive insights into the site visitors’ characteristics that affect their click-through propensity, and recommends managerial actions to increase advertising effectiveness.
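
    A minimal pairwise learning-to-rank sketch (a RankNet-style logistic loss on a linear scorer) conveys the idea; the dissertation's adapted algorithm is more elaborate, and the donor features and donation labels below are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 4                      # donors, behavioural/demographic features
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
donated = (X @ true_w + rng.normal(size=n) > 1.0).astype(float)

w = np.zeros(d)
lr = 0.1
for _ in range(200):
    # Sample (donor, non-donor) pairs and push each donor's score above
    # the non-donor's via the pairwise logistic loss gradient.
    pos = rng.choice(np.flatnonzero(donated == 1), size=64)
    neg = rng.choice(np.flatnonzero(donated == 0), size=64)
    diff = X[pos] - X[neg]
    margin = diff @ w
    grad = -((1 / (1 + np.exp(margin)))[:, None] * diff).mean(axis=0)
    w -= lr * grad

# Evaluate: what share of all donations falls in the top-ranked 20%?
order = np.argsort(-(X @ w))
top = order[: n // 5]
print("share of donations captured by top 20%:", donated[top].sum() / donated.sum())
```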