14 research outputs found

    Fair and Representative Subset Selection from Data Streams

    We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a (1/2-ε)-approximation algorithm that requires O((1/ε) log(k/ε)) passes over the stream for any constant ε>0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of (1/2-ε) when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
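    The problem setting in this abstract can be illustrated with a minimal sketch. It is not the paper's streaming algorithm: it is the classic offline greedy heuristic applied to maximum coverage, with a total budget k and a per-group cap standing in for the fairness constraint. The function name `fair_greedy_cover`, the parameter names, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def fair_greedy_cover(
    items: Dict[Hashable, Set[int]],
    group_of: Dict[Hashable, str],
    caps: Dict[str, int],
    k: int,
) -> List[Hashable]:
    """Offline greedy for maximum coverage with per-group caps.

    Hypothetical helper: `items` maps each item to the set of elements it
    covers, `group_of` maps each item to its group, `caps` gives the maximum
    number of items that may be picked from each group, and `k` is the
    overall cardinality budget.
    """
    chosen: List[Hashable] = []
    covered: Set[int] = set()
    used = {g: 0 for g in caps}

    while len(chosen) < k:
        best, best_gain = None, 0
        for item, elems in items.items():
            if item in chosen:
                continue
            group = group_of[item]
            if used[group] >= caps[group]:
                continue  # picking this item would violate the group cap
            gain = len(elems - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break  # no feasible item adds any coverage
        chosen.append(best)
        covered |= items[best]
        used[group_of[best]] += 1
    return chosen


# Toy instance (made up): two groups, at most one item from each.
toy_items = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4}}
toy_groups = {"a": "x", "d": "x", "b": "y", "c": "y"}
print(fair_greedy_cover(toy_items, toy_groups, {"x": 1, "y": 1}, k=2))
```

    In the toy instance, item "a" is skipped after "d" is chosen because group "x" has reached its cap, which is the effect the fairness constraint is meant to have.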

    LIPIcs, Volume 274, ESA 2023, Complete Volume


    LIPIcs, Volume 244, ESA 2022, Complete Volume


    Probabilistic Modeling of Rumour Stance and Popularity in Social Media

    Social media tends to be rife with rumours when new reports are released piecemeal during breaking news events. One can mine multiple reactions expressed by social media users in those situations, exploring users’ stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. Moreover, rumours in social media exhibit complex temporal patterns. Some rumours are discussed with an increasing number of tweets per unit of time whereas other rumours fail to gain ground. This thesis develops probabilistic models of rumours in social media driven by two applications: rumour stance classification and modeling temporal dynamics of rumours. Rumour stance classification is the task of classifying the stance expressed in an individual tweet towards a rumour. Modeling temporal dynamics of rumours is an application where rumour prevalence is modeled over time. Both applications provide insights into how a rumour attracts attention from the social media community. These can assist journalists with their work on rumour tracking and debunking, and can be used in downstream applications such as systems for rumour veracity classification. In this thesis, we develop models based on probabilistic approaches. We motivate Gaussian processes and point processes as appropriate tools and show how features not considered in previous work can be included. We show that for both applications, transfer learning approaches are successful, supporting the hypothesis that there is a common underlying signal across different rumours. We furthermore introduce novel machine learning techniques which have the potential to be used in other applications: convolution kernels for streams of text over continuous time and a sequence classification algorithm based on point processes.

    Modeling Events and Interactions through Temporal Processes -- A Survey

    In real-world scenarios, many phenomena produce a collection of events that occur in continuous time. Point processes provide a natural mathematical framework for modeling these sequences of events. In this survey, we investigate probabilistic models for modeling event sequences through temporal processes. We revisit the notion of event modeling and provide the mathematical foundations that characterize the literature on the topic. We define an ontology to categorize the existing approaches in terms of three families: simple, marked, and spatio-temporal point processes. For each family, we systematically review the existing approaches based on deep learning. Finally, we analyze the scenarios where the proposed techniques can be used for addressing prediction and modeling aspects.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Automatic machine learning: methods, systems, challenges

    This open access book presents the first comprehensive overview of general methods in Automatic Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first international challenge of AutoML systems. The book serves as a point of entry into this quickly-developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. The recent success of commercial ML applications and the rapid growth of the field have created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. Many of the recent machine learning successes crucially rely on human experts, who select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters; however, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group - a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are made for each hotel studied.