40 research outputs found

    Privacy-Preserving Data Collection and Sharing in Modern Mobile Internet Systems

    Get PDF
    With the ubiquity and widespread use of mobile devices such as laptops, smartphones, smartwatches, and IoT devices, large volumes of user data are generated and recorded. While there is great value in collecting, analyzing and sharing this data for improving products and services, data privacy poses a major concern. This dissertation research addresses the problem of privacy-preserving data collection and sharing in the context of both mobile trajectory data and mobile Internet access data. The first contribution of this dissertation research is the design and development of a system for utility-aware synthesis of differentially private and attack-resilient location traces, called AdaTrace. Given a set of real location traces, AdaTrace executes a four-phase process consisting of feature extraction, synopsis construction, noise injection, and generation of synthetic location traces. Compared to representative prior approaches, the location traces generated by AdaTrace offer up to 3-fold improvement in utility, measured using a variety of utility metrics and datasets, while preserving both differential privacy and attack resilience. The second contribution of this dissertation research is the design and development of locally private protocols for privacy-sensitive collection of mobile and Web user data. Motivated by the excessive utility loss of existing Local Differential Privacy (LDP) protocols under small user populations, this dissertation introduces the notion of Condensed Local Differential Privacy (CLDP) and a suite of protocols satisfying CLDP to enable the collection of various types of user data, ranging from ordinal data types in finite metric spaces (malware infection statistics), to non-ordinal items (OS versions and transaction categories), and to sequences of ordinal or non-ordinal items. Using cybersecurity data and case studies from Symantec, a major cybersecurity vendor, we show that proposed CLDP protocols are practical for key tasks including malware outbreak detection, OS vulnerability analysis, and inspecting suspicious activities on infected machines. The third contribution of this dissertation research is the development of a framework and a prototype system for evaluating privacy-utility tradeoffs of different LDP protocols, called LDPLens. LDPLens introduces metrics to evaluate protocol tradeoffs based on factors such as the utility metric, the data collection scenario, and the user-specified adversary metric. We develop a common Bayesian adversary model to analyze LDP protocols, and we formally and experimentally analyze Adversarial Success Rate (ASR) under each protocol. Motivated by the findings that numerous factors impact the ASR and utility behaviors of LDP protocols, we develop LDPLens to provide effective recommendations for finding the most suitable protocol in a given setting. Our three case studies with real-world datasets demonstrate that using the protocol recommended by LDPLens can offer substantial reduction in utility loss or in ASR, compared to using a randomly chosen protocol.Ph.D

    ICTERI 2020: ІКТ в освіті, дослідженнях та промислових застосуваннях. Інтеграція, гармонізація та передача знань 2020: Матеріали 16-ї Міжнародної конференції. Том II: Семінари. Харків, Україна, 06-10 жовтня 2020 р.

    Get PDF
    This volume represents the proceedings of the Workshops co-located with the 16th International Conference on ICT in Education, Research, and Industrial Applications, held in Kharkiv, Ukraine, in October 2020. It comprises 101 contributed papers that were carefully peer-reviewed and selected from 233 submissions for the five workshops: RMSEBT, TheRMIT, ITER, 3L-Person, CoSinE, MROL. The volume is structured in six parts, each presenting the contributions for a particular workshop. The topical scope of the volume is aligned with the thematic tracks of ICTERI 2020: (I) Advances in ICT Research; (II) Information Systems: Technology and Applications; (III) Academia/Industry ICT Cooperation; and (IV) ICT in Education.Цей збірник представляє матеріали семінарів, які були проведені в рамках 16-ї Міжнародної конференції з ІКТ в освіті, наукових дослідженнях та промислових застосуваннях, що відбулася в Харкові, Україна, у жовтні 2020 року. Він містить 101 доповідь, які були ретельно рецензовані та відібрані з 233 заявок на участь у п'яти воркшопах: RMSEBT, TheRMIT, ITER, 3L-Person, CoSinE, MROL. Збірник складається з шести частин, кожна з яких представляє матеріали для певного семінару. Тематична спрямованість збірника узгоджена з тематичними напрямками ICTERI 2020: (I) Досягнення в галузі досліджень ІКТ; (II) Інформаційні системи: Технології і застосування; (ІІІ) Співпраця в галузі ІКТ між академічними і промисловими колами; і (IV) ІКТ в освіті

    Human Practice. Digital Ecologies. Our Future. : 14. Internationale Tagung Wirtschaftsinformatik (WI 2019) : Tagungsband

    Get PDF
    Erschienen bei: universi - Universitätsverlag Siegen. - ISBN: 978-3-96182-063-4Aus dem Inhalt: Track 1: Produktion & Cyber-Physische Systeme Requirements and a Meta Model for Exchanging Additive Manufacturing Capacities Service Systems, Smart Service Systems and Cyber- Physical Systems—What’s the difference? Towards a Unified Terminology Developing an Industrial IoT Platform – Trade-off between Horizontal and Vertical Approaches Machine Learning und Complex Event Processing: Effiziente Echtzeitauswertung am Beispiel Smart Factory Sensor retrofit for a coffee machine as condition monitoring and predictive maintenance use case Stakeholder-Analyse zum Einsatz IIoT-basierter Frischeinformationen in der Lebensmittelindustrie Towards a Framework for Predictive Maintenance Strategies in Mechanical Engineering - A Method-Oriented Literature Analysis Development of a matching platform for the requirement-oriented selection of cyber physical systems for SMEs Track 2: Logistic Analytics An Empirical Study of Customers’ Behavioral Intention to Use Ridepooling Services – An Extension of the Technology Acceptance Model Modeling Delay Propagation and Transmission in Railway Networks What is the impact of company specific adjustments on the acceptance and diffusion of logistic standards? Robust Route Planning in Intermodal Urban Traffic Track 3: Unternehmensmodellierung & Informationssystemgestaltung (Enterprise Modelling & Information Systems Design) Work System Modeling Method with Different Levels of Specificity and Rigor for Different Stakeholder Purposes Resolving Inconsistencies in Declarative Process Models based on Culpability Measurement Strategic Analysis in the Realm of Enterprise Modeling – On the Example of Blockchain-Based Initiatives for the Electricity Sector Zwischenbetriebliche Integration in der Möbelbranche: Konfigurationen und Einflussfaktoren Novices’ Quality Perceptions and the Acceptance of Process Modeling Grammars Entwicklung einer Definition für Social Business Objects (SBO) zur Modellierung von Unternehmensinformationen Designing a Reference Model for Digital Product Configurators Terminology for Evolving Design Artifacts Business Role-Object Specification: A Language for Behavior-aware Structural Modeling of Business Objects Generating Smart Glasses-based Information Systems with BPMN4SGA: A BPMN Extension for Smart Glasses Applications Using Blockchain in Peer-to-Peer Carsharing to Build Trust in the Sharing Economy Testing in Big Data: An Architecture Pattern for a Development Environment for Innovative, Integrated and Robust Applications Track 4: Lern- und Wissensmanagement (e-Learning and Knowledge Management) eGovernment Competences revisited – A Literature Review on necessary Competences in a Digitalized Public Sector Say Hello to Your New Automated Tutor – A Structured Literature Review on Pedagogical Conversational Agents Teaching the Digital Transformation of Business Processes: Design of a Simulation Game for Information Systems Education Conceptualizing Immersion for Individual Learning in Virtual Reality Designing a Flipped Classroom Course – a Process Model The Influence of Risk-Taking on Knowledge Exchange and Combination Gamified Feedback durch Avatare im Mobile Learning Alexa, Can You Help Me Solve That Problem? - Understanding the Value of Smart Personal Assistants as Tutors for Complex Problem Tasks Track 5: Data Science & Business Analytics Matching with Bundle Preferences: Tradeoff between Fairness and Truthfulness Applied image recognition: guidelines for using deep learning models in practice Yield Prognosis for the Agrarian Management of Vineyards using Deep Learning for Object Counting Reading Between the Lines of Qualitative Data – How to Detect Hidden Structure Based on Codes Online Auctions with Dual-Threshold Algorithms: An Experimental Study and Practical Evaluation Design Features of Non-Financial Reward Programs for Online Reviews: Evaluation based on Google Maps Data Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics Leveraging Unstructured Image Data for Product Quality Improvement Decision Support for Real Estate Investors: Improving Real Estate Valuation with 3D City Models and Points of Interest Knowledge Discovery from CVs: A Topic Modeling Procedure Online Product Descriptions – Boost for your Sales? Entscheidungsunterstützung durch historienbasierte Dienstreihenfolgeplanung mit Pattern A Semi-Automated Approach for Generating Online Review Templates Machine Learning goes Measure Management: Leveraging Anomaly Detection and Parts Search to Improve Product-Cost Optimization Bedeutung von Predictive Analytics für den theoretischen Erkenntnisgewinn in der IS-Forschung Track 6: Digitale Transformation und Dienstleistungen Heuristic Theorizing in Software Development: Deriving Design Principles for Smart Glasses-based Systems Mirroring E-service for Brick and Mortar Retail: An Assessment and Survey Taxonomy of Digital Platforms: A Platform Architecture Perspective Value of Star Players in the Digital Age Local Shopping Platforms – Harnessing Locational Advantages for the Digital Transformation of Local Retail Outlets: A Content Analysis A Socio-Technical Approach to Manage Analytics-as-a-Service – Results of an Action Design Research Project Characterizing Approaches to Digital Transformation: Development of a Taxonomy of Digital Units Expectations vs. Reality – Benefits of Smart Services in the Field of Tension between Industry and Science Innovation Networks and Digital Innovation: How Organizations Use Innovation Networks in a Digitized Environment Characterising Social Reading Platforms— A Taxonomy-Based Approach to Structure the Field Less Complex than Expected – What Really Drives IT Consulting Value Modularity Canvas – A Framework for Visualizing Potentials of Service Modularity Towards a Conceptualization of Capabilities for Innovating Business Models in the Industrial Internet of Things A Taxonomy of Barriers to Digital Transformation Ambidexterity in Service Innovation Research: A Systematic Literature Review Design and success factors of an online solution for cross-pillar pension information Track 7: IT-Management und -Strategie A Frugal Support Structure for New Software Implementations in SMEs How to Structure a Company-wide Adoption of Big Data Analytics The Changing Roles of Innovation Actors and Organizational Antecedents in the Digital Age Bewertung des Kundennutzens von Chatbots für den Einsatz im Servicedesk Understanding the Benefits of Agile Software Development in Regulated Environments Are Employees Following the Rules? On the Effectiveness of IT Consumerization Policies Agile and Attached: The Impact of Agile Practices on Agile Team Members’ Affective Organisational Commitment The Complexity Trap – Limits of IT Flexibility for Supporting Organizational Agility in Decentralized Organizations Platform Openness: A Systematic Literature Review and Avenues for Future Research Competence, Fashion and the Case of Blockchain The Digital Platform Otto.de: A Case Study of Growth, Complexity, and Generativity Track 8: eHealth & alternde Gesellschaft Security and Privacy of Personal Health Records in Cloud Computing Environments – An Experimental Exploration of the Impact of Storage Solutions and Data Breaches Patientenintegration durch Pfadsysteme Digitalisierung in der Stressprävention – eine qualitative Interviewstudie zu Nutzenpotenzialen User Dynamics in Mental Health Forums – A Sentiment Analysis Perspective Intent and the Use of Wearables in the Workplace – A Model Development Understanding Patient Pathways in the Context of Integrated Health Care Services - Implications from a Scoping Review Understanding the Habitual Use of Wearable Activity Trackers On the Fit in Fitness Apps: Studying the Interaction of Motivational Affordances and Users’ Goal Orientations in Affecting the Benefits Gained Gamification in Health Behavior Change Support Systems - A Synthesis of Unintended Side Effects Investigating the Influence of Information Incongruity on Trust-Relations within Trilateral Healthcare Settings Track 9: Krisen- und Kontinuitätsmanagement Potentiale von IKT beim Ausfall kritischer Infrastrukturen: Erwartungen, Informationsgewinnung und Mediennutzung der Zivilbevölkerung in Deutschland Fake News Perception in Germany: A Representative Study of People’s Attitudes and Approaches to Counteract Disinformation Analyzing the Potential of Graphical Building Information for Fire Emergency Responses: Findings from a Controlled Experiment Track 10: Human-Computer Interaction Towards a Taxonomy of Platforms for Conversational Agent Design Measuring Service Encounter Satisfaction with Customer Service Chatbots using Sentiment Analysis Self-Tracking and Gamification: Analyzing the Interplay of Motivations, Usage and Motivation Fulfillment Erfolgsfaktoren von Augmented-Reality-Applikationen: Analyse von Nutzerrezensionen mit dem Review-Mining-Verfahren Designing Dynamic Decision Support for Electronic Requirements Negotiations Who is Stressed by Using ICTs? A Qualitative Comparison Analysis with the Big Five Personality Traits to Understand Technostress Walking the Middle Path: How Medium Trade-Off Exposure Leads to Higher Consumer Satisfaction in Recommender Agents Theory-Based Affordances of Utilitarian, Hedonic and Dual-Purposed Technologies: A Literature Review Eliciting Customer Preferences for Shopping Companion Apps: A Service Quality Approach The Role of Early User Participation in Discovering Software – A Case Study from the Context of Smart Glasses The Fluidity of the Self-Concept as a Framework to Explain the Motivation to Play Video Games Heart over Heels? An Empirical Analysis of the Relationship between Emotions and Review Helpfulness for Experience and Credence Goods Track 11: Information Security and Information Privacy Unfolding Concerns about Augmented Reality Technologies: A Qualitative Analysis of User Perceptions To (Psychologically) Own Data is to Protect Data: How Psychological Ownership Determines Protective Behavior in a Work and Private Context Understanding Data Protection Regulations from a Data Management Perspective: A Capability-Based Approach to EU-GDPR On the Difficulties of Incentivizing Online Privacy through Transparency: A Qualitative Survey of the German Health Insurance Market What is Your Selfie Worth? A Field Study on Individuals’ Valuation of Personal Data Justification of Mass Surveillance: A Quantitative Study An Exploratory Study of Risk Perception for Data Disclosure to a Network of Firms Track 12: Umweltinformatik und nachhaltiges Wirtschaften Kommunikationsfäden im Nadelöhr – Fachliche Prozessmodellierung der Nachhaltigkeitskommunikation am Kapitalmarkt Potentiale und Herausforderungen der Materialflusskostenrechnung Computing Incentives for User-Based Relocation in Carsharing Sustainability’s Coming Home: Preliminary Design Principles for the Sustainable Smart District Substitution of hazardous chemical substances using Deep Learning and t-SNE A Hierarchy of DSMLs in Support of Product Life-Cycle Assessment A Survey of Smart Energy Services for Private Households Door-to-Door Mobility Integrators as Keystone Organizations of Smart Ecosystems: Resources and Value Co-Creation – A Literature Review Ein Entscheidungsunterstützungssystem zur ökonomischen Bewertung von Mieterstrom auf Basis der Clusteranalyse Discovering Blockchain for Sustainable Product-Service Systems to enhance the Circular Economy Digitale Rückverfolgbarkeit von Lebensmitteln: Eine verbraucherinformatische Studie Umweltbewusstsein durch audiovisuelles Content Marketing? Eine experimentelle Untersuchung zur Konsumentenbewertung nachhaltiger Smartphones Towards Predictive Energy Management in Information Systems: A Research Proposal A Web Browser-Based Application for Processing and Analyzing Material Flow Models using the MFCA Methodology Track 13: Digital Work - Social, mobile, smart On Conversational Agents in Information Systems Research: Analyzing the Past to Guide Future Work The Potential of Augmented Reality for Improving Occupational First Aid Prevent a Vicious Circle! The Role of Organizational IT-Capability in Attracting IT-affine Applicants Good, Bad, or Both? Conceptualization and Measurement of Ambivalent User Attitudes Towards AI A Case Study on Cross-Hierarchical Communication in Digital Work Environments ‘Show Me Your People Skills’ - Employing CEO Branding for Corporate Reputation Management in Social Media A Multiorganisational Study of the Drivers and Barriers of Enterprise Collaboration Systems-Enabled Change The More the Merrier? The Effect of Size of Core Team Subgroups on Success of Open Source Projects The Impact of Anthropomorphic and Functional Chatbot Design Features in Enterprise Collaboration Systems on User Acceptance Digital Feedback for Digital Work? Affordances and Constraints of a Feedback App at InsurCorp The Effect of Marker-less Augmented Reality on Task and Learning Performance Antecedents for Cyberloafing – A Literature Review Internal Crowd Work as a Source of Empowerment - An Empirical Analysis of the Perception of Employees in a Crowdtesting Project Track 14: Geschäftsmodelle und digitales Unternehmertum Dividing the ICO Jungle: Extracting and Evaluating Design Archetypes Capturing Value from Data: Exploring Factors Influencing Revenue Model Design for Data-Driven Services Understanding the Role of Data for Innovating Business Models: A System Dynamics Perspective Business Model Innovation and Stakeholder: Exploring Mechanisms and Outcomes of Value Creation and Destruction Business Models for Internet of Things Platforms: Empirical Development of a Taxonomy and Archetypes Revitalizing established Industrial Companies: State of the Art and Success Principles of Digital Corporate Incubators When 1+1 is Greater than 2: Concurrence of Additional Digital and Established Business Models within Companies Special Track 1: Student Track Investigating Personalized Price Discrimination of Textile-, Electronics- and General Stores in German Online Retail From Facets to a Universal Definition – An Analysis of IoT Usage in Retail Is the Technostress Creators Inventory Still an Up-To-Date Measurement Instrument? Results of a Large-Scale Interview Study Application of Media Synchronicity Theory to Creative Tasks in Virtual Teams Using the Example of Design Thinking TrustyTweet: An Indicator-based Browser-Plugin to Assist Users in Dealing with Fake News on Twitter Application of Process Mining Techniques to Support Maintenance-Related Objectives How Voice Can Change Customer Satisfaction: A Comparative Analysis between E-Commerce and Voice Commerce Business Process Compliance and Blockchain: How Does the Ethereum Blockchain Address Challenges of Business Process Compliance? Improving Business Model Configuration through a Question-based Approach The Influence of Situational Factors and Gamification on Intrinsic Motivation and Learning Evaluation von ITSM-Tools für Integration und Management von Cloud-Diensten am Beispiel von ServiceNow How Software Promotes the Integration of Sustainability in Business Process Management Criteria Catalog for Industrial IoT Platforms from the Perspective of the Machine Tool Industry Special Track 3: Demos & Prototyping Privacy-friendly User Location Tracking with Smart Devices: The BeaT Prototype Application-oriented robotics in nursing homes Augmented Reality for Set-up Processe Mixed Reality for supporting Remote-Meetings Gamification zur Motivationssteigerung von Werkern bei der Betriebsdatenerfassung Automatically Extracting and Analyzing Customer Needs from Twitter: A “Needmining” Prototype GaNEsHA: Opportunities for Sustainable Transportation in Smart Cities TUCANA: A platform for using local processing power of edge devices for building data-driven services Demonstrator zur Beschreibung und Visualisierung einer kritischen Infrastruktur Entwicklung einer alltagsnahen persuasiven App zur Bewegungsmotivation für ältere Nutzerinnen und Nutzer A browser-based modeling tool for studying the learning of conceptual modeling based on a multi-modal data collection approach Exergames & Dementia: An interactive System for People with Dementia and their Care-Network Workshops Workshop Ethics and Morality in Business Informatics (Workshop Ethik und Moral in der Wirtschaftsinformatik – EMoWI’19) Model-Based Compliance in Information Systems - Foundations, Case Description and Data Set of the MobIS-Challenge for Students and Doctoral Candidates Report of the Workshop on Concepts and Methods of Identifying Digital Potentials in Information Management Control of Systemic Risks in Global Networks - A Grand Challenge to Information Systems Research Die Mitarbeiter von morgen - Kompetenzen künftiger Mitarbeiter im Bereich Business Analytics Digitaler Konsum: Herausforderungen und Chancen der Verbraucherinformati

    Database Streaming Compression on Memory-Limited Machines

    Get PDF
    Dynamic Huffman compression algorithms operate on data-streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A horizontal format transaction database that is streaming can have a very large item list. Many nodes tax both the processing hardware primary memory size, and the processing time to dynamically maintain the tree. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the transaction database schema’s item list is a symbol to compress. The constraint of a large symbol list is, in this research, equivalent to the constraint of a memory-limited machine. A large symbol set will result if each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have some temporal component spanning months or years. Finally, the horizontal format is the format most suited to a streaming transaction database because the transaction IDs are not known beforehand This research prototypes an algorithm that will compresses a transaction database stream. There are several advantages to the memory limited dynamic Huffman algorithm. Dynamic Huffman algorithms are single pass algorithms. In many instances a second pass over the data is not possible, such as with streaming databases. Previous dynamic Huffman algorithms are not memory limited, they are asymptotic to O(n), where n is the number of distinct item IDs. Memory is required to grow to fit the n items. The improvement of the new memory limited Dynamic Huffman algorithm is that it would have an O(k) asymptotic memory requirement; where k is the maximum number of nodes in the Huffman tree, k \u3c n, and k is a user chosen constant. The new memory limited Dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0’s or 1’s

    Engineering Smart Software Services for Intelligent Pervasive Systems

    Get PDF
    Pervasive computing systems, envisioned as systems that blend with the physical environment to enhance the quality of life of its users, are rapidly becoming a not so distant reality. However, many challenges must be addressed before realizing the goal of having such computing systems as part of our everyday life. One such challenge is related to the problem of how to develop in a systematic way the software that lies behind pervasive systems, operating them and allowing them to intelligently adapt both to users' changing needs and to variations in the environment. In spite of the important strides done in recent years concerning the engineering of software that places the actual, immediate needs and preferences of users in the center of attention, to the best of our knowledge no work has been devoted to the study of the engineering process for building software for pervasive systems. In this dissertation we focus on the engineering process to build smart software services for pervasive systems. Specifically, we first introduce as our first major contribution a model for the systematic construction of software for pervasive systems, which has been derived using analytical, evidence-based, and empirical methodologies. Then, on the basis of the proposed model, we investigate two essential mechanisms that provide support for the engineering of value-added software services for smart environments, namely the learning of users' daily routines and the continuous identification of users. For the case of learning users' daily routines, we propose what is our second main contribution: a novel approach that discovers periodic-frequent routines in event data from sensors and smart devices deployed at home. For the continuous identification of users we propose what is our third major contribution: a novel approach based on behavioral biometrics which is able to recognize identities without requiring any specific gesture, action, or activity from the users. The two approaches proposed have been extensively evaluated through studies in the lab, based on synthetic data, and in the wild, showing that they can be effectively applied to different scenarios and environments. In sum, the engineering model proposed in this dissertation is expected to serve as a basis to further the research and development efforts in key aspects that are necessary to build value-added smart software services that bring pervasive systems closer to the way they have been envisioned. Furthermore, the approaches proposed for learning users' daily routines and recognizing users' identities in smart environments are aimed at contributing to the investigation and development of the data analytics technology necessary for the smart adaptation and evolution of the software in pervasive systems to users' needs

    Collaborative Planning and Event Monitoring Over Supply Chain Network

    Get PDF
    The shifting paradigm of supply chain management is manifesting increasing reliance on automated collaborative planning and event monitoring through information-bounded interaction across organizations. An end-to-end support for the course of actions is turning vital in faster incident response and proactive decision making. Many current platforms exhibit limitations to handle supply chain planning and monitoring in decentralized setting where participants may divide their responsibilities and share computational load of the solution generation. In this thesis, we investigate modeling and solution generation techniques for shared commodity delivery planning and event monitoring problems in a collaborative setting. In particular, we first elaborate a new model of Multi-Depot Vehicle Routing Problem (MDVRP) to jointly serve customer demands using multiple vehicles followed by a heuristic technique to search near-optimal solutions for such problem instances. Secondly, we propose two distributed mechanisms, namely: Passive Learning and Active Negotiation, to find near-optimal MDVRP solutions while executing the heuristic algorithm at the participant's side. Thirdly, we illustrate a collaboration mechanism to cost-effectively deploy execution monitors over supply chain network in order to collect in-field plan execution data. Finally, we describe a distributed approach to collaboratively monitor associations among recent events from an incoming stream of plan execution data. Experimental results over known datasets demonstrate the efficiency of the approaches to handle medium and large problem instances. The work has also produced considerable knowledge on the collaborative transportation planning and execution event monitoring

    Efficient Algorithms to Compute Hierarchical Summaries from Big Data Streams

    Full text link
    Many data stream applications have hierarchical data; containing time, geographic locations, product information, clickstreams, server logs, IP addresses. A hierarchical summary of such volumous data offers multiple advantages including compactness, quick understanding, and abstraction. The goal of this thesis is to design algorithmic approaches for summarizing hierarchical data streams. First, this thesis provides a theoretical analysis of the benchmark hierarchical heavy hitters' algorithms and uncovers their shortcomings such as requiring high theoretical memory, updates and coverage problem. To address these shortcomings, this thesis proposes efficient algorithms which offer deterministic estimation accuracy using O(η/ε) worst-case memory and O(η) worst-case time complexity per item, where ε ∈ [0,1] is a user defined parameter and η is a small constant derived from the data. The proposed hierarchical heavy hitters' algorithms are shown to have improved significantly over existing algorithms both theoretically as well as empirically. Next, this thesis introduces a new concept called hierarchically correlated heavy hitters, which is different from existing hierarchical summarization techniques. The thesis provides a formal definition of the proposed concept and compares it with existing hierarchical summarization approaches both at definition level and empirically. It also proposes an efficient hierarchy-aware algorithm for computing hierarchically correlated heavy hitters. The proposed algorithm offers deterministic estimation accuracy using O(η / (ε_p * ε_s )) worst-case memory and O(η) worst-case time complexity per item, where η is as defined previously, and ε_p ∈ [0,1], ε_s ∈ [0,1] are other user defined parameters. Finally, the thesis proposes a special hierarchical data structure and algorithm to summarize spatiotemporal data. It can be used to extract interesting and useful patterns from high-speed spatiotemporal data streams at multiple spatial and temporal granularities. Theoretical and empirical analysis are provided, which show that the proposed data structure is very efficient concerning data storage and response to queries. It updates a single item in O(1) time and responds to a point query in O(1) time. Importantly, the memory requirement of the proposed data structure is independent of the size of the data and only depends on user-supplied parameters ψ ⃗ and φ ⃗. In summary, this thesis provides a general framework consisting of a set of algorithms and data structures to compute hierarchical summaries of the big data streams. All of the proposed algorithms exploit a lattice structure built from the hierarchical attributes of the data to compute different hierarchical summaries, which can be used to address various data analytic issues in many emerging applications

    Mining Association Rules Events over Data Streams

    Get PDF
    Data streams have gained considerable attention in data analysis and data mining communities because of the emergence of a new classes of applications, such as monitoring, supply chain execution, sensor networks, oilfield and pipeline operations, financial marketing and health data industries. Telecommunication advancements have provided us with easy access to stream data produced by various applications. Data in streams differ from static data stored in data warehouses or database. Data streams are continuous, arrive at high-speeds and change through time. Traditional data mining algorithms assume presence of data in conventional storage means where data mining is performed centrally with the luxury of accessing the data multiple times, using powerful processors, providing offline output with no time constraints. Such algorithms are not suitable for dynamic data streams. Stream data needs to be mined promptly as it might not be feasible to store such volume of data. In addition, streams reflect live status of the environment generating it, so prompt analysis may provide early detection of faults, delays, performance measurements, trend analysis and other diagnostics. This thesis focuses on developing a data stream association rule mining algorithm among co-occurring events. The proposed algorithm mines association rules over data streams incrementally in a centralized setting. We are interested in association rules that meet a provided minimum confidence threshold and have a lift value greater than 1. We refer to such association rules as strong rules. Experiments on several datasets demonstrate that the proposed algorithms is efficient and effective in extracting association rules from data streams, thus having a faster processing time and better memory management
    corecore