Search CORE

1,891 research outputs found

The Minimum Description Length Principle for Pattern Mining: A Survey

Author: Galbrun Esther
Publication venue
Publication date: 28/07/2021
Field of study

This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems

arXiv.org e-Print Archive

Event detection in high throughput social media

Author: Weiler Michael
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 20/12/2016
Field of study

Enumerating Top-k Quasi-Cliques

Author: Das Apurba
Sanei-Mehri Seyed-Vahid
Tirthapura Srikanta
Tirthapura Srikanta
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications to bio-informatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques, and make the following contributions: (1) We show that even the problem of detecting if a given quasi-clique is maximal (i.e. not contained within another quasi-clique) is NP-hard (2) We present a novel heuristic algorithm KernelQC to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, following by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods.Comment: 10 page

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

Author: Bhattacharjee Shameek
Das Sajal
Ghosh Nirnay
Melodia Tommaso
Restuccia Francesco
Publication venue
Publication date: 06/09/2017
Field of study

Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution, crime control, and wildlife monitoring, just to name a few. Differently from prior sensing paradigms, humans are now the primary actors of the sensing process, since they become fundamental in retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the current state-of-the-art on the topic. We also outline novel research challenges, along with possible directions of future work.Comment: To appear in ACM Transactions on Sensor Networks (TOSN

arXiv.org e-Print Archive

Crossref

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

ActiveRemediation: The Search for Lead Pipes in Flint, Michigan

Author: Abernethy Jacob
Chojnacki Alex
Farahi Arya
Schwartz Eric
Webb Jared
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/08/2018
Field of study

We detail our ongoing work in Flint, Michigan to detect pipes made of lead and other hazardous metals. After elevated levels of lead were detected in residents' drinking water, followed by an increase in blood lead levels in area children, the state and federal governments directed over $125 million to replace water service lines, the pipes connecting each home to the water system. In the absence of accurate records, and with the high cost of determining buried pipe materials, we put forth a number of predictive and procedural tools to aid in the search and removal of lead infrastructure. Alongside these statistical and machine learning approaches, we describe our interactions with government officials in recommending homes for both inspection and replacement, with a focus on the statistical model that adapts to incoming information. Finally, in light of discussions about increased spending on infrastructure development by the federal government, we explore how our approach generalizes beyond Flint to other municipalities nationwide.Comment: 10 pages, 10 figures, To appear in KDD 2018, For associated promotional video, see https://www.youtube.com/watch?v=YbIn_axYu9

arXiv.org e-Print Archive

Crossref