1,972 research outputs found

Scalable anomaly detection in large homogeneous populations

Author: Abraham
Basseville
Bertsekas
Boyd
Candès
Chandola
Desforges
Donoho
Duda
Eskin
Fox
Gustafsson
Hastie
Henrik Ohlsson
Jain
Lennart Ljung
Ljung
Parra
Patton
Rousseeuw
S. Shankar Sastry
Sina Khoshfetrat Pakazad
Tan
Tianshi Chen
Tibshirani
Yuan
Publication venue: 'Elsevier BV'

Event detection in location-based social networks

Author: Capdevila Pujol Joan
Cerquides Jesús
Torres Viñals Jordi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; e.g., Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. To set out the differences between the two approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities.
Last but not least, we present the algorithmic changes and data processing frameworks to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract TIN2015-65316, by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Hierarchical Change Point Detection on Dynamic Networks

Author: Chakrabarti Aniket
Parthasarathy Srinivasan
Sivakoff David
Wang Yu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/06/2017

This paper studies change point detection on networks with community structures. It proposes a framework that can efficiently detect both local and global changes in networks. Importantly, it can clearly distinguish the two types of changes. The framework design is generic, so several state-of-the-art change point detection algorithms can fit into it. Experiments on both synthetic and real-world networks show that this framework can accurately detect changes while achieving up to 800X speedup. Comment: 9 pages, ACM WebSci'1

arXiv.org e-Print Archive

Crossref

D-AREdevil: a novel approach for discovering disease-associated rare cell populations in mass cytometry data

Author: SUFFIOTTI Madeleine
Publication venue: Université de Lausanne, Faculté de biologie et médecine
Publication date: 01/01/2020

Background: Advances in single-cell technologies such as mass cytometry provide increasing resolution of the complexity of cellular samples, allowing researchers to investigate and understand cellular heterogeneity more deeply and possibly to detect and discover previously undetectable rare cell populations. The identification of rare cell populations is of paramount importance for understanding the onset, progression and pathogenesis of many diseases. However, their identification remains challenging due to the ever-increasing dimensionality and throughput of the data generated. Aim: This study aimed to implement a straightforward approach that efficiently supports a data analyst in identifying disease-associated rare cell populations in large and complex biological samples, within reasonable limits of time and computational infrastructure. Methods: We proposed a novel computational framework called D-AREdevil (disease-associated rare cells detection) for cytometry datasets. The main characteristic of our computational framework is the combination of an anomaly detection algorithm (LOF or FiRE) that provides a continuous score for individual cells with one of the best-performing and fastest unsupervised clustering methods (FlowSOM). In our approach, the LOF score serves to select a set of candidate cells belonging to one or more subgroups of similar rare cell populations. Then, we tested these subgroups of rare cells for association with a patient group, disease type, clinical outcome or other characteristic of interest. Results: We reported in this study the properties and implementation of D-AREdevil and presented an evaluation of its performance and applications on three different testing datasets based on mass cytometry data.
We generated data mixed with one or more known rare cell populations at varying frequencies (below 1%) and tested the ability of our approach to identify those cells in order to bring them to the attention of the data analyst. This is a key step in the process of finding cell subgroups that are associated with a disease or outcome of interest when their existence is not previously known and has yet to be discovered. Conclusions: We proposed a novel computational framework with demonstrated good sensitivity and precision in detecting target rare cell populations present at very low frequencies in the total datasets (<1%).
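
The scoring-and-selection step described in this abstract (a continuous anomaly score per cell, then a shortlist of candidate rare cells) can be illustrated with a minimal sketch. This is not the D-AREdevil implementation: it is a brute-force Local Outlier Factor written with the Python standard library only, and the names `lof_scores`, `candidate_cells`, `k` and `top_fraction` are hypothetical. The FlowSOM clustering and the association tests mentioned in the abstract are not shown.

```python
import math


def lof_scores(points, k=3):
    """Return a Local Outlier Factor per point (brute force, O(n^2)).

    Simplification (assumption): each point uses exactly its k nearest
    neighbours, so ties at the k-th distance are not expanded.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_dist[j], dist[i][j]) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    # Values near 1 mean "as dense as its neighbours"; large values are outliers.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def candidate_cells(points, k=3, top_fraction=0.1):
    """Return the indices of the highest-scoring points and all scores."""
    scores = lof_scores(points, k)
    n_keep = max(1, round(top_fraction * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep]), scores


# Toy data: a dense 3x3 grid of "cells" plus one isolated point (index 9).
cells = [(i * 0.1, j * 0.1) for i in range(3) for j in range(3)] + [(5.0, 5.0)]
selected, scores = candidate_cells(cells, k=3, top_fraction=0.1)
```

In a pipeline of the kind the abstract outlines, the shortlisted indices would then be passed to a clustering step; here they are simply returned to the caller.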

Serveur académique lausannois

Anti-fragile ICT Systems

Author: Hole Kjell Jørgen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/02/2021

This book introduces a novel approach to the design and operation of large ICT systems. It views the technical solutions and their stakeholders as complex adaptive systems and argues that traditional risk analyses cannot predict all future incidents with major impacts. To avoid unacceptable events, it is necessary to establish and operate anti-fragile ICT systems that limit the impact of all incidents and that learn from small-impact incidents how to function increasingly well in changing environments. The book applies four design principles and one operational principle to achieve anti-fragility for different classes of incidents. It discusses how systems can achieve high availability, prevent malware epidemics, and detect anomalies. Analyses of Netflix’s media streaming solution, Norwegian telecom infrastructures, e-government platforms, and Numenta’s anomaly detection software show that cloud computing is essential to achieving anti-fragility for classes of events with negative impacts.

Directory of Open Access Books (DOAB)

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

Author: Dey Soumyabrata
Garai Subrata
Peng Hang
Rao A. Ravishankar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/04/2023

With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight-based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We compare our technique with three existing outlier detection techniques: auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues.

arXiv.org e-Print Archive

Matched filters for noisy induced subgraph detection

Author: Lyzinski Vince
Park Youngser
Priebe Carey E.
Sussman Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/06/2018

First author draft. We consider the problem of finding the vertex correspondence between two graphs with different numbers of vertices where the smaller graph is still potentially large. We propose a solution to this problem via a graph matching matched filter: padding the smaller graph in different ways and then using graph matching methods to align it to the larger network. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks, though there are currently no efficient algorithms for solving this problem. We consider an approach that exploits a partially known correspondence and show via varied simulations and applications to the Drosophila connectome that in practice this approach can achieve good performance. https://arxiv.org/abs/1803.02423

Boston University Institutional Repository (OpenBU)

Matched Filters for Noisy Induced Subgraph Detection

Author: Lyzinski Vince
Park Youngser
Priebe Carey E.
Sussman Daniel L.
Publication date: 03/06/2018

The problem of finding the vertex correspondence between two noisy graphs with different numbers of vertices, where the smaller graph is still large, has many applications in social networks, neuroscience, and computer vision. We propose a solution to this problem via a graph matching matched filter: centering and padding the smaller adjacency matrix and applying graph matching methods to align it to the larger network. The centering and padding schemes can be incorporated into any algorithm that matches using adjacency matrices. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks. However, there are currently no efficient algorithms for solving this problem. To illustrate the possibilities and challenges of such problems, we use an algorithm that can exploit a partially known correspondence and show via varied simulations and applications to Drosophila and human connectomes that this approach can achieve good performance. Comment: 41 pages, 7 figures
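
The "centering and padding" idea in this abstract can be illustrated on a toy example. The sketch below is not the authors' algorithm. It assumes one particular centering (mapping 0/1 adjacency entries to -1/+1 off the diagonal) and zero padding, neither of which the abstract specifies, and it replaces the graph matching solver with an exhaustive search that is only feasible for very small graphs. All function names are hypothetical.

```python
from itertools import permutations


def center(adj):
    """Map a 0/1 adjacency matrix to -1/+1 off the diagonal (assumed scheme)."""
    n = len(adj)
    return [[0 if i == j else 2 * adj[i][j] - 1 for j in range(n)]
            for i in range(n)]


def pad(mat, size):
    """Embed a square matrix in the top-left corner of a size x size zero matrix."""
    n = len(mat)
    return [[mat[i][j] if i < n and j < n else 0 for j in range(size)]
            for i in range(size)]


def match_small_to_large(small, large):
    """Exhaustively search for the injective vertex map with the best score.

    The score sums a[i][j] * b[p(i)][p(j)]: agreeing vertex pairs add 1 and
    disagreeing pairs subtract 1. Because the padded rows and columns are
    zero, only the images of the small graph's vertices affect the score.
    """
    m, n = len(small), len(large)
    a = pad(center(small), n)
    b = center(large)
    best_map, best_score = None, None
    for perm in permutations(range(n), m):  # every injective map small -> large
        score = sum(a[i][j] * b[perm[i]][perm[j]]
                    for i in range(m) for j in range(m))
        if best_score is None or score > best_score:
            best_map, best_score = perm, score
    return best_map, best_score


# Toy example: a triangle hidden in a 5-vertex graph with edges
# 0-1, 1-2, 0-2, 2-3, 3-4 (the only triangle is on vertices 0, 1, 2).
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
host = [[0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
mapping, score = match_small_to_large(triangle, host)
```

The exhaustive search grows factorially with graph size, which is consistent with the abstract's remark that no efficient exact algorithm is known; practical use would need an approximate graph matching method instead.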

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)