SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail parties) is rewarding due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging, because crowding and extreme
occlusions make it difficult to extract behavioral cues such as target
locations, speaking activity and head/body pose. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
in poster presentation and cocktail party contexts that present
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) to
alleviate these problems, we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, each badge comprising a microphone, an accelerometer,
and Bluetooth and infrared sensors. In addition to the raw data, we also provide
annotations of individuals' personality as well as their position, head and body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
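The F-formation annotations invite automatic group detection from positions and body orientations. As an illustration only, the sketch below uses a deliberately simple heuristic that is an assumption and not a method evaluated in SALSA: each person votes for an o-space centre a fixed `stride` in front of them, and people whose votes fall within `radius` of each other are linked into one group. The function names and the 0.75 m / 0.5 m defaults are hypothetical.

```python
import math
from itertools import combinations


def ospace_votes(people, stride=0.75):
    """Project a hypothetical o-space centre `stride` metres in front of each
    person; `people` maps id -> (x, y, body_orientation_in_radians)."""
    return {pid: (x + stride * math.cos(theta), y + stride * math.sin(theta))
            for pid, (x, y, theta) in people.items()}


def detect_groups(people, stride=0.75, radius=0.5):
    """Single-linkage grouping of people whose o-space votes lie within `radius`."""
    votes = ospace_votes(people, stride)
    parent = {pid: pid for pid in people}

    def find(p):
        # Union-find root lookup with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(people, 2):
        if math.dist(votes[a], votes[b]) <= radius:
            parent[find(a)] = find(b)

    groups = {}
    for pid in people:
        groups.setdefault(find(pid), []).append(pid)
    # A conversational group needs at least two members
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


people = {
    "a": (0.0, 0.0, 0.0),       # facing +x
    "b": (1.5, 0.0, math.pi),   # facing -x, towards "a"
    "c": (5.0, 5.0, 0.0),       # standing alone
}
print(detect_groups(people))  # [['a', 'b']]
```

Two people 1.5 m apart facing each other cast nearly coinciding votes and form a group, while the isolated person does not. Published F-formation detectors are considerably more robust to orientation noise and occlusion than this single-linkage rule.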
Mean birds: Detecting aggression and bullying on Twitter
In recent years, bullying and aggression against social media users have grown significantly, causing serious consequences for victims of all demographics. Cyberbullying now affects more than half of young social media users worldwide, many of whom suffer prolonged and/or coordinated digital harassment. Moreover, tools and technologies geared to understand and mitigate it are scarce and mostly ineffective. In this paper, we present a principled and scalable approach to detect bullying and aggressive behavior on Twitter. We propose a robust methodology for extracting text, user, and network-based attributes, studying the properties of bullies and aggressors, and identifying the features that distinguish them from regular users. We find that bullies post less, participate in fewer online communities, and are less popular than normal users. Aggressors are relatively popular and tend to include more negativity in their posts. We evaluate our methodology using a corpus of 1.6M tweets posted over 3 months, and show that machine learning classification algorithms can accurately detect users exhibiting bullying and aggressive behavior, with over 90% AUC.
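The detection results above are reported as AUC. As a reminder of what that number measures, here is a minimal sketch (not the paper's evaluation code) of ROC AUC in its Mann-Whitney form: the probability that a randomly chosen positive user scores higher than a randomly chosen negative one. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive example
    (label 1) receives a higher score than a randomly chosen negative one
    (label 0); ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = user labelled bully/aggressor, 0 = normal user
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic and meant for clarity; a rank-based implementation gives the same value in O(n log n) for corpora of the size described above.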
Oscillation-specific nodal alterations in early to middle stages Parkinson's disease
Background: Different oscillations of brain networks could carry different dimensions of brain integration. We aimed to investigate oscillation-specific nodal alterations in patients with Parkinson's disease (PD) from the early to the middle stage using graph theory-based analysis.
Methods: Eighty-eight PD patients (39 in the early stage [EPD] and 49 in the middle stage [MPD]) and 36 controls were recruited. Graph theory-based network analyses were performed in three oscillation frequency bands (slow-5: 0.01-0.027 Hz; slow-4: 0.027-0.073 Hz; slow-3: 0.073-0.198 Hz), and nodal metrics (e.g. nodal degree centrality, betweenness centrality and nodal efficiency) were calculated.
Results: (1) Oscillation frequency had a divergent effect on nodal metrics, especially on nodal degree centrality and nodal efficiency: the anteroventral neocortex and subcortex had high nodal metrics in the low oscillation frequencies, whereas the posterolateral neocortex had high values in the relatively high oscillation frequency, visually indicating that the network was perturbed in PD; (2) EPD patients had relatively preserved nodal properties, whereas MPD patients showed widespread abnormalities, consistently across all three oscillation frequencies; (3) basal ganglia involvement was observed specifically in the slow-5 band in MPD patients; (4) logistic regression and receiver operating characteristic curve analyses showed that some of these oscillation-specific nodal alterations could discriminate PD patients from controls, or MPD from EPD patients, at the individual level; (5) occipital disruption in the high-frequency band (slow-3) significantly influenced motor impairment, which was dominated by akinesia and rigidity.
Conclusions: Considering multiple oscillation frequencies together could provide useful information about large-scale networks, and progressive oscillation-specific nodal alterations were observed in PD patients from the early to the middle stage.
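The nodal metrics named in the Methods can be illustrated on a toy network. The sketch below assumes, for illustration only, that each frequency band yields a region-by-region correlation matrix that is binarized at a threshold `tau` (a common construction, but not necessarily the study's pipeline). It computes nodal degree and nodal efficiency, the mean inverse shortest-path length from a node to all others, with unreachable nodes contributing zero; betweenness centrality (e.g. via Brandes' algorithm) is omitted for brevity. In the study's design this would be repeated per band (slow-5, slow-4, slow-3).

```python
from collections import deque


def threshold_graph(corr, tau):
    """Binary undirected adjacency lists: edge i-j if |r_ij| >= tau (i != j)."""
    n = len(corr)
    return [[j for j in range(n) if j != i and abs(corr[i][j]) >= tau] for i in range(n)]


def shortest_paths_from(adj, src):
    """Breadth-first search hop counts from `src`; None marks unreachable nodes."""
    dist = [None] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    return [len(nbrs) for nbrs in adj]


def nodal_efficiency(adj):
    """E_i = 1/(n-1) * sum_{j != i} 1/d_ij, with 1/d_ij = 0 if j is unreachable."""
    n = len(adj)
    eff = []
    for i in range(n):
        dist = shortest_paths_from(adj, i)
        total = sum(1.0 / d for j, d in enumerate(dist) if j != i and d)
        eff.append(total / (n - 1))
    return eff


# Hypothetical correlation matrix for four regions in one frequency band
corr = [[1.0, 0.8, 0.1, 0.0],
        [0.8, 1.0, 0.6, 0.2],
        [0.1, 0.6, 1.0, 0.0],
        [0.0, 0.2, 0.0, 1.0]]
adj = threshold_graph(corr, tau=0.5)
print(degree_centrality(adj))                          # [1, 2, 1, 0]
print([round(e, 3) for e in nodal_efficiency(adj)])    # [0.5, 0.667, 0.5, 0.0]
```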
Behavioral analysis in cybersecurity using machine learning: a study based on graph representation, class imbalance and temporal dissection
The main goal of this thesis is to improve behavioral cybersecurity analysis using machine learning by exploiting graph structures and temporal dissection and by addressing class imbalance problems. This main objective is divided into four specific goals:
OBJ1: To study the influence of temporal resolution on highlighting micro-dynamics in the entity behavior classification problem. In real use cases, time-series information may not be enough to describe entity behavior. For this reason, we plan to exploit graph structures to integrate both structured and unstructured data in a representation of entities and their relationships. In this way, it will be possible to capture not only individual temporal communications but the whole behavior of these entities. Nevertheless, entity behaviors evolve over time, so a static graph may not be enough to describe all these changes. We therefore propose to use temporal dissection to create temporal subgraphs and to analyze the influence of temporal resolution on graph creation and on the entity behaviors within. Furthermore, we propose to study how temporal granularity should be used to highlight network micro-dynamics and short-term behavioral changes, which can be a hint of suspicious activity.
OBJ2: To develop novel sampling methods that work with disconnected graphs to address imbalanced problems without changing component topology. Graph imbalance is a common and challenging problem, and traditional graph sampling techniques that work directly on these structures cannot be used without modifying the graph's intrinsic information or introducing bias. Furthermore, existing techniques have proven limited when disconnected graphs are used. Hence, new resampling methods are needed that balance the number of nodes and can be applied directly to disconnected graphs without altering component topologies.
In particular, we propose to take advantage of the existence of disconnected graphs to detect and replicate the most relevant graph components without changing their topology, while considering traditional data-level strategies for handling the entity behaviors within.
OBJ3: To study the usefulness of generative adversarial networks (GANs) for addressing the class imbalance problem in cybersecurity applications. Although traditional data-level pre-processing techniques have been effective for class imbalance problems, they have shown drawbacks on highly variable datasets, such as those found in cybersecurity. New techniques that can exploit the overall data distribution to learn highly variable behaviors should therefore be investigated. GANs have shown promising results in the image and video domains; however, extending them to tabular data is not trivial. We therefore propose to adapt GANs to cybersecurity data and exploit their ability to learn and reproduce the input distribution to address class imbalance (as an oversampling technique). Furthermore, since no single GAN solution works for every scenario, we propose to study several GAN architectures and training configurations to determine the best option for a cybersecurity application.
OBJ4: To analyze temporal data trends and performance drift to enhance cyber threat analysis. Temporal dynamics and incoming new data can degrade prediction quality and compromise model reliability, so models can become outdated without anyone noticing. It is therefore important to extract more insightful information from the application domain by analyzing data trends, learning processes, and performance drift over time.
For this reason, we propose to develop a systematic approach for analyzing how data quality and quantity affect the learning process. Moreover, in the context of cyber threat intelligence (CTI), we propose to study the relations between temporal performance drift and the input data distribution to detect possible model limitations and enhance cyber threat analysis.
Doctoral Programme in Industrial Sciences and Technologies (RD 99/2011)
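OBJ1 and OBJ2 describe two concrete graph operations: cutting a stream of timestamped interactions into temporal subgraphs, and oversampling a disconnected graph by replicating whole components so that no component's topology changes. The sketch below shows one way these could fit together under stated assumptions; it is not the thesis's method. The function names, the fixed-width windows, and the selection rule (copy only components in which the minority class is the majority, richest first) are all hypothetical. That rule guarantees each copy strictly shrinks the class gap, so the loop terminates.

```python
from collections import defaultdict


def temporal_subgraphs(edges, window):
    """Temporal dissection: bucket timestamped edges (t, u, v) into
    consecutive windows of `window` time units, one edge list per window."""
    buckets = defaultdict(list)
    for t, u, v in edges:
        buckets[int(t // window)].append((u, v))
    return dict(sorted(buckets.items()))


def connected_components(edge_list):
    """Connected components of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


def oversample_components(edge_list, labels, minority=1):
    """Replicate whole components as relabelled, disjoint copies (topology
    unchanged) until the minority class has at least as many nodes as the rest.
    Hypothetical criterion: only minority-dominated components are copied."""
    def minority_share(component):
        return sum(labels[n] == minority for n in component) / len(component)

    candidates = sorted(
        (c for c in connected_components(edge_list) if minority_share(c) > 0.5),
        key=minority_share, reverse=True)
    new_edges, new_labels = list(edge_list), dict(labels)
    copies = 0
    while candidates:
        n_min = sum(1 for y in new_labels.values() if y == minority)
        if n_min >= len(new_labels) - n_min:
            break
        component = candidates[copies % len(candidates)]
        copies += 1
        rename = {n: f"{n}#copy{copies}" for n in component}
        new_edges += [(rename[u], rename[v]) for u, v in edge_list if u in component]
        new_labels.update({rename[n]: labels[n] for n in component})
    return new_edges, new_labels


edges = [(0.5, "a", "b"), (1.2, "b", "c"),
         (3.1, "d", "e"), (3.5, "f", "g"), (3.7, "g", "h")]
windows = temporal_subgraphs(edges, window=2.0)
labels = {"d": 1, "e": 1, "f": 0, "g": 0, "h": 0}  # hypothetical: 1 = suspicious entity
balanced_edges, balanced_labels = oversample_components(windows[1], labels)
print(balanced_edges)  # [('d', 'e'), ('f', 'g'), ('g', 'h'), ('d#copy1', 'e#copy1')]
```

Changing `window` is the knob OBJ1 refers to as temporal resolution: smaller windows expose short-lived structures at the cost of sparser subgraphs.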
Memory in network flows and its effects on spreading dynamics and community detection
Random walks on networks are the standard tool for modelling spreading
processes in social and biological systems. This first-order Markov approach is
used in conventional community detection, ranking, and spreading analysis
although it ignores a potentially important feature of the dynamics: where flow
moves to may depend on where it comes from. Here we analyse pathways from
different systems, and while we only observe marginal consequences for disease
spreading, we show that ignoring the effects of second-order Markov dynamics
has important consequences for community detection, ranking, and information
spreading. For example, capturing dynamics with a second-order Markov model
allows us to reveal actual travel patterns in air traffic and to uncover
multidisciplinary journals in scientific communication. These findings were
achieved simply by using more of the available data and making no additional
assumptions, and therefore suggest that accounting for higher-order memory in
network flows can help us better understand how real systems are organized and
function.
Comment: 23 pages and 16 figures
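The difference between first- and second-order Markov dynamics can be seen directly in transition probabilities estimated from observed pathways. In the minimal sketch below (illustrative, with hypothetical function names and toy data), the first-order model conditions only on the current node, while the second-order model conditions on the last step (previous, current). With return trips through a hub, the first-order walker leaves the hub uniformly at random, whereas the second-order walker returns to where it came from, which is the kind of pattern the abstract describes for air traffic.

```python
from collections import Counter, defaultdict


def _normalise(counts):
    """Turn per-state transition counts into transition probabilities."""
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {target: c / total for target, c in nxt.items()}
    return probs


def first_order(paths):
    """P(next | current): memoryless random-walk model estimated from paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[u][v] += 1
    return _normalise(counts)


def second_order(paths):
    """P(next | previous, current): the state is the last step (u, v)."""
    counts = defaultdict(Counter)
    for path in paths:
        for u, v, w in zip(path, path[1:], path[2:]):
            counts[(u, v)][w] += 1
    return _normalise(counts)


# Hypothetical return trips through a hub airport H
paths = [["A", "H", "A"], ["B", "H", "B"]]
print(first_order(paths)["H"])          # {'A': 0.5, 'B': 0.5}
print(second_order(paths)[("A", "H")])  # {'A': 1.0}
```

Community detection, ranking or spreading simulations would then run on the second-order model, whose states are edges rather than nodes; that step is not shown here.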
A Bioinformatics Approach to Synthetic Lethal Interactions in Cancer with Gene Expression Data
Introduction
Synthetic lethal genetic interactions are re-emerging as an important concept in the post-genomics era due to their potential for use in precision medicine against cancers. Synthetic lethal drug design exploits the functional redundancy of genes disrupted in cancers (including tumour suppressors) to develop specific treatments against them. CDH1, which encodes E-cadherin, is a tumour suppressor gene that undergoes loss of function in breast and stomach cancers. Experimental screens have identified candidate synthetic lethal interactions with CDH1, which can be further supported by bioinformatics analysis. Furthermore, gene expression data enable investigation of synthetic lethal pathways and of the structure of synthetic lethal genes.
Methods
A computational methodology, the Synthetic Lethal Prediction Tool (SLIPT), was developed to detect synthetic lethal interactions in gene expression data. Its application is demonstrated on interactions with CDH1 in breast and stomach cancer data from The Cancer Genome Atlas (TCGA) project. Synthetic lethal genes and pathways were further investigated with unsupervised clustering, gene set over-representation analysis, metagenes, and permutation resampling. In particular, analyses focused on comparing SLIPT gene candidates with an experimental short interfering RNA (siRNA) screen. Network analysis methods were applied to the most strongly supported pathways to test for pathway structure between synthetic lethal candidates. Simulation and modelling were used to assess the statistical performance of SLIPT, including simulated data with correlation structures derived from graph structures.
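To make the idea of detecting synthetic lethality in expression data concrete, here is a hypothetical sketch; it is not the published SLIPT code, and its criteria are assumptions. The intuition is that if losing both genes is lethal, tumour samples with low expression of both the query gene and a candidate should be rarer than chance. The sketch bins each gene into rank-based tertiles, builds a 3×3 contingency table, and reports a χ² statistic (4 degrees of freedom, p-value via the closed-form survival function $e^{-x/2}(1 + x/2)$) together with a directional check that the low/low cell is depleted. Since the Results compare SLIPT favourably with a plain χ² test, the real method presumably differs; note also that this simple rule flags the anti-correlated toy example below.

```python
import math


def tertiles(values):
    """Label each sample 0 (low), 1 (medium) or 2 (high) by expression rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 3 // len(values)
    return bins


def sl_test(query, candidate):
    """Hypothetical directional test: chi-square on the 3x3 tertile table plus
    a check that fewer samples than expected are low for both genes."""
    q, c = tertiles(query), tertiles(candidate)
    n = len(q)
    observed = [[0] * 3 for _ in range(3)]
    for a, b in zip(q, c):
        observed[a][b] += 1
    rows = [sum(r) for r in observed]
    cols = [sum(observed[i][j] for i in range(3)) for j in range(3)]
    chi2 = 0.0
    for i in range(3):
        for j in range(3):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)  # chi-square survival, df = 4
    depleted = observed[0][0] < rows[0] * cols[0] / n
    return chi2, p_value, depleted


# Toy data: nine samples; the candidate is never low when the query is low
query = [1, 2, 3, 4, 5, 6, 7, 8, 9]
candidate = [9, 8, 7, 6, 5, 4, 3, 2, 1]
chi2, p_value, depleted = sl_test(query, candidate)
print(chi2, depleted)  # 18.0 True
```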
Results
Many candidate synthetic lethal partners of CDH1 were detected in the TCGA breast cancer data. These genes clustered into several distinct groups, with distinct biological functions and elevated expression in different clinical subtypes. While the number of genes detected by both SLIPT and siRNA was not significant, these genes contained significantly enriched pathways. In particular, Gαi signalling, cytoplasmic microfibres, and extracellular fibrin clotting were robustly supported by both approaches, which is consistent with the known cytoskeletal and cell signalling roles of E-cadherin. Many of these pathways were replicated in the stomach cancer data. The pathways supported only by SLIPT included regulation of immune signalling and translation, which were not expected to be detected in an isogenic cell line model but remain candidates for further investigation.
Synthetic lethal candidates detected by SLIPT and siRNA were compared within the graph structures of the candidate synthetic lethal pathways. SLIPT genes had lower centrality and were consistently upstream of siRNA candidates, specifically in the Gαi signalling pathway.
A statistical model of synthetic lethality was used to simulate gene expression data with known synthetic lethal partners for a gene. SLIPT had high statistical performance when a gene had few synthetic lethal partners, and this performance diminished with more synthetic lethal partners or smaller sample sizes. SLIPT performed better than Pearson correlation or the χ² test. In particular, it performed well, with high specificity, for datasets containing thousands of genes, or genes positively correlated with the query gene (as expected in gene expression data). SLIPT was robust across correlation structures, including those derived from complex pathway structures, and often distinguished synthetic lethal genes from genes positively or negatively correlated with them.
Thus, this thesis has developed, evaluated, and applied a bioinformatics approach for the discovery of synthetic lethal genes from gene expression data. This approach has been shown to detect biologically informative and clinically relevant candidate synthetic lethal partners for CDH1 in breast and stomach cancers.