6 research outputs found

    Leadership in Action: How Top Hackers Behave A Big-Data Approach with Text-Mining and Sentiment Analysis

    This paper examines hacker behavior in dark forums and identifies its significant predictors in the light of leadership theory for communities of practice. We combine online forum features with text mining and sentiment analysis of messages. We create a multinomial logistic regression model to achieve role-based hacker classification and validate our model with actual hacker forum data. We identify the total number of messages, the number of threads, hacker keyword frequency, and sentiments as the most significant predictors of expert hacker behavior. We also demonstrate that, while disseminating technical knowledge, the hacker community follows the Pareto principle. As a recommendation for future research, we build a unique keyword lexicon of the most significant terms derived by the tf-idf measure. Such investigation of hacker behavior is particularly relevant for organizations in proactive prevention of cyber-attacks. Foresight on online hacker behavior can help businesses save losses from breaches and additional costs of attack-preventive measures.
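
The keyword lexicon mentioned in this abstract could be approximated along the following lines. This is only a minimal sketch: the abstract does not say which tf-idf variant, tokenizer, or stop-word list the authors used, so the plain `tf * log(N / df)` weighting, the whitespace tokenizer, the function name `tfidf_lexicon`, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_lexicon(messages, top_n=5):
    """Rank terms by their highest tf-idf score across all messages.

    Hypothetical helper: tf is the relative term frequency in a message,
    idf is log(N / df). The paper's exact weighting is not stated.
    """
    docs = [m.lower().split() for m in messages]
    n_docs = len(docs)
    # Document frequency: number of messages that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), tf * idf)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy forum messages, invented for this example.
messages = [
    "the exploit needs a payload",
    "the payload injects shellcode into the process",
    "thanks for the exploit",
]
lexicon = tfidf_lexicon(messages, top_n=3)
scores = dict(tfidf_lexicon(messages, top_n=100))
```

A term that occurs in every message (here "the") receives a score of zero, while rarer terms rank higher. A realistic lexicon would also need stop-word removal and a much larger corpus.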

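    Stilometrie in der Rechtslinguistik. Nutzung korpuslinguistischer Verfahren für die Analyse deutschsprachiger Urteile (Stylometry in Legal Linguistics: Using Corpus-Linguistic Methods to Analyze German-Language Court Decisions)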

    Using current corpus-linguistic methods, we analyze a collection of decisions of the European Court of Justice in their German version. Continuing from Berteloot/Mielke/Wolff 2018, we focus on stylometric methods, which are used intensively in the digital humanities, for example to investigate the problem of authorship attribution. Methods modeled on these have already been applied successfully in the United States to the analysis of Supreme Court decisions. For our example corpus, we present selected results of such stylometric analyses and discuss their explanatory power.

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Overview: This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends in social media platforms and a new section on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique, it considers a series of capabilities and insights, assesses the validity and reliability of the method, and explores how it might be applied to counter-terrorism work. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

    Semantic feature reduction and hybrid feature selection for clustering of Arabic Web pages

    According to the literature, high-dimensional data reduces the efficiency of clustering algorithms. Clustering Arabic text is challenging because capturing the meaning of the text requires deep semantic processing. To overcome these problems, feature selection and reduction methods have become essential for identifying the appropriate features and reducing the high-dimensional space. There is a need for a suitable design of feature selection and reduction methods that yields a more relevant, meaningful and reduced representation of Arabic texts to ease the clustering process. The research developed three different methods for analyzing the features of Arabic Web text. The first method is based on hybrid feature selection, which selects the informative term representation within the Arabic Web pages. It combines three feature selection methods, Chi-square, Mutual Information and Term Frequency–Inverse Document Frequency, to build a hybrid model. The second method is a latent document vectorization method that represents documents as probability distributions in the vector space. It overcomes the problem of high dimensionality by reducing the dimensional space. To extract the best features, two document vectorizer methods were implemented, known as the Bayesian vectorizer and the semantic vectorizer. The third method is an Arabic semantic feature analysis used to improve the capability of Arabic Web analysis. It ensures a good design for the clustering method to optimize clustering ability when analysing these Web pages. This is done by overcoming the problems of term representation, semantic modeling and dimensional reduction. Different experiments were carried out with k-means clustering on two different data sets. The methods provided solutions to reduce high-dimensional data and identify the semantic features shared between similar Arabic Web pages that are grouped together in one cluster. These pages were clustered according to the semantic similarities between them, yielding a small Davies–Bouldin index and high accuracy. This study contributed to research on clustering algorithms by developing three methods to identify the most relevant features of Arabic Web pages.
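
For readers unfamiliar with the Davies–Bouldin index used above as a quality measure, the sketch below shows the standard definition: for each cluster, take the worst-case ratio of combined within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate tighter, better-separated clusters. The function names and the toy 2-D points are assumptions for illustration and are not taken from the study, which clustered Arabic Web page vectors.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of at least two non-empty clusters.

    Assumes no two clusters share the same centroid (otherwise the
    between-centroid distance is zero and the ratio is undefined).
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(euclid(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / euclid(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data: two compact, distant clusters versus two spread-out, close ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(3, 0), (3, 4)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

With these toy points the compact, well-separated clustering scores lower than the loose one, which is the sense in which a "small" index is reported as a good result.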

    The Stylometric Processing of Sensory Open Source Data

    The end goal of this research project is the Lone Wolf Terrorist. The project takes an exploratory approach to the self-radicalisation problem by creating a stylistic fingerprint of a person’s personality, or self, from subtle characteristics hidden in that person’s writing style. It separates the identity of one person from another based on their writing style. It also separates the writings of suicide attackers from those of ‘normal’ bloggers by means of critical slowing down, a dynamical property used to develop early warning signs of tipping points. It identifies changes in a person’s moods, or shifts from one state to another, that might indicate a tipping point for self-radicalisation. Research into authorship identity using personality is a relatively new area in the field of neurolinguistics. There are very few methods that model how an individual’s cognitive functions present themselves in writing. Here, we develop a novel algorithm, RPAS, which draws on cognitive functions such as aging, sensory processing, abstract or concrete thinking through referential activity emotional experiences, and a person’s internal gender for identity. We use well-known techniques such as Principal Component Analysis, Linear Discriminant Analysis, and the Vector Space Method to cluster multiple anonymous-authored works. We also use a new approach, seriation with noise, to separate subtle features in individuals. We conduct time series analysis using modified variants of 1-lag autocorrelation and the coefficient of skewness, two statistical metrics that change near a tipping point, to track serious life events in an individual through cognitive linguistic markers. In our journey of discovery, we uncover secrets about the Elizabethan playwrights hidden for over 400 years. We uncover markers for depression and anxiety in modern-day writers and, using sensory processing, identify linguistic cues for Alzheimer’s disease much earlier than other studies. Applying these techniques to the Lone Wolf, we can distinguish the writing style they used before their attacks from other writing.
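
The two early-warning metrics named in this abstract can be sketched in their textbook form as follows. The abstract says the authors use modified variants, which are not described, so the code below shows only the standard lag-1 autocorrelation and the standard moment coefficient of skewness, computed over sliding windows. The function names, the window size, and the toy series (imagined here as one linguistic marker score per blog post) are assumptions.

```python
def lag1_autocorrelation(xs):
    """Standard lag-1 autocorrelation of a sequence with nonzero variance."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return num / denom


def skewness(xs):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def rolling(metric, xs, window):
    """Apply a metric to every contiguous window of the given length."""
    return [metric(xs[i:i + window]) for i in range(len(xs) - window + 1)]


# Toy series, invented for this example: one marker score per post.
series = [1, 2, 3, 4, 5, 6, 5, 7, 6, 8, 9, 12]
ac_trend = rolling(lag1_autocorrelation, series, window=6)
skew_trend = rolling(skewness, series, window=6)
```

In the critical-slowing-down literature, a sustained rise in windowed lag-1 autocorrelation, often accompanied by a shift in skewness, is read as a possible sign that a system is approaching a tipping point. Whether and how that carries over to writing style is the claim of the thesis itself, not something this sketch demonstrates.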