12,262 research outputs found

    A forensics and compliance auditing framework for critical infrastructure protection

    Contemporary societies are increasingly dependent on products and services provided by Critical Infrastructure (CI) such as power plants, energy distribution networks, transportation systems and manufacturing facilities. Due to their nature, size and complexity, such CIs are often supported by Industrial Automation and Control Systems (IACS), which are in charge of managing assets and controlling everyday operations. As these IACS become larger and more complex, encompassing a growing number of processes and interconnected monitoring and actuating devices, the attack surface of the underlying CIs increases. This situation calls for new strategies to improve Critical Infrastructure Protection (CIP) frameworks, based on advanced data analytics approaches capable of gathering insights from the CI. In this paper, we propose an Intrusion and Anomaly Detection System (IADS) framework that adopts forensics and compliance auditing capabilities at its core to improve CIP. The adopted forensics techniques help to address, for instance, post-incident analysis and investigation, while support for continuous auditing processes simplifies compliance management and service quality assessment. More specifically, after discussing the rationale for such a framework, this paper presents a formal description of the proposed components and functions and discusses how the framework can be implemented using a cloud-native approach to address both functional and non-functional requirements. An experimental analysis of the framework's scalability is also provided.
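    As a purely illustrative aside, the following minimal Python sketch shows the kind of anomaly detection analytics such a framework might apply to IACS telemetry; the rolling z-score detector, window size, and threshold are assumptions made for illustration, not the method proposed in the paper.

        from collections import deque
        from statistics import mean, stdev

        def zscore_anomalies(readings, window=50, threshold=3.0):
            """Flag readings that deviate strongly from a rolling baseline.

            Illustrative only: a real IADS would combine many detectors and
            correlate events across devices to build a forensic audit trail.
            """
            history = deque(maxlen=window)
            anomalies = []
            for t, value in enumerate(readings):
                if len(history) >= 2:
                    mu, sigma = mean(history), stdev(history)
                    if sigma > 0 and abs(value - mu) / sigma > threshold:
                        anomalies.append((t, value))  # candidate incident to log
                history.append(value)
            return anomalies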

    Sentiment analysis of financial Twitter posts with machine learning classifiers

    This paper presents a sentiment analysis combining lexicon-based and machine learning (ML)-based approaches in Turkish to investigate the public mood for the prediction of stock market behavior in BIST30, Borsa Istanbul. Our main motivation behind this study is to apply sentiment analysis to finance-related tweets in Turkish. We import 17,189 tweets posted as "#Borsaistanbul, #Bist, #Bist30, #Bist100" on Twitter between November 7, 2022, and November 15, 2022, via MAXQDA 2020, a qualitative data analysis program. For the lexicon-based side, we use a multilingual sentiment lexicon offered by the Orange program to label the polarities of the 17,189 samples as positive, negative, or neutral; neutral labels are discarded for the machine learning experiments. For the machine learning side, we select 9,076 positive and negative samples to implement the classification problem with six different supervised machine learning classifiers, conducted in Python 3.6 with the sklearn library. In the experiments, 80% of the selected data is used for the training phase and the rest for testing and validation. The results show that the Support Vector Machine and Multilayer Perceptron classifiers perform better than the others, with accuracies of 0.89 and 0.88 and AUC values of 0.8729 and 0.8647, respectively. The other classifiers obtain approximately a 78.5% accuracy rate. It is possible to increase sentiment analysis accuracy through parameter optimization on a larger, cleaner, and more balanced dataset and by changing the pre-processing steps. This work can be expanded in the future to develop better sentiment analysis using deep learning approaches.
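    As a rough sketch of the pipeline described above (not the authors' actual code), the snippet below trains two of the six mentioned classifiers on an 80/20 split with the sklearn library; the TF-IDF features and the placeholder texts and labels are assumptions for illustration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier
        from sklearn.metrics import accuracy_score

        # Placeholder data; the study used 9,076 labeled Turkish tweets.
        texts = ["borsa yükseliyor", "hisse düştü", "harika kazanç",
                 "kötü bir gün", "güçlü alım", "zarar büyük"]
        labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

        X = TfidfVectorizer().fit_transform(texts)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.2, random_state=42)

        for clf in (SVC(), MLPClassifier(max_iter=1000)):
            clf.fit(X_tr, y_tr)
            acc = accuracy_score(y_te, clf.predict(X_te))
            print(type(clf).__name__, "accuracy:", acc)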

    Meta-semantic practices in social interaction. Definitions and specifications provided in response to Was heißt X (‘what does X mean’)

    In social interaction, different kinds of word-meaning can become problematic for participants. This study analyzes two meta-semantic practices, definitions and specifications, which are used in response to clarification requests in German implemented by the format Was heißt X (‘What does X mean?’). In the data studied, definitions are used to convey generalizable lexical meanings of mostly technical terms. These terms are either unknown to requesters or, in pedagogical contexts, requesters ask in order to check the addressee’s knowledge. Specifications, in contrast, clarify aspects of local speaker meanings of ordinary expressions (e.g., reference, participants in an event, standards applied to scalar expressions). Both definitions and specifications are recipient-designed with respect to the (presumed) knowledge of the addressee and tailored to the topical and practical relevancies of the current interaction. Both practices attest to the flexibility and situatedness of speakers’ semantic understandings and to the systematicity of using meta-semantic practices differentially for different kinds of semantic problems. The data come from mundane and institutional interaction in German, drawn from the public corpus FOLK.

    Information actors beyond modernity and coloniality in times of climate change: A comparative design ethnography on the making of monitors for sustainable futures in Curaçao and Amsterdam, between 2019-2022

    In his dissertation, Mr. Goilo developed a cutting-edge theoretical framework for an Anthropology of Information. The study compares information in the context of modernity in Amsterdam and coloniality in Curaçao through the making process of monitors, and develops five ways to understand how information can act towards sustainable futures. The research also discusses how the two contexts, that is, modernity and coloniality, have been in informational symbiosis for centuries, which is producing negative informational side effects in the age of the Anthropocene. By exploring the modernity-coloniality symbiosis of information, the author explains how scholars, policymakers, and data analysts can work through the historical and structural roots of contemporary global inequities related to the production and distribution of information. Ultimately, the five theses propose conditions for the collective production of knowledge towards a more sustainable planet.

    Statistical analysis of grouped text documents

    The topic of this thesis is statistical models for the analysis of textual data, emphasizing contexts in which text samples are grouped. When dealing with text data, the first issue is to process it, making it computationally and methodologically compatible with the existing mathematical and statistical methods produced and continually developed by the scientific community. The thesis therefore first reviews existing methods for analytically representing and processing textual datasets, including Vector Space Models, distributed representations of words and documents, and contextualized embeddings. This review standardizes a notation that, even within the same representation approach, appears highly heterogeneous in the literature. Two application domains are then explored: social media and cultural tourism. Regarding the former, a study is proposed on self-presentation among diverse groups of individuals on the StockTwits platform, where finance and stock markets are the dominant topics. The proposed methodology integrated various types of data, both textual and categorical. This study revealed insights into how people present themselves online and found recurring patterns of behavior within groups of users. Regarding the latter, the thesis delves into a study conducted as part of the "Data Science for Brescia - Arts and Cultural Places" project, in which a language model was trained to classify Italian-written online reviews into four distinct semantic areas related to cultural attractions in the Italian city of Brescia. The proposed model allows for the identification of attractions in text documents, even when they are not explicitly mentioned in document metadata, thus opening possibilities for expanding the database of these cultural attractions with new sources such as social media platforms, forums, and other online spaces. Lastly, the thesis presents a methodological study examining the group-specificity of words, analyzing various group-specificity estimators proposed in the literature. The study considered grouped text documents with both an outcome variable and a group variable. Its contribution is a proposal to model the corpus of documents as a multivariate distribution, enabling the simulation of corpora of text documents with predefined characteristics. The simulation provided valuable insights into the relationship between groups of documents and words. Furthermore, all of its results can be freely explored through a web application, whose components are also described in this manuscript. In conclusion, this thesis has been conceived as a collection of papers, contributing to the field with both applications and methodological proposals; each study presented here suggests paths for future research to address the challenges of analyzing grouped textual data.
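    To make the simulation idea concrete, here is a hedged Python sketch (a toy example, not the thesis's actual model) that samples grouped corpora from group-specific multinomial word distributions; the vocabulary, probabilities, and document lengths are invented.

        import numpy as np

        rng = np.random.default_rng(0)
        vocab = ["price", "museum", "ticket", "market", "art"]

        # Hypothetical group-specific word probabilities (each row sums to 1).
        group_probs = {
            "finance": [0.40, 0.05, 0.10, 0.40, 0.05],
            "culture": [0.05, 0.40, 0.20, 0.05, 0.30],
        }

        def simulate_corpus(group, n_docs=3, doc_len=20):
            """Draw each document's word counts from the group's multinomial."""
            return rng.multinomial(doc_len, group_probs[group], size=n_docs)

        print("vocabulary:", vocab)
        for g in group_probs:
            print(g, simulate_corpus(g), sep="\n")  # document-term matrix per group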

    Logical disagreement: an epistemological study

    While the epistemic significance of disagreement has been a popular topic in epistemology for at least a decade, little attention has been paid to logical disagreement. This monograph is meant as a remedy. The text starts with an extensive literature review of the epistemology of (peer) disagreement and sets the stage for an epistemological study of logical disagreement. The guiding thread for the rest of the work is then three distinct readings of the ambiguous term ‘logical disagreement’. Chapters 1 and 2 focus on the Ad Hoc Reading, according to which logical disagreements occur when two subjects take incompatible doxastic attitudes toward a specific proposition in or about logic. Chapter 2 presents a new counterexample to the widely discussed Uniqueness Thesis. Chapters 3 and 4 focus on the Theory Choice Reading of ‘logical disagreement’. According to this interpretation, logical disagreements occur at the level of entire logical theories rather than individual entailment-claims. Chapter 4 concerns a key question from the philosophy of logic, viz., how we have epistemic justification for claims about logical consequence. In Chapters 5 and 6 we turn to the Akrasia Reading. On this reading, logical disagreements occur when there is a mismatch between the deductive strength of one’s background logic and the logical theory one officially prefers. Chapter 6 introduces logical akrasia by analogy to epistemic akrasia and presents a novel dilemma. Chapter 7 revisits the epistemology of peer disagreement and argues that the epistemic significance of central principles from the literature is at best deflated in the context of logical disagreement. The chapter also develops a simple formal model of deep disagreement in Default Logic, relating it to the general discussion of logical disagreement. The monograph ends in an epilogue with some reflections on the potential epistemic significance of convergence in logical theorizing.
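    For readers unfamiliar with Default Logic, the formal model in Chapter 7 presupposes Reiter-style default rules, whose standard general form is (background notation, not the author's specific model):

        \[
          \delta \;=\; \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}
        \]

    Read: if the prerequisite \alpha is believed and each justification \beta_i is consistent with the current belief set, then the consequent \gamma may be concluded.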

    Spatial adaptive settlement systems in archaeology. Modelling long-term settlement formation from spatial micro interactions

    Despite a research history spanning more than a century, settlement patterns still hold promise for contributing to theories of large-scale processes in human history. Mostly, they have been presented as passive imprints of past human activities, and the spatial interactions they shape have not been studied as a driving force of historical processes. While archaeological knowledge has been used to construct geographical theories of the evolution of settlement, gaps remain in this knowledge, and no theoretical framework has yet been adopted to explore settlement patterns as spatial systems emerging from the micro-choices of small population units. The goal of this thesis is to propose a conceptual model of adaptive settlement systems based on the complex adaptive systems framework. The model frames settlement system formation as an adaptive system containing spatial features, information flows, decision-making population units (agents), and cross-scale feedback loops formed between the location choices of individuals and the space modified by their aggregated choices. The aim of the model is to find new ways of interpreting archaeological locational data, as well as a closer theoretical integration of micro-level choices and meso-level settlement structures. The thesis is divided into five chapters. The first chapter is dedicated to the conceptualisation of the general model based on existing literature and shows that settlement systems are inherently complex adaptive systems, therefore requiring the tools of complexity science for causal explanations. The following chapters explore both empirical and simulated settlement patterns, each dedicated to studying selected information flows and feedbacks in the context of the whole system. The second and third chapters explore the case study of Stone Age settlement in Estonia, comparing residential location choice principles across different periods. In the second chapter, the relation between environmental conditions and residential choice is explored statistically. The results confirm that the relation is significant but varies between different archaeological phenomena. In the third chapter, hunter-fisher-gatherer and early agrarian Corded Ware settlement systems are compared spatially using inductive models. The results indicate a large difference in their perception of landscape suitability for habitation, leading to the conclusion that early agrarian land use significantly extended land use potential and provided a competitive spatial benefit. In addition to spatial differences, model performance is compared and the difference is discussed in the context of the proposed adaptive settlement system model. The last two chapters present theoretical agent-based simulation experiments intended to study the effects discussed in relation to environmental model performance and environmental determinism in general. In the fourth chapter, the central place foraging model is embedded in the proposed model, and resource depletion is explored as an environmental modification mechanism. The study excludes the possibility that mobility itself would lead to the modelling effects discussed in the previous chapter. The purpose of the last chapter is to disentangle the complex relations between social and human-environment interactions. The study exposes the non-linear spatial effects that expected population density can have on the system, and the general robustness of environmental inductive models in archaeology to randomness and social effects. The model indicates that social interactions between individuals lead to the formation of a group agency that is determined by the environment, even if individual cognitions consider the environment insignificant. It also indicates that the spatial configuration of the environment has a certain influence on population clustering, thereby providing a potential pathway to population aggregation. These empirical and theoretical results demonstrate the new insights provided by the complex adaptive systems framework; some of them, including the explanation of the empirical results, required the conceptual model to provide a framework of interpretation.
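    A minimal Python sketch of the kind of agent-based location-choice loop described above (purely illustrative; the suitability grid, social weight, and depletion term are invented parameters, not those of the thesis):

        import random

        random.seed(1)
        SIZE, AGENTS, STEPS = 10, 30, 50
        suitability = [[random.random() for _ in range(SIZE)] for _ in range(SIZE)]
        occupancy = [[0] * SIZE for _ in range(SIZE)]
        agents = [(random.randrange(SIZE), random.randrange(SIZE)) for _ in range(AGENTS)]
        for x, y in agents:
            occupancy[x][y] += 1

        def score(x, y, social_weight=0.3, depletion=0.02):
            # Environmental pull, attraction to neighbours, minus local depletion.
            return (suitability[x][y]
                    + social_weight * occupancy[x][y]
                    - depletion * occupancy[x][y] ** 2)

        for _ in range(STEPS):
            moved = []
            for x, y in agents:
                options = [((x + dx) % SIZE, (y + dy) % SIZE)
                           for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
                nx, ny = max(options, key=lambda c: score(*c))  # best nearby cell
                occupancy[x][y] -= 1
                occupancy[nx][ny] += 1
                moved.append((nx, ny))
            agents = moved

        print("occupied cells:", sum(c > 0 for row in occupancy for c in row))

    The positive social term pulls agents together while depletion pushes them apart, reproducing in miniature the cross-scale feedback between individual choices and aggregate settlement structure.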

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point is the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.
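    To give a concrete sense of what a token-based similarity measure computes (a toy Jaccard example for illustration, not one of the 80 surveyed tools):

        import re

        def tokens(code: str) -> set[str]:
            # Crude lexer: identifiers, numbers, and single-character operators.
            return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

        def jaccard_similarity(a: str, b: str) -> float:
            """Share of tokens common to both fragments (0 disjoint, 1 identical)."""
            ta, tb = tokens(a), tokens(b)
            return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

        clone_a = "for (int i = 0; i < n; i++) sum += a[i];"
        clone_b = "for (int j = 0; j < n; j++) total += b[j];"
        print(jaccard_similarity(clone_a, clone_b))  # high: same structure, renamed ids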

    Towards A Practical High-Assurance Systems Programming Language

    Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, reasoning formally about its correctness properties introduces yet another level of complexity to the task, requiring considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of the systems and the nuances within them. Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code. To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof via a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
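    As a general illustration of property-based testing (shown here with Python's hypothesis library rather than Cogent's own framework), one states a property relating an implementation to its specification and lets the tool search for counterexamples:

        from hypothesis import given, strategies as st

        def encode(xs: list[int]) -> list[tuple[int, int]]:
            """Toy run-length encoder standing in for a systems routine."""
            out: list[tuple[int, int]] = []
            for x in xs:
                if out and out[-1][0] == x:
                    out[-1] = (x, out[-1][1] + 1)
                else:
                    out.append((x, 1))
            return out

        def decode(pairs: list[tuple[int, int]]) -> list[int]:
            return [x for x, n in pairs for _ in range(n)]

        @given(st.lists(st.integers()))
        def test_roundtrip(xs):
            # Property: decoding an encoding recovers the input exactly.
            assert decode(encode(xs)) == xs

        test_roundtrip()  # hypothesis generates and checks many random cases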