
    Information Flow for Web Security and Privacy

    The use of libraries is prevalent in modern web development, but how can we ensure sensitive data is not leaked through these libraries? This is the first challenge this thesis aims to solve. We propose the use of information-flow control, developing a principled approach that allows information-flow tracking in libraries, even if the libraries are written in a language that does not support information-flow control. The approach allows library functions to have unlabel and relabel models that explain how values are unlabeled and relabeled when marshaled between the labeled program and the unlabeled library. It handles primitive values as well as lists, records, higher-order functions, and references through the use of lazy marshaling.

    Web pages can combine benign properties of a user's browser into a fingerprint that can identify the user. Fingerprinting can be intrusive and often happens without the user's consent. The second challenge this thesis aims to solve is bridging the gap between the principled approach to handling libraries and practical use in the information-flow-aware JavaScript interpreter JSFlow. We extend JSFlow to handle libraries and to be deployable in a browser, enabling information-flow tracking on web pages to detect fingerprinting.

    Modern browsers allow for modifications through browser extensions. These extensions can be intrusive by, e.g., blocking content or modifying the DOM, and it can be in the interest of web pages to detect which extensions are installed in the browser. The third challenge this thesis aims to solve is finding which browser extensions are executing in a user's browser, and investigating how the installed extensions can be used to decrease the privacy of users. We do this by conducting several large-scale studies, showing that, due to security measures added by browser vendors, a web page may uniquely identify a user based on the installed browser extensions alone.

    It is popular to use filter lists to block unwanted content such as ads and tracking scripts on web pages. These filter lists are usually crowd-sourced and mainly focus on English-speaking regions. Non-English-speaking regions should use a supplementary filter list, but smaller linguistic regions may not have an up-to-date one. The fourth challenge this thesis aims to solve is how to automatically generate supplementary filter lists for regions that currently lack an up-to-date filter list.
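
The unlabel/relabel idea for library calls can be illustrated with a minimal sketch. This is a hand-written toy, not JSFlow's actual implementation: values inside the monitored program carry a security label, are unlabeled when passed to the label-unaware library, and the result is relabeled with the join of the argument labels.

```python
# Toy sketch of unlabel/relabel marshaling for information-flow tracking.
# Labels form a two-point lattice: "public" below "secret".

from dataclasses import dataclass

@dataclass
class Labeled:
    value: object
    label: str  # "public" or "secret"

def lub(a, b):
    # Least upper bound on the two-point lattice.
    return "secret" if "secret" in (a, b) else "public"

def call_library(fn, *args):
    """Unlabel the arguments, call the unmonitored library function,
    and relabel the result with the join of the argument labels."""
    raw = [a.value for a in args]        # unlabel on the way out
    out_label = "public"
    for a in args:
        out_label = lub(out_label, a.label)
    return Labeled(fn(*raw), out_label)  # relabel on the way back

# The library function itself (here, the builtin max) is label-unaware:
result = call_library(max, Labeled(3, "public"), Labeled(7, "secret"))
```

A real system also needs the lazy marshaling mentioned above for lists, records, higher-order functions, and references; this sketch only covers the primitive-value case.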

    Aspect Based Sentiment Analysis using Various Supervised Classification Techniques: An Overview

    Sentiment Analysis (SA) is concerned with identifying aspect terms and categories and with classifying emotions (positive, negative, conflict, and neutral) in ratings and reviews. When it comes to subjectivity, it is typical to divide sentences into objective phrases, which convey factual information, and subjective statements, which express ideas, beliefs, and perspectives on a given topic. Existing research has already covered much ground in sentiment analysis with various methods, including aspect extraction. This paper presents a systematic literature analysis of numerous sentiment analysis approaches using supervised and unsupervised classification techniques. We investigate several feature-extraction Natural Language Processing (NLP) techniques used with machine learning to identify aspects for sentiment detection. Through an extensive experimental analysis, we discuss the findings of the study and the current challenges, and define the problem statement for future directions.
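
To make the supervised-classification setting concrete, here is a toy illustration (not taken from the paper) of sentiment classification with a bag-of-words Naive Bayes: it counts word frequencies per polarity class on labeled reviews and predicts the most likely class for new text.

```python
# Minimal bag-of-words Naive Bayes for sentiment polarity (toy data).
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label). Returns priors and per-class word counts."""
    priors, words, totals = Counter(), defaultdict(Counter), Counter()
    for text, label in samples:
        priors[label] += 1
        for w in text.lower().split():
            words[label][w] += 1
            totals[label] += 1
    return priors, words, totals

def predict(model, text):
    priors, words, totals = model
    vocab = {w for c in words.values() for w in c}
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / sum(priors.values()))
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary.
            lp += math.log((words[label][w] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train([
    ("the battery life is great", "positive"),
    ("great screen and great sound", "positive"),
    ("the battery died quickly terrible", "negative"),
    ("terrible keyboard and poor screen", "negative"),
])
```

Aspect-based SA goes further by attaching a polarity to each aspect term (e.g. "battery", "screen") rather than to the whole review, but the supervised core is the same.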

    Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts

    In this study, we propose to evaluate the use of deep learning methods for semantic classification at the sentence level to accelerate the process of corpus building in the field of humanities and linguistics, a traditional and time-consuming task. We introduce a novel corpus comprising around 2,500 sentences, spanning from 300 BCE to 900 CE, that include sexual semantics (medical, erotica, etc.). We evaluate various sentence classification approaches and different input embedding layers, and show that all consistently outperform simple token-based searches. We explore the integration of idiolectal and sociolectal metadata embeddings (century, author, type of writing), but find that it leads to overfitting. Our results demonstrate the effectiveness of this approach, achieving a high precision and true positive rate (TPR) of 70.60% and 86.33%, respectively, using a HAN. We evaluate the impact of dataset size on model performance (420 sentences instead of 2,013) and show that, while our models perform worse, they still offer a high enough precision and TPR, even without MLM (69% and 51%, respectively). Given these results, we provide an analysis of the attention mechanism as an added value to support humanists in producing more data.
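
The headline metrics here are precision and true positive rate (recall). As a quick reminder of how they are computed from binary predictions (the values below are illustrative, not the paper's data):

```python
# Precision and TPR (recall) from paired binary labels and predictions.
def precision_tpr(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall / true positive rate
    return precision, tpr

# 3 truly sexual sentences, the model finds 2 of them plus 1 false alarm:
p, r = precision_tpr([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

High precision matters for corpus building because every flagged sentence is reviewed by hand; high TPR matters because missed sentences never reach the annotators.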

    Decoding social media speak: developing a speech act theory research agenda

    Purpose – Drawing on the theoretical domain of speech act theory (SAT) and a discussion of its suitability for setting the agenda for social media research, this study aims to explore a range of research directions that are both relevant and conceptually robust, to stimulate the advancement of knowledge and understanding of online verbatim data.
    Design/methodology/approach – Examining previously published cross-disciplinary research, the study identifies how recent conceptual and empirical advances in SAT may further guide the development of text analytics in a social media context.
    Findings – Decoding content and function word use in customers' social media communication can enhance the efficiency of determining the potential impact of customer reviews, sentiment strength, the quality of contributions in social media, customers' socialization perceptions in online communities, and deceptive messages.
    Originality/value – Considering the variety of managerial demands, increasing and diverging social media formats, expanding archives, the rapid development of software tools, and fast-paced market changes, this study provides an urgently needed, theory-driven, coherent research agenda to guide the conceptual development of text analytics in a social media context.
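
The findings above hinge on separating function words (articles, pronouns, prepositions) from content words in customer posts. A toy illustration of the measurement, using a tiny hand-picked stopword set rather than a validated lexicon:

```python
# Share of function words in a post (toy stopword list, illustration only).
FUNCTION_WORDS = {"the", "a", "an", "i", "you", "it", "of", "to", "and",
                  "is", "was", "in", "on", "for", "my", "this", "that"}

def function_word_ratio(text: str) -> float:
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if not tokens:
        return 0.0
    return sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)

ratio = function_word_ratio("I loved the service and the food was great!")
```

Real text-analytics pipelines would use an established function-word lexicon (such as those behind psycholinguistic word-category tools) rather than this hand-picked set.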

    The Nature of Ephemeral Secrets in Reverse Engineering Tasks

    Reverse engineering is typically carried out on static binary objects, such as files or compiled programs. Often, however, the goal of reverse engineering is to extract a secret that is ephemeral and only exists while the system is running. Automation and dynamic analysis enable reverse engineers to extract ephemeral secrets from dynamic systems, obviating the need for analyzing static artifacts such as executable binaries. I support this thesis through four automated reverse engineering efforts: (1) named entity extraction to track Chinese Internet censorship based on keywords; (2) dynamic information flow tracking to locate secret keys in the memory of a live program; (3) man-in-the-middle emulation of server behavior to extract cryptographic secrets; and (4) large-scale measurement and data mining of TCP/IP handshake behaviors to reveal machines on the Internet vulnerable to TCP/IP hijacking and other attacks. In each of these cases, automation enables the extraction of ephemeral secrets, often in situations where there is no accessible static binary object containing the secret. Furthermore, each project was contingent on building an automated system that interacted with the dynamic system in order to extract the secret(s). This general approach provides a new perspective, increases the types of systems that can be reverse engineered, and points to a promising direction for the future of reverse engineering.
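
One classic heuristic related to effort (2), locating cryptographic keys in a live program's memory, is that key material is high-entropy. The sketch below is a simplified stand-in for such tooling (not the dissertation's actual system): it scans fixed-size windows of a memory image and flags those whose byte entropy approaches that of random data.

```python
# Entropy scan over a memory image to flag likely key material.
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def find_key_candidates(image: bytes, window=32, threshold=4.0):
    """Return offsets of windows whose byte entropy exceeds the threshold.
    A 32-byte window of distinct bytes has entropy 5 bits; structured data
    (zeros, ASCII padding) scores far lower."""
    return [i for i in range(0, len(image) - window + 1, window)
            if shannon_entropy(image[i:i + window]) > threshold]

# Synthetic image: zero padding, a 32-byte "key", then ASCII filler.
image = b"\x00" * 64 + os.urandom(32) + b"AAAA" * 16
hits = find_key_candidates(image)
```

Real key-finding via dynamic information flow tracking is more precise than an entropy scan, since it follows the key from the point where the program actually uses it.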

    Parsing dialogue and argumentative structures

    This work presents novel techniques for parsing the structures of multi-party dialogue and argumentative texts. Finding the structure of extended texts and conversations is a critical step towards the extraction of their underlying meaning. The task is notoriously hard, as discourse is a high-level description of language, and multi-party dialogue involves many complex linguistic phenomena. Historically, the representation of discourse moved from local relationships forming unstructured collections, towards trees, then constrained graphs. Our work uses the latter framework, through Segmented Discourse Representation Theory. We base our research on an annotated corpus of English chats from the board game The Settlers of Catan. Owing to the strategic nature of the conversations and the freedom of online chat, these dialogues exhibit complex discourse units and interwoven threads, among other features mostly overlooked by the current parsing literature. We discuss two corpus-related experiments. The first expands the definition of the Right Frontier Constraint, a formalization of discourse coherence principles, to adapt it to multi-party dialogue. The second demonstrates a data extraction process giving a strategic advantage to an artificial player of Settlers by inferring its opponents' assets from chat negotiations.
    We propose new methods to parse dialogue, jointly using machine learning, graph algorithms, and linear optimization to produce rich and expressive structures with greater accuracy than previous attempts. We describe our method of constrained discourse parsing, first on trees using the Maximum Spanning Tree algorithm, then on directed acyclic graphs using Integer Linear Programming with a number of original constraints. We finally apply these methods to argumentative structures, on a corpus of English and German texts jointly annotated in two discourse representation frameworks and one argumentative framework. We compare the three annotation layers and experiment on argumentative parsing, achieving better performance than similar works.
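
The tree-based stage of such a pipeline scores candidate attachments between discourse units and keeps the highest-scoring tree. A stripped-down sketch, with illustrative scores and a greedy head selection (the thesis uses the full Maximum Spanning Tree algorithm plus ILP constraints, which also handle the cyclic cases this toy ignores):

```python
# Greedy head selection over attachment scores between discourse units.
def greedy_tree(n_units, scores):
    """scores[(head, dep)] -> attachment score; unit 0 is the root.
    Pick the best-scoring head for each dependent. Assumes the result is
    acyclic, which a full Chu-Liu/Edmonds MST implementation guarantees."""
    tree = {}
    for dep in range(1, n_units):
        candidates = [(s, h) for (h, d), s in scores.items() if d == dep]
        tree[dep] = max(candidates)[1]  # head with the highest score
    return tree

# Four elementary discourse units with hypothetical classifier scores:
scores = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.8, (2, 3): 0.7, (1, 3): 0.3}
tree = greedy_tree(4, scores)
```

Moving from trees to directed acyclic graphs, as the thesis does, amounts to allowing a unit to keep several incoming attachments, with ILP constraints enforcing global well-formedness.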