
    An automatic translation scheme from Prolog to the Andorra Kernel Language

    The Andorra family of languages (which includes the Andorra Kernel Language, AKL) is aimed, in principle, at simultaneously supporting the programming styles of Prolog and committed choice languages. On the other hand, AKL requires a somewhat detailed specification of control by the user. This could be avoided by programming in Prolog to run on AKL. However, Prolog programs cannot be executed directly on AKL. This is due to a number of factors, from more or less trivial syntactic differences to more involved issues such as the treatment of cut and making the exploitation of certain types of parallelism possible. This paper provides basic guidelines for constructing an automatic compiler of Prolog programs into AKL, which can bridge those differences. In addition to supporting Prolog, our style of translation achieves independent and-parallel execution where possible, which is relevant since this type of parallel execution preserves, through the translation, the user-perceived "complexity" of the original Prolog program.

    A Statically Typed Logic Context Query Language With Parametric Polymorphism and Subtyping

    The objective of this thesis is programming language support for context-sensitive program adaptations. Driven by the requirements for context-aware adaptation languages, a statically typed Object-oriented logic Context Query Language (OCQL) was developed, which is suitable for integration with adaptation languages based on the Java type system. The ambient information considered in context-aware applications often originates from several, potentially distributed sources. OCQL employs the Semantic Web language RDF Schema to structure and combine distributed context information. OCQL offers parametric polymorphism, subtyping, and a fixed set of meta-predicates. Its type system is based on mode analysis and a subset of Java Generics. For this reason, a mode-inference approach for normal logic programs that considers variable aliasing and sharing was extended to cover all-solution predicates. OCQL is complemented by a service-oriented context-management infrastructure that supports the integration of OCQL with runtime adaptation approaches. The applicability of the language and its infrastructure was demonstrated with the context-aware aspect language CSLogicAJ. CSLogicAJ aspects encapsulate context-aware behavior and define in which contextual situation and program execution state the behavior is woven into the running program. The thesis concludes with a case study analyzing how runtime adaptation of mobile applications can be supported by pure object-, service- and context-aware aspect-orientation. Our study has shown that CSLogicAJ can improve the modularization of context-aware applications and reduce anticipation of runtime adaptations when compared to other approaches.

    Analysis of hate speech detection in social media

    Bachelor's thesis in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2021, Advisor: Maria Salamó Llorente. The presence of social networks in our daily lives has increased, and they have become platforms for sharing information. However, they can also be used for sending hate messages or for propagating false news. Users can take advantage of their anonymity to produce these toxic interactions. Furthermore, some groups of people (minorities) are targeted disproportionately more than the rest. This raises the problem of how to detect whether a message contains hate speech. A solution could be the use of machine learning models in charge of this decision; in addition, they could handle the enormous amount of text exchanged daily. However, there are many approaches to tackle the problem, divided mainly into two groups. The first uses classical algorithms to extract information from the text. The other uses deep learning models that can understand some context, which allows for better predictions. The main objectives of the project are the exploration and comparison of different types of models and techniques. The models are trained on three distinct toxicity datasets from two natural language processing competitions. Generally, the best-performing model is BERT or SBERT, both based on the deep learning approach, with metric scores much higher than any model based on the traditional methods. The results show the vast potential of Natural Language Processing for the detection of hate speech. Although the best models did not have a very high perplexity, a more reliable model could be trained with more training data or new architectures. Even in the current state, the models could be used as an external source to help humans in the decision-making process. Moreover, these models could filter the most confident predictions while leaving the rest for the team of reviewers.
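    As a rough illustration of the two families of approaches compared in this project, the sketch below contrasts a classical TF-IDF baseline with a classifier built on pretrained SBERT sentence embeddings. The tiny in-line dataset, the model name "all-MiniLM-L6-v2", and the logistic-regression heads are illustrative assumptions, not the thesis code.

```python
# Illustrative sketch only: classical TF-IDF features vs. SBERT sentence
# embeddings for toxic-message classification. The toy dataset is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sentence_transformers import SentenceTransformer  # assumed available

texts = [
    "you are wonderful and kind",
    "I hope you have a great day",
    "people like you should disappear",
    "nobody wants your kind here",
]
labels = [0, 0, 1, 1]  # 1 = hateful (toy labels)

# Classical approach: bag-of-words statistics, no notion of context.
tfidf_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
tfidf_clf.fit(texts, labels)

# Deep-learning approach: pretrained sentence embeddings feeding a linear head.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
sbert_clf = LogisticRegression(max_iter=1000).fit(encoder.encode(texts), labels)

test = ["your kind is not welcome"]
print(tfidf_clf.predict(test))
print(sbert_clf.predict(encoder.encode(test)))
```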

    Blockchain Large Language Models

    This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions. The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains a large language model from scratch to act as a real-time Intrusion Detection System. Unlike traditional methods, BlockGPT offers an unrestricted search space and does not rely on predefined rules or patterns, enabling it to detect a broader range of anomalies. We demonstrate the effectiveness of BlockGPT through its use as an anomaly detection tool for Ethereum transactions. In our experiments, it effectively identifies abnormal transactions among a dataset of 68M transactions and has a batched throughput of 2284 transactions per second on average. Our results show that BlockGPT identifies abnormal transactions by ranking 49 out of 124 attacks among the top-3 most abnormal transactions interacting with their victim contracts. This work contributes to the field of blockchain transaction analysis by introducing a custom data encoding compatible with the transformer architecture, a domain-specific tokenization technique, and a tree encoding method specifically crafted for the Ethereum Virtual Machine (EVM) trace representation.
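    A minimal sketch of the ranking idea only, not BlockGPT itself: score each incoming transaction trace by how unlikely its tokens are under a model of normal activity, then surface the most abnormal ones. Here a toy token-frequency model stands in for the transformer trained from scratch, and the trace tokens and addresses are invented for illustration.

```python
# Hedged sketch: rank transactions by average negative log-likelihood of their
# trace tokens under a model of "normal" traces (higher score = more abnormal).
import math
from collections import Counter

normal_traces = [
    ["CALL", "transfer", "SSTORE", "RETURN"],
    ["CALL", "balanceOf", "SLOAD", "RETURN"],
    ["CALL", "approve", "SSTORE", "RETURN"],
]
counts = Counter(tok for trace in normal_traces for tok in trace)
total = sum(counts.values())

def surprise(trace, alpha=1.0):
    """Average negative log-probability of the trace's tokens, with add-alpha smoothing."""
    vocab = len(counts) + 1
    return sum(-math.log((counts[t] + alpha) / (total + alpha * vocab)) for t in trace) / len(trace)

incoming = {
    "0xaaa": ["CALL", "transfer", "SSTORE", "RETURN"],
    "0xbbb": ["DELEGATECALL", "SELFDESTRUCT", "CREATE2", "CALLCODE"],  # unusual pattern
}
# Rank from most to least abnormal; alerts would be raised for the top-k entries.
for tx, score in sorted(((tx, surprise(tr)) for tx, tr in incoming.items()), key=lambda x: -x[1]):
    print(tx, round(score, 2))
```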

    Improving Usability And Scalability Of Big Data Workflows In The Cloud

    Big data workflows have recently emerged as the next generation of data-centric workflow technologies to address the five "V" challenges of big data: volume, variety, velocity, veracity, and value. More formally, a big data workflow is the computerized modeling and automation of a process consisting of a set of computational tasks and their data interdependencies, used to process and analyze data of ever-increasing scale, complexity, and rate of acquisition. The convergence of big data and workflows creates new challenges for the workflow community. First, the variety of big data creates a need to integrate a large number of remote Web services and other heterogeneous task components, which consume and produce data in various formats and models, into a uniform and interoperable workflow. Existing approaches address the so-called shimming problem only in an ad hoc manner and are unable to provide a generic solution. We automatically insert pieces of code called shims or adaptors in order to resolve data type mismatches (see the sketch below). Second, the volume of big data results in a large number of datasets that need to be queried and analyzed in an effective and personalized manner. Further, there is also a strong need for sharing, reusing, and repurposing existing tasks and workflows across different users and institutes. To overcome such limitations, we propose a folksonomy-based social workflow recommendation system to improve workflow design productivity and efficient dataset querying and analysis. Third, the volume of big data requires processing and analyzing data of ever-increasing scale, complexity, and rate of acquisition, but a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. We propose a NoSQL collectional data model that addresses this limitation. Finally, the volume of big data, combined with the unbounded resource-leasing capability foreseen in the cloud, allows data scientists to wring actionable insights from the data in a time- and cost-efficient manner. We propose the BARENTS scheduler, which supports high-performance workflow scheduling in a heterogeneous cloud-computing environment with a single objective: to minimize the workflow makespan under a user-provided budget constraint.
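    As a hypothetical illustration of the kind of shim the abstract refers to, the snippet below adapts a task that emits JSON records to a downstream task that expects CSV text. The formats and field names are assumptions for the example, not the actual workflow components.

```python
# Illustrative shim/adaptor: convert one task's JSON output into the CSV input
# the next task expects, resolving a data-type mismatch between the two.
import csv
import io
import json

def json_to_csv_shim(json_payload: str) -> str:
    """Adapt a JSON list of records into CSV text for the downstream task."""
    records = json.loads(json_payload)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

upstream_output = '[{"id": 1, "value": 3.2}, {"id": 2, "value": 4.8}]'
print(json_to_csv_shim(upstream_output))  # fed to the CSV-consuming task
```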

    Measuring Semantic Similarity of Documents by Using Named Entity Recognition Methods

    The work presented in this thesis was born from the desire to map documents with similar semantic concepts between them. We decided to address this problem as a named entity recognition task, in which we identify key concepts in the texts we use and categorize them, so that named entity recognition techniques can automatically recognize these key concepts inside other documents. We propose a classification method based on the recognition of named entities or key phrases, where the method can detect similarities between key concepts of the texts to be analyzed and, through the use of Poincaré embeddings, associate the existing relationships between these concepts. Thanks to the Poincaré embeddings' ability to capture relationships between words, we were able to implement this feature in our classifier. Consequently, for each word in a text we check whether there are words close to it that are also close to the words that make up the key phrases we use as the Gold Standard. When the classifier detects potentially close words that make up a named entity, it then applies a series of features to classify it. This methodology performed better than considering only the POS structure of the named entities and their n-grams; nevertheless, determining the POS structure and the n-grams was important for improving the recognition of named entities in our research. By reducing the time needed to recognize similar key phrases between documents, some common tasks in large companies can benefit notably. An important example is the evaluation of resumes to determine the best professional for a specific position. This task is characterized by the large amount of time needed to find the best profiles for a position, and our contribution in this research work considerably reduces that time. The experiments are shown considering job descriptions and real resumes, and the methodology used to represent each of these documents through their key phrases is explained.
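    A minimal sketch of how Poincaré embeddings can be used to judge closeness between a candidate term and a gold-standard key concept, using gensim's PoincareModel. The toy relation list and the 0.5 distance threshold are assumptions for illustration, not the thesis configuration.

```python
# Illustrative only: train tiny Poincaré embeddings over a few hierarchical
# relations and use the hyperbolic distance to decide whether a candidate term
# is close to a gold-standard key concept.
from gensim.models.poincare import PoincareModel

relations = [
    ("data_scientist", "engineer"),
    ("software_developer", "engineer"),
    ("engineer", "professional"),
    ("nurse", "professional"),
]
model = PoincareModel(relations, size=2, negative=2)
model.train(epochs=100)

def is_close(candidate, gold_term, threshold=0.5):
    """Treat the candidate as matching the key concept if the hyperbolic distance is small."""
    return model.kv.distance(candidate, gold_term) < threshold

print(is_close("data_scientist", "engineer"))
```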

    Investigation of design and execution alternatives for the committed choice non-deterministic logic languages

    The general area of developing, applying and studying new and parallel models of computation is motivated by a need to overcome the limits of current von Neumann-based architectures. A key area of research in understanding how new technology can be applied to AI problem solving is the use of logic languages. Logic programming languages provide a procedural interpretation for sentences of first-order logic, mainly using a class of sentence called Horn clauses. Horn clauses are open to a wide variety of parallel evaluation models, giving possible speed-ups and alternative parallel models of execution. The research in this thesis is concerned with investigating one class of parallel logic language known as the Committed Choice Non-Deterministic (CCND) languages. The investigation considers the inherent parallel behaviour of AI programs implemented in the CCND languages and the effect of the various alternatives open to language implementors and designers. This is achieved by considering how various AI programming techniques map to alternative language designs, and the behaviour of these AI programs on alternative implementations of these languages. The aim of this work is to investigate how AI programming techniques are affected (qualitatively and quantitatively) by particular language features. The qualitative evaluation is a consideration of how AI programs can be mapped to the various CCND languages. The applications considered are general search algorithms (which focus on the committed choice nature of the languages); chart parsing (which focuses on the differences between safe and unsafe languages); and meta-level inference (which focuses on the difference between deep and flat languages). The quantitative evaluation considers the inherent parallel behaviour of the resulting programs and the effect of possible implementation alternatives on this inherent behaviour. To carry out this quantitative evaluation we have implemented a system which improves on current interpreter-based evaluation systems. The new system has an improved model of execution and allows several

    Extração de informação de saúde através das redes sociais (Extraction of health information from social media)

    Social media has proven to be an excellent resource for connecting people and creating a parallel community, which makes it a suitable source for extracting information about real-world events and about its users as well. All of this information can be carefully re-arranged for social monitoring purposes and for the good of its community. To extract health evidence from social media, we started by analyzing and identifying postpartum depression in social media posts. We participated in an online challenge, eRisk 2020, continuing the previous participation of BioInfo@UAVR, predicting self-harm users based on their publications on Reddit. We built an algorithm based on Natural Language Processing methods capable of pre-processing text data and vectorizing it. We make use of linguistic features based on the frequency of specific sets of words, and of other widely used models that represent whole documents with vectors, such as Tf-Idf and Doc2Vec. The vectors and the corresponding labels are then passed to a Machine Learning classifier in order to train it. Based on the patterns it found, the model predicts a classification for unlabeled users. We use multiple classifiers to find the one that behaves best with the data. With the goal of getting the most out of the model, an optimization step is performed in which we remove stop words and set the text vectorization algorithms and the classifier to run in parallel. An analysis of the feature importance is integrated and a validation step is performed. The results are discussed and presented in various plots, and include a comparison between different tuning strategies and the relation between the parameters and the score. We conclude that the choice of parameters is essential for achieving a better score and that, to find them, there are strategies more efficient than the widely used Grid Search. Finally, we compare several approaches for building an incremental classification based on the post timeline of the users, and conclude that it is possible to have a chronological perception of certain traits of Reddit users, specifically evaluating the risk of self-harm with an F1 score of 0.73.
    Social networks are an excellent resource for connecting people, creating a parallel community through which information flows about global events as well as about their users. All of this information can be processed in order to monitor the well-being of the community. To find medical evidence in social media, we started by analyzing and identifying posts by mothers at risk of postpartum depression on Reddit. We participated in an online challenge, eRisk 2020, continuing the previous participation of the BioInfo@UAVR team, in which we predict users at risk of self-harm by analyzing their publications on Reddit. We built an algorithm based on Natural Language Processing methods capable of pre-processing text data and vectorizing it, making use of linguistic features based on the frequency of sets of words as well as other widely used models that represent documents with vectors, such as Tf-Idf and Doc2Vec. The vectors and their respective labels are then supplied to Machine Learning algorithms, which are trained to find patterns among them. We use several classifiers in order to find the one that behaves best with the data. Based on the patterns found, the classifiers predict the classification of users yet to be evaluated. To get the most out of the algorithm, an optimization step is performed in which stop words are removed and the text vectorization algorithms and the classifier are parallelized. We incorporate an analysis of the importance of the model's features and the optimization of hyperparameters to obtain better results. The results are discussed and presented in multiple plots, including a comparison between different optimization strategies and the relation between the parameters and their performance. We conclude that the choice of parameters is essential for achieving better results and that, to find them, there are strategies more efficient than the usual Grid Search, such as Random Search and Bayesian Optimization. We also compare several approaches for building an incremental classification that takes the chronology of the posts into account, and conclude that it is possible to obtain a chronological perception of certain traits of Reddit users, namely evaluating the risk of self-harm, with an F1 score of 0.73. Master's in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática).
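    A hedged sketch of the kind of pipeline described above: TF-IDF vectorization with stop-word removal, a classifier, and a randomized hyper-parameter search run in parallel. The toy posts, labels, and parameter ranges are illustrative assumptions rather than the actual eRisk setup.

```python
# Illustrative sketch: text vectorization + classifier + randomized search,
# run in parallel (n_jobs=-1), as an alternative to exhaustive Grid Search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

posts = ["I feel fine today", "I want to hurt myself again",
         "great day with friends", "I can't stop thinking about self-harm"]
labels = [0, 1, 0, 1]  # 1 = at-risk user in this toy example

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # stop-word removal + Tf-Idf
    ("clf", LogisticRegression(max_iter=1000)),
])
param_distributions = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = RandomizedSearchCV(pipeline, param_distributions, n_iter=4,
                            cv=2, scoring="f1", n_jobs=-1, random_state=0)
search.fit(posts, labels)
print(search.best_params_, round(search.best_score_, 2))
```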