
    A scalable framework for cross-lingual authorship identification

    This is an accepted manuscript of an article published by Elsevier in Information Sciences on 10/07/2018, available online: https://doi.org/10.1016/j.ins.2018.07.009. The accepted version of the publication may differ from the final published version. © 2018 Elsevier Inc. Cross-lingual authorship identification aims at finding the author of an anonymous document written in one language by using labeled documents written in other languages. The main challenge of cross-lingual authorship identification is that the stylistic markers (features) used in one language may not be applicable to other languages in the corpus. Existing methods overcome this challenge by using external resources such as machine translation and part-of-speech tagging. However, such solutions are not applicable to languages with poor external resources (known as low-resource languages). They also fail to scale as the number of candidate authors and/or the number of languages in the corpus increases. In this investigation, we analyze different types of stylometric features and identify 10 high-performance language-independent features for cross-lingual stylometric analysis tasks. Based on these stylometric features, we propose a cross-lingual authorship identification solution that can accurately handle a large number of authors. Specifically, we partition the documents into fragments, where each fragment is further decomposed into fixed-size chunks. Using a multilingual corpus of 400 authors with 825 documents written in 6 different languages, we show that our method can achieve an accuracy level of 96.66%. Our solution also outperforms the best existing solution that does not rely on external resources.
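    The two-level decomposition described above (documents partitioned into fragments, each fragment decomposed into fixed-size chunks) can be sketched in a few lines of Python. The sketch below is only an illustration under assumptions: the chunk size, the number of chunks per fragment, and the particular character-level statistics are chosen for demonstration and are not the 10 language-independent features identified in the paper.

    from collections import Counter

    def chunk_text(text, chunk_size=500):
        """Split a document into fixed-size character chunks (size is an assumption)."""
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    def fragments(text, chunks_per_fragment=4, chunk_size=500):
        """Group consecutive chunks into fragments, mirroring the two-level split."""
        chunks = chunk_text(text, chunk_size)
        return [chunks[i:i + chunks_per_fragment]
                for i in range(0, len(chunks), chunks_per_fragment)]

    def language_independent_features(fragment):
        """Toy language-independent statistics for one fragment (a list of chunks)."""
        text = "".join(fragment)
        tokens = text.split()
        counts = Counter(text)
        n = max(len(text), 1)
        return {
            "avg_word_len": sum(len(t) for t in tokens) / max(len(tokens), 1),
            "punct_ratio": sum(counts[c] for c in ".,;:!?") / n,
            "digit_ratio": sum(v for c, v in counts.items() if c.isdigit()) / n,
            "whitespace_ratio": counts.get(" ", 0) / n,
        }

    # Each fragment yields one feature vector that a downstream classifier could consume.
    doc = "An anonymous document whose author we want to identify... " * 50
    feature_vectors = [language_independent_features(f) for f in fragments(doc)]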

    An Artificial Intelligence Framework for Supporting Coarse-Grained Workload Classification in Complex Virtual Environments

    This work proposes cloud-based machine learning tools for enhanced Big Data applications, where the main idea is to predict the "next" workload occurring against the target Cloud infrastructure via an innovative ensemble-based approach that combines the effectiveness of different well-known classifiers in order to enhance the overall accuracy of the final classification, which is highly relevant in the current context of Big Data. The so-called workload categorization problem plays a critical role in improving the efficiency and reliability of Cloud-based big data applications. Implementation-wise, our method deploys the Cloud entities that participate in the distributed classification approach on top of virtual machines, which represent classical "commodity" settings for Cloud-based big data applications. Given a number of known reference workloads and an unknown workload, in this paper we deal with the problem of finding the reference workload which is most similar to the unknown one. The depicted scenario turns out to be useful in a plethora of modern information system applications. We name this problem coarse-grained workload classification because, instead of characterizing the unknown workload in terms of finer behaviors, such as CPU-, memory-, disk-, or network-intensive patterns, we classify the whole unknown workload as one of the (possible) reference workloads. Reference workloads represent a category of workloads that are relevant in a given applicative environment. In particular, we focus our attention on the classification problem described above in the special case represented by virtualized environments. Today, Virtual Machines (VMs) have become very popular because they offer important advantages to modern computing environments such as cloud computing or server farms. In virtualization frameworks, workload classification is very useful for accounting, security, or user profiling. Hence, our research is especially relevant in such environments, and it turns out to be very useful in an emerging context like Cloud Computing. In this respect, our approach consists of running several machine-learning-based classifiers over different workload models and then deriving the best classification via Dempster-Shafer fusion, in order to magnify the accuracy of the final result. Experimental assessment and analysis clearly confirm the benefits derived from our classification framework. The running programs which produce the unknown workloads to be classified are treated in a similar way. A fundamental aspect of this paper concerns the successful use of data fusion in workload classification: different types of metrics are fused together using the Dempster-Shafer theory of evidence combination, giving a classification accuracy of slightly less than 80%. The acquisition of data from the running process, the pre-processing algorithms, and the workload classification are described in detail. Several classical classification algorithms have been applied to the workloads, and their results are compared.
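    The fusion step above relies on the Dempster-Shafer theory of evidence combination. A minimal sketch of Dempster's rule in Python follows, assuming that each base classifier reports a mass function over reference workload labels; the two classifiers, the workload names, and the mass values are hypothetical and are not taken from the paper.

    from itertools import product

    def dempster_combine(m1, m2):
        """Combine two mass functions given as dicts: frozenset of labels -> mass."""
        combined = {}
        conflict = 0.0
        for (a, ma), (b, mb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass falling on contradictory hypotheses
        if conflict >= 1.0:
            raise ValueError("Total conflict: the sources are incompatible")
        return {k: v / (1.0 - conflict) for k, v in combined.items()}

    # Hypothetical masses from two classifiers over three reference workloads;
    # the full set carries each classifier's residual ignorance.
    frame = frozenset({"cpu-bound", "io-bound", "idle"})
    m_first = {frozenset({"cpu-bound"}): 0.6, frozenset({"io-bound"}): 0.3, frame: 0.1}
    m_second = {frozenset({"cpu-bound"}): 0.5, frozenset({"idle"}): 0.2, frame: 0.3}

    fused = dempster_combine(m_first, m_second)
    prediction = max(fused, key=fused.get)  # most supported reference workload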

    Named Entity Recognition and Text Compression

    In recent years, social networks have become very popular, and it is easy for users to share their data through them. Since data on social networks is idiomatic, irregular, brief, and includes acronyms and spelling errors, dealing with such data is more challenging than dealing with news or other formal texts. Given the huge volume of posts each day, effective extraction and processing of these data will bring great benefit to information extraction applications. This thesis proposes a method to normalize Vietnamese informal text from social networks. The method identifies and normalizes informal text based on the structure of Vietnamese words, Vietnamese syllable rules, and a trigram model. After normalization, the data are processed by a named entity recognition (NER) model that uses six different types of features to recognize named entities in three predefined classes: Person (PER), Location (LOC), and Organization (ORG). Social network data are very large and grow daily, which raises the challenge of how to reduce their size; moreover, the trigram dictionary used for normalization is itself quite large and must also be reduced. To address this challenge, the thesis proposes three methods to compress text files, especially Vietnamese text. The first is a syllable-based method relying on the structure of Vietnamese morphosyllables, consonants, syllables, and vowels. The second is trigram-based Vietnamese text compression using a trigram dictionary. The third is based on an n-gram sliding window, using five dictionaries for unigrams, bigrams, trigrams, four-grams, and five-grams; this method achieves a promising compression ratio of around 90% and can be used for text files of any size.
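    A rough sketch of the n-gram sliding-window idea follows, assuming word-level dictionaries for 1- to 5-grams and a simple (order, id) pair emitted for each matched n-gram; the thesis's actual dictionary construction and code format are not given here, so these details are illustrative only.

    def build_dictionaries(corpus_tokens, max_n=5):
        """Build one dictionary per n-gram order, mapping each n-gram to an integer id."""
        dicts = {n: {} for n in range(1, max_n + 1)}
        for n in range(1, max_n + 1):
            for i in range(len(corpus_tokens) - n + 1):
                gram = tuple(corpus_tokens[i:i + n])
                dicts[n].setdefault(gram, len(dicts[n]))
        return dicts

    def encode(tokens, dicts, max_n=5):
        """Slide over the text and greedily replace the longest dictionary n-gram."""
        codes, i = [], 0
        while i < len(tokens):
            for n in range(min(max_n, len(tokens) - i), 0, -1):
                gram = tuple(tokens[i:i + n])
                if gram in dicts[n]:
                    codes.append((n, dicts[n][gram]))  # placeholder (order, id) code
                    i += n
                    break
            else:
                codes.append((0, tokens[i]))  # out-of-dictionary token kept verbatim
                i += 1
        return codes

    corpus = "mạng xã hội ngày càng phổ biến và dữ liệu mạng xã hội tăng nhanh".split()
    dicts = build_dictionaries(corpus)
    print(encode("dữ liệu mạng xã hội tăng nhanh".split(), dicts))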

    A Method for Usability Evaluation of Transactional Websites Based on the Heuristic Inspection Process

    Usability is considered one of the most important factors in software product development. This quality attribute refers to the degree to which specific users of a given application can easily use the software to achieve their purpose. Given the importance of this aspect for the success of software applications, multiple evaluation methods have emerged as measurement instruments to determine whether a proposed interface design for a software system is understandable, easy to use, attractive, and pleasant for the user. Heuristic evaluation is one of the most widely used methods in Human-Computer Interaction (HCI) for this purpose because of its low execution cost compared with other existing techniques. However, despite its extensive use in recent years, there is no formal procedure for carrying out this evaluation process. Jakob Nielsen, the author of this inspection technique, offers only general guidelines which, according to the research conducted, tend to be interpreted in different ways by specialists. For this reason, this research project was developed with the objective of establishing a systematic, structured, organized, and formal process for performing heuristic evaluations of software products. Based on an exhaustive analysis of the studies in the literature that report the use of heuristic evaluation as part of the software development process, a new evaluation method has been formulated consisting of five phases: (1) planning, (2) training, (3) evaluation, (4) discussion, and (5) reporting. Each of the proposed phases that make up the inspection protocol contains a set of well-defined activities to be performed by the evaluation team as part of the inspection process. Likewise, specific roles have been established for the members of the inspection team to ensure the quality of the results and an appropriate execution of the heuristic evaluation. The new proposal has been validated in two distinct academic settings (in Colombia, at a public university, and in Peru, at two universities, one public and one private), showing in all cases that evaluators can identify more highly severe and critical usability problems when they adopt a structured inspection process. Another favorable result is that evaluators tend to make fewer association errors (between the heuristic that is violated and the usability problems identified) and that the proposal is perceived as easy to use and useful. The validation of the new proposal developed by the author of this study consolidates new knowledge that contributes to the scientific body of knowledge.

    Smart Sensor Technologies for IoT

    The recent development in wireless networks and devices has led to novel services that will utilize wireless communication on a new level. Much effort and many resources have been dedicated to establishing new communication networks that will support machine-to-machine communication and the Internet of Things (IoT). In these systems, various smart and sensory devices are deployed and connected, enabling large amounts of data to be streamed. Smart services represent new trends in mobile services, i.e., a completely new spectrum of context-aware, personalized, and intelligent services and applications. A variety of existing services utilize information about the position of the user or mobile device. The position of mobile devices is often obtained using the Global Navigation Satellite System (GNSS) chips that are integrated into all modern mobile devices (smartphones). However, GNSS is not always a reliable source of position estimates due to multipath propagation and signal blockage. Moreover, integrating GNSS chips into all devices might have a negative impact on the battery life of future IoT applications. Therefore, alternative solutions to position estimation should be investigated and implemented in IoT applications. This Special Issue, “Smart Sensor Technologies for IoT”, aims to report on some of the recent research efforts on this increasingly important topic. The twelve accepted papers in this issue cover various aspects of smart sensor technologies for IoT.

    Sustainability, Digital Transformation and Fintech: The New Challenges of the Banking Industry

    In the current competitive scenario, the banking industry must contend with multiple challenges tied to regulations, legacy systems, disruptive models/technologies, new competitors, and a restive customer base, while simultaneously pursuing new strategies for sustainable growth. Banking institutions that can address these emerging challenges and opportunities while effectively balancing long-term goals with short-term performance pressures could be aptly rewarded. This book comprises a selection of papers addressing some of these relevant issues concerning the current challenges and opportunities for international banking institutions. Papers in this collection focus on the digital transformation of the banking industry and its effect on sustainability, the emergence of new competitors such as FinTech companies, the role of mobile banking in the industry, the connections between sustainability and financial performance, and other general sustainability and corporate social responsibility (CSR) topics related to the banking industry. The book is a Special Issue of the MDPI journal Sustainability, sponsored by the Santander Financial Institute (SANFI), a Spanish research and training institution created as a collaboration between Santander Bank and the University of Cantabria. SANFI works to identify, develop, support, and promote knowledge, study, talent, and innovation in the financial sector.

    Design and Management of Manufacturing Systems

    Although the design and management of manufacturing systems have been explored in the literature for many years, they remain topical problems in current scientific research. Changing market trends, globalization, constant pressure to reduce production costs, and technical and technological progress make it necessary to search for new manufacturing methods and ways of organizing them, and to modify manufacturing system design paradigms. This book presents current research in different areas connected with the design and management of manufacturing systems and covers such subject areas as: methods supporting the design of manufacturing systems; methods of improving maintenance processes in companies; the design and improvement of manufacturing processes; the control of production processes in modern manufacturing systems; production methods and techniques used in modern manufacturing systems; and environmental aspects of production and their impact on the design and management of manufacturing systems. The wide range of research findings reported in this book confirms that the design of manufacturing systems is a complex problem and that achieving the goals set for modern manufacturing systems requires interdisciplinary knowledge and the simultaneous design of the product, process, and system, as well as knowledge of modern manufacturing and organizational methods and techniques.