246 research outputs found

    A New Incremental Decision Tree Learning for Cyber Security based on ILDA and Mahalanobis Distance

    Cyber-attack detection is now essential for protecting computer networks. Effective protection depends on detecting attacks reliably, countering them in various ways, and learning continuously from data such as internet traffic, so that each attack, once learned, can be recognized and defended against at any time. This research presents a cyber-attack detection system, Incremental Decision Tree Learning (IDTL), which uses Incremental Linear Discriminant Analysis (ILDA) together with the Mahalanobis distance to classify within a hierarchical tree, reducing the data features in a way that improves the classification of varied malicious data. The proposed model can learn a new incoming datum without revisiting previously learned data, and it discards the datum once it has been learned. The experiments showed that the proposed method achieved the highest classification accuracy among the methods compared. A per-class comparison found that the proposed method classifies both intrusion datasets and other datasets efficiently
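
The Mahalanobis-distance classification mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's IDTL or ILDA procedure: it only shows nearest-class assignment by squared Mahalanobis distance on two-dimensional features. The function names, class labels, and toy data are all hypothetical.

```python
# Minimal sketch: nearest-class classification by Mahalanobis distance (2-D features).
# All names and the toy data are hypothetical; this is not the paper's IDTL/ILDA method.

def mean_2d(points):
    """Component-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def covariance_2d(points, mu):
    """Sample covariance entries (sxx, sxy, syy) of 2-D points around mean mu."""
    n = len(points)
    sxx = sum((p[0] - mu[0]) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mu[1]) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance d^T S^-1 d, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-singular covariance)
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, class_stats):
    """Return the label whose class distribution is closest to x."""
    return min(class_stats, key=lambda label: mahalanobis_sq(x, *class_stats[label]))

# Toy training data for two hypothetical traffic classes.
normal = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
attack = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.5), (9.5, 9.0)]

class_stats = {}
for label, pts in (("normal", normal), ("attack", attack)):
    mu = mean_2d(pts)
    class_stats[label] = (mu, covariance_2d(pts, mu))

print(classify((2.0, 2.0), class_stats))
print(classify((9.0, 9.0), class_stats))
```

Unlike the Euclidean distance, the Mahalanobis distance scales each direction by the class covariance, so a point is judged relative to how spread out each class is along that direction.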

    Reliable pattern recognition system with novel semi-supervised learning approach

    Over the past decade, there has been considerable progress in the design of statistical machine learning strategies, including Semi-Supervised Learning (SSL) approaches. However, researchers still have difficulties in applying most of these learning strategies when two or more classes overlap, and/or when each class has a bimodal/multimodal distribution. In this thesis, an efficient, robust, and reliable recognition system with a novel SSL scheme has been developed to overcome overlapping problems between two classes and bimodal distribution within each class. This system was based on the nature of category learning and recognition to enhance the system's performance in relevant applications. In the training procedure, besides the supervised learning strategy, the unsupervised learning approach was applied to retrieve the "extra information" that could not be obtained from the images themselves. This approach was very helpful for the classification between two confusing classes. In this SSL scheme, both the training data and the test data were utilized in the final classification. In this thesis, the design of a promising supervised learning model with advanced state-of-the-art technologies is firstly presented, and a novel rejection measurement for verification of rejected samples, namely Linear Discriminant Analysis Measurement (LDAM), is defined. Experiments on CENPARMI's Hindu-Arabic Handwritten Numeral Database, CENPARMI's Numerals Database, and NIST's Numerals Database were conducted in order to evaluate the efficiency of LDAM. Moreover, multiple verification modules, including a Writing Style Verification (WSV) module, have been developed according to four newly defined error categories. The error categorization was based on the different costs of misclassification. 
The WSV module has been developed by the unsupervised learning approach to automatically retrieve the person's writing styles so that the rejected samples can be classified and verified accordingly. As a result, errors on CENPARMI's Hindu-Arabic Handwritten Numeral Database (24,784 training samples, 6,199 testing samples) were reduced drastically from 397 to 59, and the final recognition rate of this HAHNR reached 99.05%, a significantly higher rate compared to other experiments on the same database. When the rejection option was applied on this database, the recognition rate, error rate, and reliability were 97.89%, 0.63%, and 99.28%, respectively

    Multilingual opinion mining

    170 p. Every day, a large amount of text is generated across different online media. Much of that text contains opinions about a multitude of entities, products, services, and so on. Given the growing need for automated means to analyze, process, and exploit this information, sentiment analysis techniques have received a great deal of attention from industry and the scientific community over the past decade and a half. However, many of the techniques used tend to require supervised training with manually annotated examples, or other linguistic resources tied to a specific language or application domain. This limits the applicability of such techniques, since those resources and annotated examples are not easy to obtain. This thesis explores a series of methods for performing various automatic text analyses within the framework of sentiment analysis, including the automatic extraction of domain terms, of opinion-bearing words, of the sentiment polarity of those words (positive or negative), etc. Finally, it proposes and evaluates a method that combines continuous word embeddings with topic modelling inspired by Latent Dirichlet Allocation (LDA) to obtain an aspect-based sentiment analysis (ABSA) system that needs only a few seed words to process texts in a given language or domain. In this way, adaptation to another language or domain reduces to translating the corresponding seed words

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents.
In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance

    Design for London

    Design for London was a unique experiment in urban planning, design and strategic thinking. Set up in 2006 by Mayor Ken Livingstone and his Architectural Advisor, Richard Rogers, the brief for the team was ‘to think about London, what made London unique and how it could be made better’. Sitting within London government but outside its formal statutory responsibilities, it was given freedom to question and challenge. The team had no power or money, but it did have the licence to operate without the usual constraints of government. With introductions from Ken Livingstone and Richard Rogers, Design for London covers the tumultuous and heady period of the first decade of this century when London was a test bed for new ideas. It outlines how key projects such as the London Olympics, public space programmes, high street regeneration and greening programmes were managed, critically examines the lessons that might be learnt in strategic urban design and considers how a design agenda for London could be developed in the future. By providing an engaging account of the strategic approaches and work of Design for London, and documenting the particular methodology and approach to urban theory it developed, Design for London will appeal to undergraduate and postgraduate students of planning, urban design and architecture, and to current practitioners from the public, private and community sectors who are struggling to achieve regeneration through poorly understood ‘placemaking’ concepts

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    Drawing and handwriting are communication skills that have been fundamental to geopolitical, ideological and technological developments throughout history. Drawing and handwriting remain useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems, such as how drawing and handwriting can become an efficient way to command various connected objects, or how to validate graphomotor skills as objective sources of data for studying human beings, their capabilities and their limits from birth to decline

    Predicting the Need for Urgent Instructor Intervention in MOOC Environments

    In recent years, massive open online courses (MOOCs) have become universal knowledge resources and arguably one of the most exciting innovations in e-learning environments. MOOC platforms comprise numerous courses covering a wide range of subjects and domains. Thousands of learners around the world enrol on these online platforms to satisfy their learning needs (mostly) free of charge. However, the retention rates of MOOC courses (i.e., the share of learners who successfully complete a course of study) are low (around 10% on average); dropout rates tend to be very high (around 90%). The principal channel via which MOOC learners can communicate their difficulties with the learning content and ask for assistance from instructors is by posting in a dedicated MOOC forum. Importantly, in the case of learners who are suffering from burnout or stress, some of these posts require urgent intervention. Given the above, urgent instructor intervention in response to learner requests for assistance posted on MOOC forums has become an important research topic. Timely intervention by MOOC instructors may mitigate dropout issues and make the difference between a learner dropping out or staying on a course. However, due to the typically extremely high learner-to-instructor ratio in MOOCs and the often-huge numbers of posts on forums, while truly urgent posts are rare, managing them can be very challenging, if not sometimes impossible. Instructors can find it challenging to monitor all existing posts and identify which posts require immediate intervention to help learners, encourage retention, and reduce the current high dropout rates. The main objective of this research project, therefore, was to mine and analyse learners’ MOOC posts as a fundamental step towards understanding their need for instructor intervention. To achieve this, the researcher proposed and built comprehensive classification models to predict the need for instructor intervention.
The ultimate goal is to help instructors by guiding them to posts, topics, and learners that require immediate intervention. Given this research aim, the researcher conducted several experiments to fill the gap in the literature, using datasets from two platforms (the FutureLearn platform and the Stanford MOOCPosts dataset). For the former, three MOOC corpora were prepared: two gold-standard MOOC corpora for identifying urgent posts, annotated by selected experts in the field, and a third corpus detailing learner dropout. Based on these datasets, different architectures and classification models based on traditional machine learning and deep learning approaches were proposed. In this thesis, the task of determining the need for instructor intervention was tackled from three perspectives: (i) identifying relevant posts, (ii) identifying relevant topics, and (iii) identifying relevant learners. Posts written by learners were classified into two categories: (i) urgent intervention and (ii) non-urgent intervention. Also, learners were classified into: (i) requiring instructor intervention (at risk of dropout) and (ii) no need for instructor intervention (completer). In identifying posts, two experiments were used to contribute to this field. The first is a novel classifier based on a deep learning model that integrates novel MOOC post dimensions such as numerical data in addition to textual data; this represents a novel contribution to the literature, as all available models at the time of writing were text-only. The results demonstrate that the combined, multidimensional features model proposed in this project is more effective than the text-only model.
The second contribution relates to creating various simple and hybrid deep learning models by applying plug & play techniques with different types of inputs (word-based or word-character-based) and different ways of representing input words as vectors. According to the experimental findings, employing Bidirectional Encoder Representations from Transformers (BERT) for word embedding is more effective at the intervention task than word2vec across all models. Interestingly, adding word-character inputs with BERT does not improve performance as it does for word2vec. Additionally, on the task of identifying topics, this is the first time in the literature that specific language terms to identify the need for urgent intervention in MOOCs were obtained. This was achieved by analysing learner MOOC posts using latent Dirichlet allocation (LDA), and the work offers a visualisation tool for instructors or learners that may assist them and improve instructor intervention. In addition, this thesis contributes to the literature by creating mechanisms for identifying MOOC learners who may need instructor intervention in a new context, i.e., by using their historical online forum posts as a multi-input approach for other deep learning architectures and Transformer models. The findings demonstrate that using the Transformer model is more effective at identifying MOOC learners who require instructor intervention. Next, the thesis sought to expand its methodology to identify posts that relate to learner behaviour, which is also a novel contribution, by proposing a novel priority model to identify the urgency of intervention based on learner histories. This model can classify learners into three groups: low risk, mid risk, and high risk. The results show that the completion rates of high-risk learners are very low, which confirms the importance of this model.
Next, as MOOC data in terms of urgent posts tend to be highly unbalanced, the thesis contributes by examining various data balancing methods to spot situations in which MOOC posts urgently require instructor assistance. This included developing learner and instructor models to assist instructors in responding to urgent MOOC posts. The results show that models with undersampling can predict the most urgent cases; 3x augmentation + undersampling usually attains the best performance. Finally, for the first time, this thesis contributes to the literature by applying text classification explainability (eXplainable Artificial Intelligence (XAI)) to an instructor intervention model, demonstrating how using a reliable predictor in combination with XAI and colour-coded visualisation could be utilised to assist instructors in deciding when posts require urgent intervention, as well as supporting annotators to create high-quality, gold-standard datasets to determine cases where urgent intervention is required