114 research outputs found

    Content-based genre classification of large texts

    The advent of Natural Language Processing (NLP) and deep learning allows us to achieve tasks that sounded impossible about 10 years ago; one of those tasks is genre classification for large bodies of text. Movies, books, novels, and various other texts more often than not belong to one or more genres. The purpose of this research is to classify such texts into their genres while also calculating the weighted presence of each genre in them. Movies in particular are classified into genres mostly for marketing purposes, with no indication of which genre is the most dominant. In this thesis, we explore the possibility of using deep neural networks and NLP to classify movies using the contents of the movie script. We follow the philosophy that scenes make movies, and generate the final result from the classification of each individual scene. The results were obtained by training Convolutional Neural Networks (ConvNets or CNNs) and Hierarchical Attention Networks (HANs) and comparing their performance to the de facto architectures for NLP, namely Recurrent Neural Networks (RNNs) and attention models. The results on the validation dataset are comparable to those obtained by similar research, done mostly on sentiment analysis or rating prediction: the accuracy is about 85%, which is an acceptable figure in the literature. Part of our conclusion is dedicated to discussing how our models would perform on a larger dataset and what steps could be taken to increase the accuracy.
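    As an illustration of the scene-level philosophy described above, a minimal sketch follows: classify each scene independently, then average the per-scene genre probabilities to obtain both the predicted genres and their weighted presence in the script. The genre list and the `classify_scene` function are hypothetical placeholders, not taken from the thesis.

```python
# Illustrative sketch of scene-level genre aggregation; GENRES and
# `classify_scene` are hypothetical placeholders, not from the thesis.
from typing import Callable, Dict, List

GENRES = ["action", "comedy", "drama", "horror", "romance"]

def weighted_genre_presence(
    scenes: List[str],
    classify_scene: Callable[[str], List[float]],
) -> Dict[str, float]:
    """Average per-scene genre probabilities over the whole script."""
    totals = [0.0] * len(GENRES)
    for scene in scenes:
        for k, p in enumerate(classify_scene(scene)):
            totals[k] += p
    n = max(len(scenes), 1)
    return {g: t / n for g, t in zip(GENRES, totals)}

# The movie-level labels can then be taken as the genres whose averaged
# presence exceeds a chosen threshold, e.g. 0.3.
```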

    Multi-Modal Medical Imaging Analysis with Modern Neural Networks

    Medical imaging is an important non-invasive tool for diagnostic and treatment purposes in medical practice. However, interpreting medical images is a time-consuming and challenging task. Computer-aided diagnosis (CAD) tools have been used in clinical practice to assist medical practitioners in medical imaging analysis since the 1990s. Most of the current generation of CADs are built on conventional computer vision techniques, such as manually defined feature descriptors. Deep convolutional neural networks (CNNs) provide robust end-to-end methods that can automatically learn feature representations, making them a promising building block for next-generation CADs. However, applying CNNs to medical imaging analysis tasks is challenging. This dissertation addresses three major issues that obstruct the use of modern deep neural networks on medical image analysis tasks: lack of domain knowledge in architecture design, lack of labeled data in model training, and lack of uncertainty estimation in deep neural networks. We evaluated the proposed methods on six large, clinically relevant datasets. The results show that the proposed methods can significantly improve deep neural network performance on medical imaging analysis tasks.
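    As a sketch of the third issue, uncertainty estimation, the snippet below illustrates Monte Carlo dropout, one standard technique for estimating predictive uncertainty in deep networks. It is shown only as an example of the general idea; the dissertation's own method may differ.

```python
# Illustrative sketch of Monte Carlo dropout for uncertainty estimation
# (Gal & Ghahramani, 2016); not necessarily the dissertation's method.
import torch

def mc_dropout_predict(model: torch.nn.Module,
                       x: torch.Tensor,
                       n_samples: int = 20):
    """Mean prediction and per-class std over stochastic forward passes."""
    model.train()  # keeps dropout active; note this also affects BatchNorm
    with torch.no_grad():
        preds = torch.stack([
            torch.softmax(model(x), dim=-1) for _ in range(n_samples)
        ])
    return preds.mean(dim=0), preds.std(dim=0)
```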

    A multimodal deep learning approach for food tray recognition

    Bachelor's thesis (Treballs Finals de Grau d'Enginyeria Informàtica), Faculty of Mathematics, Universitat de Barcelona, Year: 2020, Supervisors: Marc Bolaños and Petia Radeva. Food recognition, i.e. object detection and classification applied to the food domain, is the main topic of this work. We have studied the problem of recognising food instances in tray images from self-service restaurants and propose a novel multimodal deep learning approach. From images and daily menus, the presented model combines two state-of-the-art models for object detection and classification with a multimodal neural network to make significantly refined predictions compared to the baseline object detection model, achieving a class-weighted average F1-score of 0.862. An ensemble model built from the proposed and baseline models, also presented in this work, improves the results further, achieving a class-weighted average F1-score of 0.877.
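    A minimal sketch of the ensembling and scoring described above, assuming the ensemble simply averages the two models' class probabilities (the actual combination scheme is not specified in the abstract); the arrays are placeholders for real model outputs and labels.

```python
# Illustrative sketch: probability-averaging ensemble and class-weighted
# F1 scoring; arrays stand in for real model outputs and labels.
import numpy as np
from sklearn.metrics import f1_score

def ensemble_probs(probs_multimodal: np.ndarray,
                   probs_baseline: np.ndarray) -> np.ndarray:
    """Average the per-class probabilities of the two models."""
    return (probs_multimodal + probs_baseline) / 2.0

def weighted_f1(y_true: np.ndarray, probs: np.ndarray) -> float:
    """Class-weighted average F1, the metric reported in the abstract."""
    return f1_score(y_true, probs.argmax(axis=1), average="weighted")
```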

    A history and theory of textual event detection and recognition


    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications and rigour. This article remedies that gap through a systematic mapping study of online data-mining literature that visibly targets law enforcement applications, using evidence-based survey practices to produce a replicable analysis that can be methodologically examined for deficiencies.

    Detecting Abnormal Behavior in Web Applications

    The rapid advance of web technologies has made the Web an essential part of our daily lives. However, network attacks have exploited vulnerabilities in web applications and caused substantial damage to Internet users. Detecting network attacks is an important first step in network security, and a major branch of this area is anomaly detection. This dissertation concentrates on detecting abnormal behaviors in web applications using the following methodology: for a given web application, we conduct a set of measurements to reveal the existence of abnormal behaviors and observe the differences between normal and abnormal behaviors. By applying a variety of information-extraction methods, such as heuristic algorithms, machine learning, and information theory, we extract features useful for building a classification system that detects abnormal behaviors. In particular, we have studied four detection problems in web security. The first is detecting unauthorized hotlinking behavior that plagues hosting servers on the Internet. We analyze a group of common hotlinking attacks and the web resources targeted by them, and then present an anti-hotlinking framework for protecting materials on hosting servers. The second problem is detecting aggressive automation behavior on Twitter. Our work determines whether a Twitter user is a human, bot or cyborg based on its degree of automation. We observe the differences among the three categories in terms of tweeting behavior, tweet content, and account properties, and propose a classification system that combines features extracted from an unknown user to determine the likelihood of it being a human, bot or cyborg. Furthermore, we shift the detection perspective from automation to spam and introduce the third problem, detecting social spam campaigns on Twitter. Evolved from individual spammers, spam campaigns manipulate and coordinate multiple accounts to spread spam on Twitter, and display collective characteristics. We design an automatic classification system based on machine learning and apply multiple features to classifying spam campaigns. Complementary to conventional spam detection methods, our work brings efficiency and robustness. Finally, we extend our detection research into the blogosphere to capture blog bots. In this problem, detecting the human presence is an effective defense against the automatic posting ability of blog bots. We introduce behavioral biometrics, mainly mouse and keyboard dynamics, to distinguish between human and bot. By passively monitoring user browsing activities, this detection method does not require any direct user participation and improves the user experience.
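    As one concrete example of a feature that separates automated from human tweeting behavior, the sketch below computes the entropy of a user's inter-tweet intervals; highly regular, bot-like posting yields low entropy. The binning parameters are illustrative assumptions, not the dissertation's exact feature definition.

```python
# Illustrative sketch: Shannon entropy of inter-tweet intervals as an
# automation feature; the 60-second binning is an assumption.
import math
from collections import Counter
from typing import List

def interval_entropy(timestamps: List[float], bin_width: float = 60.0) -> float:
    """Entropy (bits) of inter-tweet intervals bucketed into fixed bins."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not intervals:
        return 0.0
    counts = Counter(int(iv // bin_width) for iv in intervals)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```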

    Extracting business performance signals from Twitter news

    Social media and social networks underpin a revolution in communication between people, with the particular feature that much of that communication is open to all. This provides a massive pool of data that researchers can exploit for a wide variety of applications. Data from Twitter is of particular interest in this sense, given its large global usage and the availability of APIs and other tools that enable easy access to the publicly available stream of tweets. Owing to the wide public penetration of Twitter, many businesses use it to share their latest news, effectively treating Twitter as a gateway to end-users, consumers and/or investors. In this thesis, we focus on the potential for extracting information from Twitter that is relevant to the financial and competitiveness status of a business. We consider a collection of well-regarded Twitter accounts that are known for communicating recent business news, and we investigate the automated analysis of the stream of tweets from these sources, with a view to learning business-relevant information about specific companies. A key aspect of our approach is the idea of extracting signals for specific areas of business performance; we explore three such areas: productivity, competitiveness, and industrial risk. We propose a two-step model which first classifies a tweet into one of these areas and then assigns it a sentiment value (on a positive/negative scale). The resulting sentiment values across specific aspects represent novel business indicators that could add significant value to the toolset used by business analysts. Our experiments are based on a new manually pre-classified data set (available from a URL provided). Additionally, we propose n-grams made from non-contiguous words as a novel feature to enhance performance in this context; experiments involving a range of feature selection methods show that these new features provide valuable benefits in comparison with standard n-gram features. We also introduce an extra layer added in front of the primary classifier, with the role of filtering out noisy tweets before they enter the system; we use a One-Class SVM for this purpose. Broadly, we show that the methods developed in this thesis achieve promising results in both topic and sentiment classification in the business performance context, suggesting that Twitter can indeed be a useful source of signals related to different aspects of business performance. We also find that our system can provide valuable insight into unseen test data. However, more research is needed to extract robust signals for industrial risk, and there seems to be considerable promise for further development.
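    A minimal sketch of two components mentioned above, under stated assumptions: non-contiguous word n-grams (skip-bigrams) as features, and a One-Class SVM acting as a noise filter in front of the primary classifier. The tokenisation, parameters, and example tweets are illustrative, not the thesis's exact setup.

```python
# Illustrative sketch: skip-bigram features and a One-Class SVM noise
# filter, under the assumptions stated in the text above.
from typing import List, Tuple

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

def skip_bigrams(tokens: List[str], max_skip: int = 2) -> List[Tuple[str, str]]:
    """All ordered word pairs with up to `max_skip` words between them."""
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs

# Noise filtering: fit a One-Class SVM on vectorised in-domain tweets,
# then drop incoming tweets it flags as outliers (-1) before they reach
# the topic and sentiment classifiers. Example tweets are made up.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform([
    "company raises revenue outlook",
    "firm announces plant closure",
])
noise_filter = OneClassSVM(nu=0.1).fit(X_train)
is_relevant = noise_filter.predict(
    vectorizer.transform(["quarterly profits up"])) == 1
```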