112 research outputs found

    Measuring the Interestingness of Articles in a Limited User Environment Prospectus

    Security and privacy recommendation of mobile app for Arabic speaking

    There is an enormous number of mobile apps, leading users to be concerned about the security and privacy of their data. But few users are aware of what app permissions mean, and permissions sometimes do not indicate what kind of data is gathered. Users therefore remain concerned about security risks and privacy, while having little knowledge or experience of security and privacy awareness. Users depend on ratings, which may be fake, or rely on their own intuition when deciding whether to install an app, and many users do not like to read reviews. To solve this issue, we propose a recommender system that reads users' reviews and exposes flaws, violations and third-party policies, or the quality of a user's experience. In order to design and implement our recommender, we conduct a survey that serves two purposes: to detect the level of security and privacy awareness among users, and to gather new words into the recommender system's dictionary, which helps classify each review at the correct level and thereby reveals the level of security and privacy in an app.

    Hybrid Recommender Systems: A Systematic Literature Review

    Recommender systems are software tools used to generate and provide suggestions for items and other entities to the users by exploiting various strategies. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages. This systematic literature review presents the state of the art in hybrid recommender systems of the last decade. It is the first quantitative review work completely focused on hybrid recommenders. We address the most relevant problems considered and present the associated data mining and recommendation techniques used to overcome them. We also explore the hybridization classes each hybrid recommender belongs to, the application domains, the evaluation process and proposed future research directions. Based on our findings, most of the studies combine collaborative filtering with another technique, often in a weighted way. Cold-start and data sparsity are the two traditional and top problems, addressed in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors. As most of the studies are evaluated by comparisons with similar methods using accuracy metrics, providing more credible and user-oriented evaluations remains a typical challenge. Besides this, newer challenges were also identified, such as responding to the variation of user context, evolving user tastes or providing cross-domain recommendations. Being a hot topic, hybrid recommenders represent a good basis with which to respond accordingly by exploring newer opportunities such as contextualizing recommendations, involving parallel hybrid algorithms, processing larger datasets, etc.
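
The weighted combination that the review reports as most common can be illustrated with a minimal sketch. It is not taken from any of the surveyed studies: the function names, the fixed weight and the toy scores are all assumptions made for illustration. It also hints at why hybrids help with cold-start, since an item with no collaborative-filtering score can still be ranked through its content-based score.

```python
from typing import Dict, List


def weighted_hybrid(cf_scores: Dict[str, float],
                    content_scores: Dict[str, float],
                    cf_weight: float = 0.7) -> Dict[str, float]:
    """Blend two recommenders' scores item by item using a fixed weight.

    Items missing from one recommender (for example a cold-start item with
    no ratings yet) are treated as scoring 0.0 in that recommender.
    """
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must be in [0, 1]")
    items = set(cf_scores) | set(content_scores)
    return {
        item: cf_weight * cf_scores.get(item, 0.0)
        + (1.0 - cf_weight) * content_scores.get(item, 0.0)
        for item in items
    }


def top_n(scores: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n best items, breaking ties alphabetically."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical toy scores: movie_c has no collaborative-filtering score.
cf = {"movie_a": 0.9, "movie_b": 0.4}
content = {"movie_b": 0.8, "movie_c": 0.6}
blended = weighted_hybrid(cf, content, cf_weight=0.5)
print(top_n(blended, n=2))
```

In practice the weight would be tuned on held-out data rather than fixed by hand.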

    Information extraction from social media for route planning

    Micro-blogging is an emerging form of communication that has become very popular in recent years. Micro-blogging services allow users to publish updates as short text messages that are broadcast to the followers of users in real time. Twitter is currently the most popular micro-blogging service. It is a rich, real-time information source and a good way to discover interesting content or to follow recent developments. Additionally, the updates published on the Twitter public timeline can be retrieved through its API. A significant amount of traffic information exists on the Twitter platform. Twitter users tweet when they are in traffic about accidents, road closures or road construction. With this in mind, this paper presents a system that extracts traffic information from Twitter to be used in route planning. Route planning is of increasing importance as societies try to reduce their energy consumption. Furthermore, route planning is concerned with two types of constraints: stable, such as the distance between two points, and temporary, such as weather conditions, traffic jams or road construction. Our system attempts to extract these temporary constraints from Twitter. We train Naive Bayes, Maxent and SVM classifiers to filter out non-relevant tweets. We then apply NER on traffic tweets to extract locations, highways and directions. These extracted locations are then geocoded and used in route planning to avoid routes with traffic jams.
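
The filtering step can be sketched with a small multinomial Naive Bayes classifier. This is a minimal illustration, not the paper's implementation: the tokenizer, the class name and the six training tweets are invented for the example, and the Maxent, SVM, NER and geocoding stages are not covered.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
tweets = [
    "accident on I-95 north, two lanes closed",
    "road construction on highway 101 causing delays",
    "stuck in a traffic jam near the bridge",
    "great coffee this morning",
    "watching the game with friends tonight",
    "new phone arrived today",
]
labels = ["traffic", "traffic", "traffic", "other", "other", "other"]

clf = NaiveBayesFilter().fit(tweets, labels)
print(clf.predict("lanes closed after accident on highway 9"))
```

Tweets labelled `traffic` would then be passed on to the entity-extraction step.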

    Automatic classification of news stories – A machine learning approach

    Thesis submitted to the Department of Computer Science, Ashesi University College, in partial fulfillment of Bachelor of Science degree in Computer Science, April 2016. Humans are good at classifying things because our brains are adept at understanding contextual nuances. Machines, however, need to be fed the right features to achieve reasonably good levels of classification. Classifying text manually is a time-consuming and expensive process, especially in the information age, where the success of cloud computing, big data and the resurgent trend of the internet of things, together with unprecedented population growth, has led to an explosion in the amount of data that we have to deal with: approximately 2.5 quintillion bytes every 24 hours (Walker, 2015). This thesis explores the efficiency of two well-known machine learning classification algorithms, Naïve Bayes and Support Vector Machines, in classifying news stories, an important subset of the global repositories of information. The findings in this study show that using machine learning to classify news stories is not easy but is feasible and, if done properly, can yield accuracy rates of at least 70%. These results translate into significant time savings that cannot be achieved by manual classification and are a precursor to other machine learning techniques such as recommendation, clustering and sentiment analysis.

    Text-based Sentiment Analysis and Music Emotion Recognition

    Nowadays, with the expansion of social media, large amounts of user-generated texts like tweets, blog posts or product reviews are shared online. Sentiment polarity analysis of such texts has become highly attractive and is utilized in recommender systems, market predictions, business intelligence and more. We also witness deep learning techniques becoming top performers on those types of tasks. There are, however, several problems that need to be solved for efficient use of deep neural networks on text mining and text polarity analysis. First of all, deep neural networks are data hungry. They need to be fed with datasets that are large, cleaned and preprocessed, as well as properly labeled. Second, the modern natural language processing concept of word embeddings as a dense and distributed text feature representation solves the sparsity and dimensionality problems of the traditional bag-of-words model. Still, there are various uncertainties regarding the use of word vectors: should they be generated from the same dataset that is used to train the model, or is it better to source them from big and popular collections that work as generic text feature representations? Third, it is not easy for practitioners to find a simple and highly effective deep learning setup for various document lengths and types. Recurrent neural networks are weak with longer texts, and optimal convolution-pooling combinations are not easily conceived. It is thus convenient to have generic neural network architectures that are effective and can adapt to various texts, encapsulating much of the design complexity. This thesis addresses the above problems to provide methodological and practical insights for utilizing neural networks on sentiment analysis of texts and achieving state-of-the-art results. Regarding the first problem, the effectiveness of various crowdsourcing alternatives is explored and two medium-sized, emotion-labeled song datasets are created utilizing social tags. One of the research interests of Telecom Italia was the exploration of relations between music emotional stimulation and driving style. Consequently, a context-aware music recommender system that aims to enhance driving comfort and safety was also designed. To address the second problem, a series of experiments with large text collections of various contents and domains were conducted. Word embeddings with different parameters were tested, and results revealed that their quality is influenced (mostly but not only) by the size of the texts they were created from. When working with small text datasets, it is thus important to source word features from popular and generic word embedding collections. Regarding the third problem, a series of experiments involving convolutional and max-pooling neural layers were conducted. Various patterns relating text properties and network parameters with optimal classification accuracy were observed. Combining convolutions of words, bigrams, and trigrams with regional max-pooling layers in a couple of stacks produced the best results. The derived architecture achieves competitive performance on sentiment polarity analysis of movie, business and product reviews. Given that labeled data are becoming the bottleneck of current deep learning systems, a future research direction could be the exploration of various data programming possibilities for constructing even bigger labeled datasets. Investigation of feature-level or decision-level ensemble techniques in the context of deep neural networks could also be fruitful. Different feature types usually represent complementary characteristics of data. Combining word embedding and traditional text features, or utilizing recurrent networks on document splits and then aggregating the predictions, could further increase the prediction accuracy of such models.
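
The idea of convolving over words, bigrams and trigrams and then max-pooling over regions can be illustrated with a small pure-Python sketch. It is not the thesis architecture: the function names, the ReLU activation, the hand-written kernels and the two-dimensional toy "embeddings" are assumptions chosen so the arithmetic is easy to follow, and a real model would learn many kernels per width and stack several such layers.

```python
def conv1d(sequence, kernel):
    """Slide a kernel of len(kernel) word positions over a list of word vectors.

    Each output value is the ReLU of the summed dot products between the
    kernel rows and the word vectors in the current window.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        total = 0.0
        for offset in range(width):
            word_vec = sequence[start + offset]
            total += sum(x * w for x, w in zip(word_vec, kernel[offset]))
        outputs.append(max(0.0, total))
    return outputs


def regional_max_pool(features, region):
    """Keep the maximum of each consecutive region instead of one global max."""
    return [max(features[i:i + region]) for i in range(0, len(features), region)]


# Toy 2-dimensional "embeddings" for a five-word text (made-up numbers).
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]

# One hypothetical kernel per n-gram width: 1 (words), 2 (bigrams), 3 (trigrams).
kernels = {
    1: [[1.0, 1.0]],
    2: [[1.0, 0.0], [0.0, 1.0]],
    3: [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
}

pooled = {
    width: regional_max_pool(conv1d(text, kernel), region=2)
    for width, kernel in kernels.items()
}
# Concatenating the pooled maps gives the feature vector for the next layer.
features = [value for width in sorted(pooled) for value in pooled[width]]
print(features)
```

Regional pooling keeps some positional information that a single global maximum would discard, which is one plausible reason it suits longer documents.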

    Stock price change prediction using news text mining

    Along with the advent of the Internet as a new way of propagating news in a digital format came the need to understand and transform this data into information. This work presents a computational framework that aims to predict the changes of stock prices during the day, given the occurrence of news articles related to the companies listed in the Dow Jones Index. For this task, an automated process that gathers, cleans, labels, classifies, and simulates investments was developed. This process integrates existing data mining and text mining algorithms with newly proposed techniques for alignment between news articles and stock prices, pre-processing, and classifier ensembles. The results of the experiments, in terms of classification measures and the Cumulative Return obtained through investment simulation, outperformed the other results found after an extensive review of the related literature. This work also argues that Accuracy as a classification measure, and the incorrect use of the cross-validation technique, have very little to contribute in terms of investment recommendation for the financial market. Altogether, the developed methodology and results contribute to the state of the art in this emerging research field, demonstrating that the correct use of text mining techniques is an applicable alternative for predicting stock price movements in the financial market.

    Semantic enrichment of knowledge sources supported by domain ontologies

    This thesis introduces a novel conceptual framework to support the creation of knowledge representations based on enriched Semantic Vectors, using the classical vector space model approach extended with ontological support. One of the primary research challenges addressed here relates to the process of formalization and representation of document contents, where most existing approaches are limited and only take into account the explicit, word-based information in the document. This research explores how traditional knowledge representations can be enriched by incorporating implicit information derived from the complex relationships (semantic associations) modelled by domain ontologies, in addition to the information presented in documents. The relevant achievements pursued by this thesis are the following: (i) conceptualization of a model that enables the semantic enrichment of knowledge sources supported by domain experts; (ii) development of a method for extending the traditional vector space using domain ontologies; (iii) development of a method to support ontology learning, based on the discovery of new ontological relations expressed in non-structured information sources; (iv) development of a process to evaluate the semantic enrichment; (v) implementation of a proof of concept, named SENSE (Semantic Enrichment kNowledge SourcEs), which makes it possible to validate the ideas established under the scope of this thesis; (vi) publication of several scientific articles and support for 4 master's dissertations carried out by the Department of Electrical and Computer Engineering at FCT/UNL. It is worth mentioning that the work developed under the semantic referential covered by this thesis has reused relevant achievements from European research projects, in order to adopt approaches that are considered scientifically sound and coherent and to avoid "reinventing the wheel". European research projects: CoSpaces (IST-5-034245), CRESCENDO (FP7-234344) and MobiS (FP7-318452).
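
A rough sketch of what extending a vector space model with ontological support might look like follows. It does not reproduce SENSE or the thesis method: the toy ontology, the relation strengths and every function name are hypothetical. The point is only that two documents with no words in common can become comparable once related ontology concepts are added to their vectors.

```python
import math
from collections import Counter

# Hypothetical toy domain ontology: term -> {related concept: relation strength}.
ONTOLOGY = {
    "beam": {"structural_element": 0.5},
    "column": {"structural_element": 0.5},
}


def term_vector(text):
    """Classical vector space model: raw term frequencies of the words."""
    return Counter(text.lower().split())


def enrich(vector, ontology):
    """Add implicit concepts reachable from the explicit terms in the vector."""
    enriched = {term: float(weight) for term, weight in vector.items()}
    for term, weight in vector.items():
        for concept, strength in ontology.get(term, {}).items():
            enriched[concept] = enriched.get(concept, 0.0) + weight * strength
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = term_vector("steel beam")
doc_b = term_vector("concrete column")

plain_similarity = cosine(doc_a, doc_b)
enriched_similarity = cosine(enrich(doc_a, ONTOLOGY), enrich(doc_b, ONTOLOGY))
print(plain_similarity, enriched_similarity)
```

The plain vectors share no terms, so any similarity found after enrichment comes entirely from the shared ontology concept.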

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003
