
    KACST Arabic Text Classification Project: Overview and Preliminary Results

    Electronically formatted Arabic free text is now abundant on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has many useful applications including, but not restricted to, e-mail spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project, along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.

    Site Selection Using Geo-Social Media: A Study For Eateries In Lisbon

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. The rise in the influx of multicultural societies, studentification, and overall population growth has positively impacted the local economy of eateries in Lisbon, Portugal. However, this has also increased retail competition, especially in tourism. The overall increase in multicultural societies has also led to multiple smaller hotspots of human-urban attraction, making the concept of a single downtown in the city somewhat vague. These urban transformations pose a big challenge for prospective retail and eatery owners trying to find the optimal location to set up shop. An optimal site selection strategy should recommend new locations that can maximize the revenues of a business. Unfortunately, with dynamically changing human-urban interactions, traditional methods such as relying on census data or surveys to understand neighborhoods and their impact on businesses are no longer reliable or scalable. This study aims to address this gap by using geo-social data extracted from social media platforms like Twitter, Flickr, Instagram, and Google Maps, which then acts as a proxy for the real population. Seven variables are engineered at a neighborhood level using this data: business interest, age, gender, spatial competition, spatial proximity to stores, homogeneous neighborhoods, and percentage of the native population. A Random Forest based binary classification method is then used to predict whether a Point of Interest (POI) can be a part of any neighborhood n. The results show that using only these 7 variables, an F1-Score of 83% can be achieved in classifying whether a neighborhood is good for an “eateries” POI. The methodology used in this research is designed to work with open data and to be generic and reproducible for any city worldwide.
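    For readers unfamiliar with the metric quoted in this abstract, the F1-Score is the harmonic mean of precision and recall for the positive class. The following is a minimal sketch in plain Python; the function name `f1_score` and the label lists are hypothetical illustrations, not the study's data or code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = "neighborhood suits an eatery", 0 = "does not".
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

    With these made-up labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75.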

    Exploiting and Ranking Dominating Product Features through Communal Sentiments

    The rapid expansion of e-commerce has made it easy for consumers to purchase products online. Various brands and millions of products are offered online, and a wide variety of customer reviews is now available on the Internet. These reviews are important for consumers as well as merchants. Most reviews are disorganized, which makes it difficult to extract useful information from them. In this paper we propose a product feature ranking framework that identifies important product features from online customer opinions, with the aim of improving the usability of the reviews. The important product features are recognized using two observations: 1) the important features are commented on by a large number of users, and 2) users' opinions on the important features greatly influence their overall opinions of the product. We first identify product features with a shallow dependency parser and determine customers' opinions on these features via a sentiment classifier. We then develop a probabilistic feature ranking algorithm that infers the importance of features by considering both their frequency and the influence of the users' opinions on each feature over their overall opinions. DOI: 10.17762/ijritcc2321-8169.15068
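    The two observations in this abstract (how often a feature is mentioned, and how strongly opinions on it track the overall opinion) can be illustrated with a simple heuristic. The sketch below is an assumption-laden stand-in, not the paper's probabilistic algorithm: it scores each feature as mention frequency times the rate at which the feature-level sentiment agrees with the overall sentiment. The function `rank_features` and the review data are hypothetical.

```python
from collections import defaultdict


def rank_features(reviews):
    """Rank features by (mention frequency) * (agreement with overall opinion).

    reviews: list of (overall, {feature: sentiment}) with sentiments in {+1, -1}.
    Illustrative heuristic only; not the algorithm from the paper.
    """
    mentions = defaultdict(int)    # number of reviews mentioning each feature
    agreements = defaultdict(int)  # mentions whose sentiment matches the overall one
    for overall, opinions in reviews:
        for feature, sentiment in opinions.items():
            mentions[feature] += 1
            if sentiment == overall:
                agreements[feature] += 1
    total = len(reviews)
    scores = {}
    for feature, count in mentions.items():
        frequency = count / total
        influence = agreements[feature] / count
        scores[feature] = frequency * influence
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical reviews: overall opinion plus per-feature sentiment.
reviews = [
    (+1, {"battery": +1, "screen": +1}),
    (-1, {"battery": -1, "price": +1}),
    (+1, {"battery": +1}),
    (-1, {"screen": +1, "battery": -1}),
]
ranking, scores = rank_features(reviews)
print(ranking)
```

    In this toy data "battery" is mentioned in every review and always agrees with the overall opinion, so it ranks first; a real system would obtain the per-feature sentiments from a parser and a sentiment classifier rather than from hand-written labels.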

    Deducing and Ordering Most-influencing Product Features through Well-established Sentiments using NLP

    The rapid growth of e-commerce has encouraged shoppers to buy items on the web. Different brands and a huge number of items are offered online, and a wide mix of customer reviews is now accessible on the web. These reviews are important for buyers as well as merchants. Most of the reviews are disorganized, which makes it unclear how useful the information is. In this paper we propose a product feature ranking framework that identifies the important features, or aspects, of products from online customer reviews, with the aim of enhancing the usability of these reviews. The important aspects of a product can usually be identified using two observations: 1) the important aspects are generally commented on by a large audience, and 2) customers' opinions on the key aspects significantly influence their overall opinions of the product. We first identify product aspects with a shallow dependency parser and determine customers' opinions on these aspects by means of a sentiment classifier. We then propose a probabilistic algorithm that detects features and orders them by rank, inferring the significance of each feature from its frequency and from the impact of customers' opinions on that feature over their overall reviews. DOI: 10.17762/ijritcc2321-8169.150711

    Deriving the Pricing Power of Product Features by Mining Consumer Reviews

    The increasing pervasiveness of the Internet has dramatically changed the way that consumers shop for goods. Consumer-generated product reviews have become a valuable source of information for customers, who read the reviews and decide whether to buy the product based on the information provided. In this paper, we use techniques that decompose the reviews into segments that evaluate the individual characteristics of a product (e.g., image quality and battery life for a digital camera). Then, as a major contribution of this paper, we adapt methods from the econometrics literature, specifically the hedonic regression concept, to estimate: (a) the weight that customers place on each individual product feature, (b) the implicit evaluation score that customers assign to each feature, and (c) how these evaluations affect the revenue for a given product. Towards this goal, we develop a novel hybrid technique combining text mining and econometrics that models consumer product reviews as elements in a tensor product of feature and evaluation spaces. We then impute the quantitative impact of consumer reviews on product demand as a linear functional from this tensor product space. We demonstrate how to use a low-dimension approximation of this functional to significantly reduce the number of model parameters, while still providing good experimental results. We evaluate our technique using a data set from Amazon.com consisting of sales data and the related consumer reviews posted over a 15-month period for 242 products. Our experimental evaluation shows that we can extract actionable business intelligence from the data and better understand the customer preferences and actions. We also show that the textual portion of the reviews can improve product sales prediction compared to a baseline technique that simply relies on numeric data
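    The hedonic regression idea mentioned in this abstract treats an outcome such as price or demand as a linear function of per-feature scores, so each fitted coefficient can be read as the weight placed on that feature. The sketch below is ordinary least squares in plain Python under that simplifying assumption; it does not implement the paper's tensor-product model, and the helper names `solve` and `ols` and all data values are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two feature evaluation scores per product, and an
# outcome generated exactly as 2 + 3 * x1 - 1 * x2.
X = [(1, 0), (0, 1), (1, 1), (2, 1), (0, 0)]
y = [5, 1, 4, 7, 2]
coef = ols(X, y)
print([round(c, 6) for c in coef])
```

    Because the toy outcome is an exact linear function of the two scores, the fit recovers the intercept and both weights up to floating-point error; with real review data the coefficients would be estimates with uncertainty.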

    Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201