618 research outputs found

    Automatic Summarization of Customer Reviews: An Integrated Approach

    The proliferation of interactivity between Web content producers and consumers underscores the development of the Internet in recent years. In particular, customer reviews posted on the Web have grown significantly. Because customers represent the primary stakeholder group of a company, understanding customers' concerns expressed in these reviews could help marketers and business analysts to identify market trends and to provide better products and services. However, the large volume of textual reviews written in informal language makes it difficult to understand customers' concerns. This paper describes an integrated approach to summarizing customer reviews. The approach consists of four steps: sentence extraction, aspect identification, sentiment classification, and review summarization. We report preliminary results of using our approach to summarize product reviews extracted from Amazon.com. Our work augments existing work by considering nonstandard input and by incorporating linguistic resources and clustering in automatic summarization.

    The Web as an Adaptive Network: Coevolution of Web Behavior and Web Structure

    Much is known about the complex network structure of the Web, and about behavioral dynamics on the Web. A number of studies address how behaviors on the Web are affected by different network topologies, whilst others address how the behavior of users on the Web alters network topology. These represent complementary directions of influence, but they are generally not combined within any one study. In network science, the study of the coupled interaction between topology and behavior, or state-topology coevolution, is known as 'adaptive networks', and is a rapidly developing area of research. In this paper, we review the case for considering the Web as an adaptive network and several examples of state-topology coevolution on the Web. We also review some abstract results from recent literature in adaptive networks and discuss their implications for Web Science. We conclude that adaptive networks provide a formal framework for characterizing processes acting 'on' and 'of' the Web, and offer potential for identifying general organizing principles that seem otherwise elusive in Web Science.

    Event detection from click-through data via query clustering

    The web is an index of real-world events, and a lot of knowledge can be mined from web resources and their derivatives. Event detection is a recent research topic in web data mining, triggered by the increasing popularity of search engines. In the visitor-centric approach, the click-through data generated by web search engines is the starting resource, based on the intuition that such data is often event-driven. In this thesis, a retrospective algorithm is proposed to detect such real-world events from click-through data. This approach differs from existing work in that it: (i) considers the click-through data as collaborative query sessions instead of mere web logs and tries to understand user behavior; (ii) tries to integrate the semantics, structure, and content of queries and pages; and (iii) aims to achieve the overall objective via query clustering. The problem of event detection is transformed into query clustering by generating clusters in the form of hybrid cover graphs; each hybrid cover graph corresponds to a real-world event. To ensure quality, an evolutionary pattern is imposed on the co-occurrences of query-page pairs in a hybrid cover graph over a moving window period. The approach is experimentally evaluated on a commercial search engine's data collected over 3 months, with about 20 million web queries and page clicks from 650,000 users. The results outperform the most recent work in this domain in terms of number of events detected, F-measure, entropy, recall, etc. --Abstract, page iv

    Viral Marketing for Smart Cities: Influencers in Social Network Communities

    Social networks are used by cities primarily for announcing local-area events, but also for increasing engagement of citizens in votes and elections. Given the current plethora of heterogeneous social networks, city administrators can use them to promote initiatives that are important to a current smart city, as well as to discover future needs in order to manage resources more efficiently. Our focus in this paper is how we can adapt commercial and viral marketing techniques to smart city systems to influence the behavior, opinions and choices of citizens in order to improve their well-being and that of the whole society, as well as to predict future trends and events.

    Preference rules for label ranking: Mining patterns in multi-target relations

    In this paper, we investigate two variants of association rules for preference data, Label Ranking Association Rules and Pairwise Association Rules. Label Ranking Association Rules (LRAR) are the equivalent of Class Association Rules (CAR) for the Label Ranking task. In CAR, the consequent is a single class, to which the example is expected to belong. In LRAR, the consequent is a ranking of the labels. The generation of LRAR requires special support and confidence measures to assess the similarity of rankings. In this work, we carry out a sensitivity analysis of these similarity-based measures. We want to understand which datasets benefit more from such measures and which parameters have more influence on the accuracy of the model. Furthermore, we propose an alternative type of rules, the Pairwise Association Rules (PAR), which are defined as association rules with a set of pairwise preferences in the consequent. While PAR can be used both as descriptive and predictive models, they are essentially descriptive models. Experimental results show the potential of both approaches. This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under grant agreement number 662189-MANTIS-2014-1, and by National Funds through the FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), as part of project UID/EEA/50014/2013.
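
    The idea of similarity-based support and confidence for a rule whose consequent is a ranking can be illustrated with a minimal sketch. The abstract does not specify the measures, so everything below is an assumption made for illustration: Kendall's tau as the ranking similarity, a threshold `theta` below which a ranking contributes nothing, and the hypothetical function names `kendall_tau`, `ranking_similarity`, and `lrar_support_confidence`. Rankings are given as lists of rank positions, one per label, without ties.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two rankings (rank position per label, no ties)."""
    if len(rank_a) != len(rank_b) or len(rank_a) < 2:
        raise ValueError("rankings must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


def ranking_similarity(rank_a, rank_b, theta=0.0):
    """Assumed similarity: tau if it reaches theta (0 <= theta <= 1), else 0."""
    tau = kendall_tau(rank_a, rank_b)
    return tau if tau >= theta else 0.0


def lrar_support_confidence(examples, antecedent, consequent, theta=0.0):
    """Similarity-weighted support and confidence of antecedent -> consequent.

    examples: list of (set_of_items, ranking) pairs.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    # Only examples that contain every item of the antecedent are covered.
    matched = [rank for items, rank in examples if antecedent <= items]
    total = sum(ranking_similarity(rank, consequent, theta) for rank in matched)
    support = total / len(examples)
    confidence = total / len(matched) if matched else 0.0
    return support, confidence


examples = [
    ({"a"}, [1, 2, 3]),
    ({"a"}, [1, 3, 2]),
    ({"b"}, [3, 2, 1]),
    ({"a", "b"}, [1, 2, 3]),
]
support, confidence = lrar_support_confidence(examples, {"a"}, [1, 2, 3])
```

    In this sketch, an example whose ranking is close to the consequent contributes a partial count instead of being treated as a complete mismatch, which is the intuition behind replacing exact-match counting with a similarity measure.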

    From Theory to Practice: A Data Quality Framework for Classification Tasks

    Data preprocessing is an essential step in knowledge discovery projects. Experts affirm that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process. In this sense, several authors consider data cleaning to be one of the most cumbersome and critical tasks. Failure to provide high data quality in the preprocessing stage will significantly reduce the accuracy of any data analytic project. In this paper, we propose a framework to address data quality issues in classification tasks, DQF4CT. Our approach is composed of: (i) a conceptual framework to provide the user with guidance on how to deal with data problems in classification tasks; and (ii) an ontology that represents the knowledge in data cleaning and suggests the proper data cleaning approaches. We present two case studies using real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same algorithms used in classification tasks by the authors of PAM and OD. Additionally, we evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI). In addition, 84% of the results achieved by the models trained on the datasets cleaned by DQF4CT are better than those of the models of the dataset authors. This work has also been supported by: Project: “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”, Convocatoria 03-2018 Publicación de artículos en revistas de alto impacto; Project: “Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT - ID 4633”, financed by Convocatoria 04C–2018 “Banco de Proyectos Conjuntos UEES-Sostenibilidad” of Project “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”; and the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).