1,438 research outputs found

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles that present state-of-the-art accounts of recent advances in social data and artificial intelligence, and potentially their links to cyberspace.

    A text segmentation approach for automated annotation of online customer reviews, based on topic modeling

    Online customer review classification and analysis have been recognized as an important problem in many domains, such as business intelligence, marketing, and e-governance. To solve this problem, a variety of machine learning methods have been developed in the past decade. Existing methods, however, either rely on human labeling or have a high computing cost, or both. This makes them a poor fit for dynamic and ever-growing collections of short but semantically noisy customer reviews. In the present study, the problem of multi-topic online review clustering is addressed by generating high-quality bronze-standard labeled sets for training efficient classifier models. A novel unsupervised algorithm is developed to break reviews into sequential, semantically homogeneous segments. Segment data is then used to fine-tune a Latent Dirichlet Allocation (LDA) model obtained for the reviews, and to classify them along categories detected through topic modeling. After being tested on a benchmark text collection, the segmentation algorithm was successfully applied in a case study of tourism review classification. In all experiments conducted, the proposed approach produced results similar to or better than baseline methods. The paper critically discusses the main findings and outlines directions for future work.
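
The abstract above does not spell out its segmentation algorithm, so the following is only a minimal sketch of the general idea of breaking a review into sequential, lexically coherent segments. It assumes a simple rule that is not taken from the paper: a new segment starts wherever two adjacent sentences have a bag-of-words cosine similarity below a threshold. The function names, the threshold value, and the toy review are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-case the sentence and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.1):
    """Open a new segment wherever adjacent sentences share too little vocabulary.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])      # vocabulary shift: start a new segment
        else:
            segments[-1].append(cur)    # same topic: extend the current segment
    return segments


# Toy review, invented for illustration only.
review = [
    "The room was clean and the room was quiet.",
    "Our room had a great view.",
    "Breakfast was cold.",
    "The breakfast buffet had little choice.",
]
segments = segment(review)
for seg in segments:
    print(seg)
```

In a pipeline like the one the abstract describes, each resulting segment could then be passed to a topic model instead of the whole review. A purely lexical rule such as this one will miss topic shifts that reuse the same vocabulary.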

    Competitive analysis of online reviews using exploratory text mining

    Purpose – This paper explores the usefulness of analyzing text-based online reviews with text mining tools and visual analytics for SWOT Analysis, as applied to the hotel industry. The results can be used to develop competitive actions. Design – The text mining and visualization tool ReviewMap was used to transform an archive of reviews spanning multiple suppliers into a hierarchy of data of increasing dimensionality. Visual summaries at each level were integrated so that selections made at one level propagate throughout the rest of the hierarchy. These visual summaries identify features required for competition at a given level and features that currently discriminate among competitors. Methodology – The approach was exploratory; its objective was to determine whether usable competitive intelligence could be found in a typical collection of online reviews for a set of competing hotels. A publicly available collection of reviews was subjected to a set of text mining procedures and visual analyses in order to summarize the features and opinions expressed. Originality – Prior analyses of online reviews relied solely on numeric “star” ratings. This study used text mining to uncover information within the written comments and applied that information in a SWOT Analysis of three competing hotels. Findings – In the set of reviews used in this paper, a common measure of analytical power almost doubled when text mining summaries of the written comments were used in combination with numeric ratings. Visual analytics revealed the dominant features for each hotel, the features required of all hotels competing at a given level, and the features that define specific positions within the competitive landscape. This analysis of strengths, weaknesses, opportunities and threats revealed several promising competitive actions for the hotels in the study.

    “CAVIIAR FOR ALL” A CASE STUDY OF AN INNOVATIVE APPLICATION FOR CATERING, TOURISM AND CULTURE

    Many people assume that wanting a product or a service is a purely financial matter, but what really makes businesses sustainable and drives their growth is creativity and innovation. This paper presents a real case study that exemplifies the notion of “idea to product”: an innovative application for information on, and promotion of, catering, tourism and culture (caviiar.pt). It is an uninterrupted service that gives “real time” information about catering services and about regional or nearby cultural and tourist points of interest. It also allows the promotion of gastronomic or cultural events with information relevant to the idea of the application. The project intends to create a new notion of catering, tourism and culture with a high level of interaction with clients and their needs or wants, bringing to light a new tourism concept: “online assessment tourism”.

    Analysis of pattern recognition and dimensionality reduction techniques for odor biometrics

    In this paper, we analyze the performance of several well-known pattern recognition and dimensionality reduction techniques when applied to mass-spectrometry data for odor biometric identification. Motivated by the successful results of previous works that captured odor from other parts of the body, this work evaluates the feasibility of identifying people by the odor emanating from their hands. Formulated as a machine learning task, the problem becomes a small-sample-size supervised classification problem in which the input data consists of mass spectrograms of the hand odor of 13 subjects captured in different sessions. The high dimensionality of the data makes it necessary to apply feature selection and extraction techniques together with a simple classifier in order to improve the generalization capabilities of the model. Our experiments achieve recognition rates over 85%, which reveals that hand odor carries discriminatory information and points to body odor as a promising biometric identifier.
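
The abstract names feature selection combined with a simple classifier but does not say which techniques were used. As a hedged illustration of that general pattern for small-sample, high-dimensional data, the sketch below ranks features by a Fisher-style ratio of between-class to within-class variance and classifies with a nearest-centroid rule. Both choices, all function names, and the toy data are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def class_means(X, y, features):
    """Per-class mean vector restricted to the given feature indices."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(r[j] for r in rows) / len(rows) for j in features]
        for label, rows in groups.items()
    }


def fisher_scores(X, y):
    """Between-class over within-class scatter, computed per feature."""
    all_features = list(range(len(X[0])))
    means = class_means(X, y, all_features)
    overall = [sum(r[j] for r in X) / len(X) for j in all_features]
    scores = []
    for j in all_features:
        between = sum((m[j] - overall[j]) ** 2 for m in means.values())
        within = sum((row[j] - means[label][j]) ** 2 for row, label in zip(X, y))
        scores.append(between / (within + 1e-12))  # small constant avoids division by zero
    return scores


def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher-style score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def predict(sample, centroids, features):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def distance(centroid):
        return sum((sample[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Toy data, invented for illustration: feature 0 separates the classes,
# features 1 and 2 behave like noise.
X = [[1.0, 5.0, 0.3], [1.2, 1.0, 0.9], [3.0, 4.8, 0.2], [3.1, 1.1, 0.8]]
y = ["a", "a", "b", "b"]

features = select_top_k(X, y, k=1)
centroids = class_means(X, y, features)
label = predict([2.9, 5.0, 0.5], centroids, features)
print(features, label)
```

Reducing the feature set before fitting keeps the number of estimated parameters small relative to the number of samples, which is the generalization concern the abstract raises.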

    MULTI-LEVEL CITY PORTRAIT RESEARCH BASED ON MULTI-SOURCE DATA

    A city portrait is the social impression generated by the interaction between the public and the city. It can help us better understand and perceive the nature and characteristics of the city, and thus provide strong support for the city's development and governance. However, most existing studies extract thematic semantic labels globally and ignore both the order of the tags and the degree of their contribution to the topic, which affects the city portrait extraction results. Existing studies also lack an analysis of how using grid areas as the study scale affects city portraits. In this paper, we propose a new approach to accurately identify city labels based on multi-source data grid fusion, using a topic feature word extraction model (Weight-LdaVecNet) that fuses topic word embedding with network structure analysis under feature word weight constraints. On this basis, we construct a multi-level city portrait description framework: we use hierarchical cluster analysis to extract tag clusters, and use similarity analysis to combine topic feature tags and region feature tags into a similarity matrix, from which a multi-level city region portrait is built, with a view to achieving a fine-grained multi-level city portrait. The experimental results show that, compared with the traditional LDA model, the city labels identified by our method that have similar thematic semantics aggregate strongly, which supports the effectiveness of the proposed method. In the overall multi-level city portrait, we also find that Beijing is strongly attractive in terms of cultural features. However, the regional distribution of the cultural dimensions is not uniform in the multi-level city-region portrait, and more rational allocation and planning of cultural resources is needed to better meet people's needs.

    Some Advances in Aspect Analysis of User-Generated Content

    Starting from online reviews that each carry an overall rating, the aim is to propose a methodology for detecting the main aspects (or topics) of interest to users and then estimating the aspect ratings latently assigned in each review, jointly with the weight or emphasis placed on each aspect.

    Behaviour modelling with data obtained from the Internet and contributions to cluster validation

    This PhD thesis contributes to modelling behaviours found in different types of data acquired from the Internet and to the field of clustering evaluation. Two types of Internet data, both sequential in nature, were processed: Internet traffic, with the objective of attack detection, and web surfing activity, with the objective of web personalization. To this end, machine learning techniques, mostly unsupervised, were applied. Contributions were also made to cluster evaluation, to ease the selection of the best partition in clustering problems. With regard to network attack detection, the gureKDDCup database was first generated; it adds payload data (the data field of network packets) to the KDDCup99 connection attributes, because payloads are essential for detecting non-flood attacks (attacks that use few packets). Then, by modelling this data, a network Intrusion Detection System (nIDS) based on context-independent payload processing was proposed, obtaining satisfactory detection rates. In the web mining context, web surfing activity was modelled for web personalization. Generic and non-invasive systems for extracting knowledge were proposed that use only the information stored in web server log files. Contributions were made in two directions: problem detection and link suggestion. In the first application, a meaningful list of navigation attributes was proposed for each user session in order to group sessions and detect different navigation profiles. In the second, a general and non-invasive link suggestion system was proposed and evaluated with satisfactory results in a link prediction context. With regard to the analysis of Cluster Validity Indices (CVIs), the most extensive CVI comparison found to date was carried out using an evaluation methodology based on a partition similarity measure. Moreover, the behaviour of CVIs was analysed in a real web mining application with a high number of clusters, a setting in which they tend to be unstable. To address this, a procedure was proposed that automatically selects the best partition by analysing the slope of different CVI values. Funding: Grant of the Basque Government (ref.: BFI08.226); Grant of Ministry of Economy and Competitiveness of the Spanish Government (ref.: BES-2011-045989); Research stay grant of Spanish Ministry of Economy and Competitiveness (ref.: EEBB-I-14-08862); University of the Basque Country UPV/EHU (BAILab, grant UFI11/45); Department of Education, Universities and Research of the Basque Government (grant IT-395-10); Ministry of Economy and Competitiveness of the Spanish Government and by the European Regional Development Fund - ERDF (eGovernAbility, grant TIN2014-52665-C2-1-R)
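
The thesis abstract mentions selecting the best partition by analysing the slope of CVI values but gives no further detail. One possible reading, offered only as a sketch, is shown below. It assumes indices for which higher values are better, picks for each index the number of clusters at which the slope of the curve drops most sharply, and then takes a majority vote across indices. The knee rule, the voting step, the function names, and the numbers are all hypothetical and are not the procedure from the thesis.

```python
from collections import Counter


def knee(ks, values):
    """Return the k at which the slope of the CVI curve drops the most."""
    slopes = [
        (values[i + 1] - values[i]) / (ks[i + 1] - ks[i])
        for i in range(len(ks) - 1)
    ]
    # slopes[i] is the slope arriving at ks[i + 1]; slopes[i + 1] is the slope leaving it.
    drops = [slopes[i] - slopes[i + 1] for i in range(len(slopes) - 1)]
    return ks[drops.index(max(drops)) + 1]


def select_partition(ks, cvi_curves):
    """Majority vote over the knee found on each index's curve."""
    votes = Counter(knee(ks, curve) for curve in cvi_curves.values())
    return votes.most_common(1)[0][0]


# Made-up CVI values for partitions with k = 2..6 clusters (illustration only).
ks = [2, 3, 4, 5, 6]
cvi_curves = {
    "index_a": [0.20, 0.45, 0.70, 0.72, 0.73],
    "index_b": [0.10, 0.30, 0.55, 0.58, 0.60],
    "index_c": [0.30, 0.60, 0.62, 0.63, 0.64],
}
best_k = select_partition(ks, cvi_curves)
print(best_k)
```

Looking at slope changes rather than the raw maximum is one way to cope with indices that keep drifting upward as the number of clusters grows, which matches the instability the abstract reports for partitions with many clusters.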