    DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

    Tagging news articles or blog posts with relevant tags from a collection of predefined ones is termed document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple k-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec deals directly with raw text instead of a provided feature vector and, in addition, enjoys advantages such as the learning of tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods. Comment: 10 pages
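
    The prediction step described in this abstract (nearest-neighbor search over tags in a joint vector space) can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors, the tag names, the use of cosine similarity, and the helper `predict_tags` are all assumptions made for illustration, and the training step that would produce the embeddings is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose vectors lie closest to the document vector."""
    ranked = sorted(tag_vecs, key=lambda t: cosine(doc_vec, tag_vecs[t]), reverse=True)
    return ranked[:k]

# Hypothetical tag embeddings; in practice these would be learned jointly
# with word and document embeddings.
tag_vecs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.0, 0.9, 0.2],
    "finance": [0.1, 0.3, 0.9],
}

# Hypothetical embedding inferred for an unseen document.
doc_vec = [0.8, 0.3, 0.1]

predicted = predict_tags(doc_vec, tag_vecs, k=2)
print(predicted)
```

    Because tags are just points in the same space, a newly created tag could in principle be supported by adding one more entry to `tag_vecs`, which is one way to read the abstract's claim about handling new tags.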

    Business Ontology for Evaluating Corporate Social Responsibility

    This paper presents a software solution that is developed to automatically classify companies by taking into account their level of social responsibility. The application is based on ontologies and on intelligent agents. In order to obtain the data needed to evaluate companies, we developed a web crawling module that analyzes the company’s website and the documents that are available online such as social responsibility report, mission statement, employment structure, etc. Based on a predefined CSR ontology, the web crawling module extracts the terms that are linked to corporate social responsibility. By taking into account the extracted qualitative data, an intelligent agent, previously trained on a set of companies, computes the qualitative values, which are then included in the classification model based on neural networks. The proposed ontology takes into consideration the guidelines proposed by the “ISO 26000 Standard for Social Responsibility”. Having this model, and being aware of the positive relationship between Corporate Social Responsibility and financial performance, an overall perspective on each company’s activity can be configured, this being useful not only to the company’s creditors, auditors, stockholders, but also to its consumers. Keywords: corporate social responsibility, ISO 26000 Standard for Social Responsibility, ontology, web crawling, intelligent agent, corporate performance, POS tagging, opinion mining, sentiment analysis

    Outsourcing labour to the cloud

    Various forms of open sourcing to the online population are establishing themselves as cheap, effective methods of getting work done. These have revolutionised the traditional methods for innovation and have contributed to the enrichment of the concept of 'open innovation'. To date, the literature concerning this emerging topic has been spread across a diverse number of media, disciplines and academic journals. This paper attempts for the first time to survey the emerging phenomenon of open outsourcing of work to the internet using 'cloud computing'. The paper describes the volunteer origins and recent commercialisation of this business service. It then surveys the current platforms, applications and academic literature. Based on this, a generic classification for crowdsourcing tasks and a number of performance metrics are proposed. After discussing strengths and limitations, the paper concludes with an agenda for academic research in this new area.

    On content-based recommendation and user privacy in social-tagging systems

    Recommendation systems and content filtering approaches based on annotations and ratings essentially rely on users expressing their preferences and interests through their actions, in order to provide personalised content. This activity, in which users engage collectively, has been named social tagging. It is one of the most popular activities in which users engage online, and although it has opened new possibilities for application interoperability on the semantic web, it is also posing new privacy threats. It consists, in fact, of describing online or offline resources by using free-text labels (i.e. tags), thereby exposing the user profile and activity to privacy attacks. Users, as a result, may wish to adopt a privacy-enhancing strategy in order not to reveal their interests completely. Tag forgery is a privacy-enhancing technology consisting of generating tags for categories or resources that do not reflect the user's actual preferences. By modifying the user's profile, tag forgery may have a negative impact on the quality of the recommendation system, thus protecting user privacy to a certain extent but at the expense of utility. The impact of tag forgery on content-based recommendation is, therefore, investigated in a real-world application scenario where different forgery strategies are evaluated, and the consequent loss in utility is measured and compared. Peer Reviewed. Postprint (author’s final draft)
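
    The privacy and utility trade-off described in this abstract can be illustrated with a small sketch. It does not reproduce any forgery strategy from the paper: the tag lists, the choice of forged tags, and the use of cosine similarity between tag-frequency profiles as a rough proxy for how far the apparent profile drifts from the genuine one are all assumptions made for illustration.

```python
import math
from collections import Counter

def profile(tags):
    """Normalised tag-frequency profile: tag -> relative frequency."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}

def cosine(p, q):
    """Cosine similarity between two sparse profiles."""
    keys = set(p) | set(q)
    dot = sum(p.get(key, 0.0) * q.get(key, 0.0) for key in keys)
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical genuine tagging activity of one user.
genuine = ["python", "python", "privacy", "python"]

# Hypothetical forged tags that do not reflect the user's real interests.
forged = ["cooking", "travel"]

true_profile = profile(genuine)
apparent_profile = profile(genuine + forged)

# Lower similarity means the observed profile reveals less about the user,
# but a content-based recommender working from it is also less accurate.
similarity = cosine(true_profile, apparent_profile)
print(round(similarity, 3))
```

    Adding more forged tags pushes the similarity further below 1, which is the intuition behind the utility loss the abstract sets out to measure.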

    CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference

    The third and last CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the domain. An exhibition of 13 stands presented 16 research projects currently ongoing around the world.

    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement of an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques that are presented in this paper are valid at the time of writing this paper (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.