246 research outputs found

    Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy

    PACLIC 23 / City University of Hong Kong / 3-5 December 2009

    Exploring classification as conversation

    Conversations are proposed as a useful lens through which to consider knowledge-organizing behaviors. Human conversations are sites of knowledge creation, where participants communicate to establish meaning that is contextual and shared. The conversations generated in collaborative online environments offer new opportunities to observe not only how knowledge is created, but also how users participate in various knowledge-organizing activities. In a Web environment pervaded by conversational forms – social classification systems, blogs, and wikis – participatory knowledge organization is an emerging phenomenon that warrants further exploration. Other areas for research are suggested, including the potential to leverage participatory knowledge organization in future applications and developments of Web functionality.

    Combining granularity-based topic-dependent and topic-independent evidences for opinion detection

    Opinion mining, a sub-discipline within Information Retrieval (IR) and Computational Linguistics, refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online sources such as news articles, social media comments, and other user-generated content. It is also known by many other terms, such as opinion finding, opinion detection, sentiment analysis, sentiment classification, polarity detection, etc. Defined in a more specific and simpler context, opinion mining is the task of retrieving opinions on an issue as expressed by the user in the form of a query. There are many problems and challenges associated with opinion mining, and in this thesis we focus on a few of them. One of the major challenges is to find opinions that specifically concern the given topic (query). A document may contain information on many topics at once, and it may contain opinionated text about each of these topics or only some of them. It therefore becomes very important to select the topic-relevant segments of the document together with their corresponding opinions. We address this problem at two levels of granularity: sentences and passages. In our first, sentence-level approach, we use WordNet semantic relations to find the association between topic and opinion. In our second, passage-level approach, we use a more robust IR model, namely the language model, to address this problem. The basic idea behind both contributions to topic-opinion association is that if a document contains more opinionated, topic-relevant textual segments (sentences or passages), it is more opinionated than a document with fewer such segments. Most machine-learning-based approaches to opinion mining are domain dependent, i.e. their performance varies from one domain to another. A domain- or topic-independent approach, on the other hand, is more generalized and can maintain its effectiveness across different domains, but such approaches generally suffer from poor performance. Developing an approach that is both effective and generalized is a major challenge in opinion mining, and our contributions in this thesis include the development of an approach that uses simple heuristic features to find opinionated documents. Entity-based opinion mining is becoming very popular among researchers in the IR community. It aims to identify the entities relevant to a given topic and to extract the opinions associated with them from a set of text documents. However, identifying entities and determining their relevance is already a difficult task. We propose a system that takes into account both the information in the current news article and in related earlier articles in order to detect the most important entities in the current news. In addition, we also present our framework for opinion analysis and related tasks. This framework is based on content evidence and social evidence from the blogosphere for the tasks of opinion finding, opinion prediction, and multidimensional review ranking; this early contribution lays the groundwork for our future work. The evaluation of our methods includes the use of the TREC 2006 Blog collection and the TREC 2004 Novelty track collection, and most of the evaluations were carried out within the framework of the TREC Blog track.
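    The topic-opinion association idea described in this abstract (a document is more opinionated the more topic-relevant opinionated segments it contains) can be illustrated with a minimal sketch. The snippet below is only an illustration under simplified assumptions: topic relevance is reduced to query-term overlap and opinionatedness to a tiny hand-made lexicon, whereas the thesis itself uses WordNet semantic relations at the sentence level and a language model at the passage level; all names in the snippet are invented for the example.

```python
# Illustrative sketch (not the thesis implementation): rank documents by the
# number of sentences that are both topic-relevant and opinionated.
import re

# Tiny hand-made sentiment lexicon used only for this illustration.
OPINION_WORDS = {"good", "bad", "great", "terrible", "love", "hate",
                 "excellent", "awful", "disappointing", "amazing"}

def sentences(text: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def is_topic_relevant(sentence: str, query_terms: set[str]) -> bool:
    words = set(re.findall(r"\w+", sentence.lower()))
    return bool(words & query_terms)

def is_opinionated(sentence: str) -> bool:
    words = set(re.findall(r"\w+", sentence.lower()))
    return bool(words & OPINION_WORDS)

def opinion_score(document: str, query: str) -> int:
    """Count sentences that are both on-topic and opinionated."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    return sum(1 for s in sentences(document)
               if is_topic_relevant(s, query_terms) and is_opinionated(s))

if __name__ == "__main__":
    doc = ("The new phone has a great camera. Battery life is terrible. "
           "The package arrived on Tuesday.")
    print(opinion_score(doc, "phone camera battery"))  # -> 2
```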

    From people to entities : typed search in the enterprise and the web

    [no abstract]

    The TXM Portal Software giving access to Old French Manuscripts Online

    Full text online: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf This paper presents the new TXM software platform giving online access to Old French text manuscript images and tagged transcriptions for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions, and can encode several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during import. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine to allow efficient word-pattern searches for building different types of frequency lists and concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).
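    As a rough illustration of one step in the kind of import pipeline described above (this is a sketch, not TXM's actual code), the snippet below extracts word tokens (<w> elements) from a TEI-encoded transcription and writes them one per line, the usual input format for a part-of-speech tagger such as TreeTagger. The file names are assumptions made for the example.

```python
# Illustrative sketch only (not the TXM implementation): extract word tokens
# from a TEI XML transcription and write them one per line for tagging.
# "transcription.xml" and "tokens.txt" are hypothetical file names.
from xml.etree import ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def tei_tokens(path: str) -> list[str]:
    """Return the text of every <w> (word) element in document order."""
    tree = ET.parse(path)
    return ["".join(w.itertext()).strip()
            for w in tree.getroot().iterfind(".//tei:w", TEI_NS)]

if __name__ == "__main__":
    tokens = tei_tokens("transcription.xml")
    # One token per line, ready to pipe into a part-of-speech tagger.
    with open("tokens.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(tokens))
```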

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    AMCIS 2008 Panel Report: Aging Content on the Web: Issues, Implications, and Potential Research Opportunities

    Since its inception in the early 1990s, the World Wide Web (Web) has grown enormously. According to the “official Google blog” (Google 2008), the Web had 1 trillion (as in 1,000,000,000,000) unique coexisting URLs as of July 25, 2008. Given the exponential growth of the Web over time, an issue that is likely to gain prominence is that of outdated information. This is especially important to study since many of us rely on the Web to find facts in order to make decisions. For example, for students and researchers, the “date” of a document is important for scholarship and student work. However, getting an accurate date on content is challenging, and furthermore, outdated pages that are not deleted from Web servers will continue to be returned in response to Web searches. The panel, held at the 2008 Americas Conference on Information Systems in Toronto, Canada, identified a number of research issues and opportunities that arise as a result of this phenomenon.

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
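    As a toy illustration of the kinds of structured data sources the report discusses (not the BlogForever methodology itself), the sketch below reads post titles and links from a blog's RSS 2.0 feed and collects microdata "itemprop" values from a post's HTML using only the Python standard library; the local file names are hypothetical.

```python
# Toy illustration (not the BlogForever methodology): pull post titles/links
# from an RSS 2.0 feed and collect HTML microdata "itemprop" values.
# "feed.xml" and "post.html" are hypothetical local files used for the example.
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

def rss_items(path: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 feed."""
    channel = ET.parse(path).getroot().find("channel")
    return [(item.findtext("title", ""), item.findtext("link", ""))
            for item in channel.iterfind("item")]

class MicrodataCollector(HTMLParser):
    """Record the itemprop attribute of every element that carries one."""
    def __init__(self) -> None:
        super().__init__()
        self.props: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.props.append(attrs["itemprop"])

if __name__ == "__main__":
    for title, link in rss_items("feed.xml"):
        print(title, link)
    collector = MicrodataCollector()
    with open("post.html", encoding="utf-8") as f:
        collector.feed(f.read())
    print(collector.props)  # e.g. ["headline", "author", "datePublished"]
```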