14 research outputs found

    Crawler for Estonian Social Media RSS Feeds

    The aim of the thesis was to develop two crawlers for Estonian-language social media. The thesis describes the algorithmic design of the crawlers and evaluates their efficiency on the basis of an experiment carried out with them. The crawlers will be used by the University of Tartu Estonian language research group.

    Methoden für Trendanalysen im Web zur Unterstützung des Customer Relationship Management [Methods for Web Trend Analysis in Support of Customer Relationship Management]

    With the arrival of Web 2.0 in everyday life, individuals are able to publish their opinions and feelings in the form of blogs. Analysing trends in this blogosphere can contribute substantially to supporting customer win-back in a CRM system. This research examines existing approaches to trend detection in general and then assesses their suitability for application to weblogs. To this end, building on existing scientific work, a trend analysis system is implemented as a prototype and the analysis results are subsequently evaluated.

    It's all a bit upmessing - non-standard verb-particle combinations in blogs

    This article explores how verb-particle combinations, for a long time one of the most productive segments of English word-formation, have changed with the advent of online real-time short communication forms such as blogs or their more sophisticated social networking or microblogging varieties like Twitter and Facebook. Following up on earlier research (Diemer 2008), evidence is presented that the long and seemingly unstoppable trend towards verb-adverb combinations and the decline of the prefixes has been partly reversed by these new forms of communication. Selected examples with the prefixes "in" and "on" are discussed. It is argued that the main reasons for this change are facilitation of syntax, the need for innovation in specialized and peer-group communication, analogy formation, and the influence of other languages on English.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.

    Importance of social media in the information sourcing phase during the decision-making process of the South African traveller

    Includes bibliographical references. The Internet and the emergence of social media have a significant effect on the tourism industry worldwide. Tourists can search online for advice from strangers and friends who have visited the destination in the past. Research indicates that this information source is perceived as more credible than traditional marketing material such as Web sites, brochures or other forms of advertisement. More specifically, information sources on social media assist the tourist in evaluating alternatives in order to make an informed purchasing decision. Destination marketing organisations and tourism enterprises need to understand the role that social media plays in the decision-making process in order to create effective online marketing strategies. This research paper focuses on the South African traveller and on which online sources s/he uses to search for travel information before going on holiday. Social media sources in particular are under investigation. There has been a dearth of research in this area on emerging markets such as South Africa, and this paper aims to fill an important gap in the academic literature. The database for this research was acquired from Travelstart, a leading digital travel agency in South Africa.

    A bottom-up approach to real-time search in large networks and clouds

    The voting model for people search

    The thesis investigates how persons in an enterprise organisation can be ranked in response to a query, so that those persons with relevant expertise to the query topic are ranked first. The expertise areas of the persons are represented by documentary evidence of expertise, known as candidate profiles. The statement of this research work is that the expert search task in an enterprise setting can be successfully and effectively modelled using a voting paradigm. In the so-called Voting Model, when a document is retrieved for a query, this document represents a vote for every expert associated with the document to have relevant expertise to the query topic. This voting paradigm is manifested by the proposition of various voting techniques that aggregate the votes from documents to candidate experts. Moreover, the research work demonstrates that these voting techniques can be modelled in terms of a Bayesian belief network, providing probabilistic semantics for the proposed voting paradigm. The proposed voting techniques are thoroughly evaluated on three standard expert search test collections, deriving conclusions concerning each component of the Voting Model, namely the method used to identify the documents that represent each candidate's expertise areas, the weighting models that are used to rank the documents, and the voting techniques which are used to convert the ranking of documents into the ranking of experts. Effective settings are identified and insights about the behaviour of each voting technique are derived. Moreover, the practical aspects of deploying an expert search engine such as its efficiency and how it should be trained are also discussed. This thesis includes an investigation of the relationship between the quality of the underlying ranking of documents and the resulting effectiveness of the voting techniques. 
The thesis shows that various effective document retrieval approaches have a positive impact on the performance of the voting techniques. Interestingly, it also shows that a 'perfect' ranking of documents does not necessarily translate into an equally perfect ranking of candidates. Insights are provided into the reasons for this, which relate to the complexity of evaluating tasks based on ranking aggregates of documents. Furthermore, it is shown how query expansion can be adapted and integrated into the expert search process, such that the query expansion successfully acts on a pseudo-relevant set containing only a list of names of persons. Five ways of performing query expansion in the expert search task are proposed, which vary in the extent to which they tackle expert search-specific problems, in particular the occurrence of topic drift within the expertise evidence for each candidate. Not all documentary evidence of expertise for a given person is equally useful, nor may there be sufficient expertise evidence for a relevant person within an enterprise. This thesis investigates various approaches to identifying the high-quality evidence for each person, and shows how the World Wide Web can be mined as a resource to find additional expertise evidence. This thesis also demonstrates how the proposed model can be applied to other people search tasks, such as ranking blog(ger)s in the blogosphere setting and suggesting reviewers for the papers submitted to an academic conference. The central contributions of this thesis are the introduction of the Voting Model and the definition of a number of voting techniques within the model. The thesis draws insights from an extremely large and exhaustive set of experiments, involving many experimental parameters, and using different test collections for several people search tasks.
This illustrates the effectiveness and the generality of the Voting Model at tackling various people search tasks and, indeed, the retrieval of aggregates of documents in general.
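
The voting idea described in this abstract, where each retrieved document counts as a vote for every candidate associated with it, can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function name `rank_experts`, the input layout, and the two aggregation rules (counting votes, or summing retrieval scores) are assumptions chosen for the example, and the sample documents and names are invented.

```python
from collections import defaultdict


def rank_experts(ranked_docs, doc_to_candidates, use_scores=False):
    """Aggregate document votes into a ranking of candidates.

    ranked_docs: list of (doc_id, retrieval_score) pairs for one query.
    doc_to_candidates: dict mapping doc_id -> iterable of candidate names
        (a stand-in for the candidate profiles; hypothetical layout).
    use_scores: if False, each retrieved document counts as one vote;
        if True, each vote is weighted by the document's retrieval score.
    """
    votes = defaultdict(float)
    for doc_id, score in ranked_docs:
        for candidate in doc_to_candidates.get(doc_id, ()):
            votes[candidate] += score if use_scores else 1.0
    # Highest vote total first; ties broken by name so output is stable.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy data: three retrieved documents and their associated people.
ranked_docs = [("d1", 3.0), ("d2", 2.0), ("d3", 0.5)]
doc_to_candidates = {"d1": ["alice"], "d2": ["bob"], "d3": ["bob", "carol"]}

by_count = rank_experts(ranked_docs, doc_to_candidates)
by_score = rank_experts(ranked_docs, doc_to_candidates, use_scores=True)
```

On this toy input the two rules disagree: counting votes places `bob` first (two documents), while summing scores places `alice` first (one highly scored document), which hints at why the choice of voting technique matters.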

    An Online Analytical System for Multi-Tagged Document Collections

    The New York Times Annotated Corpus and the ACM Digital Library are two prototypical examples of document collections in which each document is tagged with keywords and significant phrases. Such collections can be viewed as high-dimensional document cubes against which browsers and search systems can be applied in a manner similar to online analytical processing against data cubes. The tagging patterns in these collections are examined and a generative tagging model is developed that can mimic the tag assignments observed in those collections. When a user browses the collection by means of a Boolean query over tags, the result is a subset of documents that can be summarized by a centroid derived from their document term vectors. A partial materialization strategy is developed to provide efficient storage and access to centroids for such document subsets. A customized local term vocabulary storage approach is incorporated into the partial materialization to ensure that rich and relevant term vocabulary is available for representing centroids while maintaining a low storage footprint. By adopting this strategy, summary measures dependent on centroids (including bursty terms, or larger sets of indicative documents) can be efficiently and accurately computed for important subsets of documents. The proposed design is evaluated on the two collections along with PubMed (a held-back document collection) and several synthetic collections to validate that it outperforms alternative storage strategies. Finally, an enhanced faceted browsing system is developed to support users' exploration of large multi-tagged document collections. It provides summary measures of document result sets at each step of navigation through a set of indicative terms and diverse set of documents, as well as information scent that helps to guide users' exploration. These summaries are derived from pre-materialized views that allow for quick calculation of centroids for various result sets. 
The utility and efficiency of the system are demonstrated on the New York Times Annotated Corpus.
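
The summarization step mentioned in this abstract, in which a Boolean query over tags selects a subset of documents that is then summarized by the centroid of their term vectors, might look roughly like the sketch below. It computes the centroid directly at query time and does not attempt the partial materialization strategy the abstract describes. The function name `centroid_for_query`, the dictionary layout, and the sample documents are assumptions made for illustration only.

```python
from collections import Counter


def centroid_for_query(docs, required_tags, excluded_tags=frozenset()):
    """Return the mean term vector of the documents matching a tag query.

    docs: list of dicts with a "tags" set and a "terms" dict (term -> weight);
        a hypothetical in-memory layout, not the system's storage format.
    required_tags: every one of these tags must be present (AND).
    excluded_tags: none of these tags may be present (NOT).
    """
    selected = [
        d for d in docs
        if set(required_tags) <= d["tags"] and not (set(excluded_tags) & d["tags"])
    ]
    if not selected:
        return {}
    totals = Counter()
    for d in selected:
        totals.update(d["terms"])  # adds the weights term by term
    n = len(selected)
    return {term: weight / n for term, weight in totals.items()}


# Invented toy collection with three tagged documents.
docs = [
    {"tags": {"politics", "europe"}, "terms": {"election": 2, "vote": 1}},
    {"tags": {"politics"}, "terms": {"election": 1, "senate": 3}},
    {"tags": {"sports"}, "terms": {"goal": 4}},
]

centroid = centroid_for_query(docs, {"politics"})
```

Terms absent from a document contribute zero to the mean, so a term that appears in only one of two matching documents ends up with half its weight in the centroid.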