
    Automatically tagging email by leveraging other users' folders

    Most email applications devote a significant part of their real estate to organization mechanisms such as folders. Yet we verified on the Yahoo! Mail service that 70% of email users have never defined a single folder. This implies that one of the best-known email features is underexploited. We propose to revive the feature by providing a method for generating a lighter form of folders, or tags, that benefits even the most passive users. The method automatically associates, whenever possible, an appropriate semantic tag with a given email. This gives rise to an alternative mechanism for organizing and searching email. We advocate a novel modeling approach that exploits the overall population of users, thereby learning from the wisdom of crowds how to categorize messages. Given our massive user base, it is enough to learn from the minority of users who label certain messages in order to label that kind of message for the general population. We design a novel cascade classification approach, which copes with the severe scalability and accuracy constraints we face. Significant efficiency gains are achieved by working within a low-dimensional latent space and by using a novel hierarchical classifier. The precision level is controlled by separating the task into a two-phase classification process. We performed an extensive empirical study covering three different time periods, over 100 million messages, and thousands of candidate tags per message. The results are encouraging and compare favorably with alternative approaches. Our method successfully tags 72% of incoming email traffic. Performance-wise, the computational overhead, even under large traffic surges, is sufficiently low for our approach to be applicable in production on any large Web mail service.

    The best of both worlds: highlighting the synergies of combining manual and automatic knowledge organization methods to improve information search and discovery.

    Research suggests organizations across all sectors waste a significant amount of time looking for information and often fail to leverage the information they have. In response, many organizations have deployed some form of enterprise search to improve the 'findability' of information. Debates persist as to whether thesauri and manual indexing or automated machine learning techniques should be used to enhance discovery of information. In addition, the extent to which a knowledge organization system (KOS) enhances discoveries or indeed blinds us to new ones remains a moot point. The oil and gas industry served as a case study, using a representative organization. Drawing on prior research, a theoretical model is presented which aims to overcome the shortcomings of each approach. This synergistic model could help to re-conceptualize the 'manual' versus 'automatic' debate in many enterprises, accommodating a broader range of information needs. This may enable enterprises to develop more effective information and knowledge management strategies and ease the tension between what are often perceived as mutually exclusive competing approaches. Certain aspects of the theoretical model may be transferable to other industries, which is an area for further research.

    Information scraps: how and why information eludes our personal information management tools

    In this paper we describe information scraps -- a class of personal information whose content is scribbled on Post-it notes, scrawled on corners of random sheets of paper, buried inside the bodies of e-mail messages sent to ourselves, or typed haphazardly into text files. Information scraps hold our great ideas, sketches, notes, reminders, driving directions, and even our poetry. We define information scraps to be the body of personal information that is held outside of its natural or [...] We have much still to learn about these loose forms of information capture. Why are they so often held outside of our traditional PIM locations and instead on Post-its or in text files? Why must we sometimes go around our traditional PIM applications to hold on to our scraps, such as by e-mailing ourselves? What role do information scraps play in the larger space of personal information management, and what do they uniquely offer that we find so appealing? If these unorganized bits truly indicate the failure of our PIM tools, how might we begin to build better tools? We have pursued these questions by undertaking a study of 27 knowledge workers. In our findings we describe information scraps from several angles: their content, their location, and the factors that lead to their use, which we identify as ease of capture, flexibility of content and organization, and availability at the time of need. We also consider the personal emotive responses around scrap management. We present a set of design considerations that we have derived from the analysis of our study results. We present our work on an application platform, jourknow, to test some of these design and usability findings.

    Photo Wallet : interface design for simple mobile photo albums

    Master's thesis. Multimedia (Technologies Profile). Universidade do Porto. Faculdade de Engenharia. 201

    Organizational transformation through knowledge management : an internship at Luxembourg-Slovenian Business Club

    Internship report presented as a partial requirement for obtaining a Master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence. The purpose of this report is to describe a five-month internship that the student completed at the Slovenian non-profit organization Luxembourg-Slovenian Business Club (LSBC). The methodologies and framework followed were largely based on knowledge acquired in the Nova IMS Information Management Master's programme. The main objective of the internship was to better understand the impact of information in a business context and how to foster a knowledge-based environment. Specifically, the aim was to determine the information flow as it stands, identify bottlenecks, and help grow a knowledge-creation culture while narrowing the gap within the organization and between the organization and its members (both individuals and organizations). The main areas affected by this internship were Knowledge Management, Information Systems and Enterprise 2.0. The report begins with an introduction to the context and goals of the internship, followed by a detailed description of the organization's background. It then reviews the literature on the Knowledge Management areas relevant to the practical work of the internship. Next, it explains the internship objectives and the path taken to achieve them, presents the results of the completed tasks, and offers a critical assessment of them. Finally, it outlines possible future work that could follow up this project and gives a pragmatic reflection on the internship. As a result of this report, improvements in information handling and some applied Knowledge Management methodologies will be integrated into the organization. It is hoped that this will also bring the organization new opportunities to develop business and establish new partnerships, while expanding LSBC's network of contacts.

    SchedMail: Sender-Assisted Message Delivery Scheduling to Reduce Time-Fragmentation

    Although early efforts aimed at dealing with large amounts of email focused on filtering out spam, there is growing interest in prioritizing non-spam emails, with the objective of reducing the information overload and time fragmentation experienced by recipients. However, most existing approaches place the burden of classifying emails exclusively on the recipients' side, either directly or through recipients' email service mechanisms. This disregards the fact that senders typically know more about the nature of the contents of outgoing messages before the messages are read by recipients. This thesis presents mechanisms, collectively called SchedMail, which can be added to popular email clients to shift part of the user effort and computational resources required for email prioritization to the senders' side. In particular, senders declare the urgency of their messages, and recipients specify policies about when different types of messages should be delivered. Recipients also judge the accuracy of sender-side urgency, which becomes the basis for learned reputations of senders; these reputations are then used to interpret urgency declarations from the recipients' perspectives. To evaluate the proposed mechanisms experimentally, a proof-of-concept prototype was implemented based on the popular open-source email client K-9 Mail. By comparing the number of email interruptions experienced by recipients with and without SchedMail, the thesis concludes that SchedMail can effectively reduce recipients' time fragmentation without placing demands on email protocols or adding significant computational overhead.
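
The interplay described in this abstract (declared urgency, recipient policy, learned sender reputation) could be sketched roughly as follows. This is a minimal illustration and not the SchedMail implementation: the names `SenderReputation`, `effective_urgency` and `delivery_decision`, the smoothed agreement ratio, the multiplicative weighting and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SenderReputation:
    """Hypothetical reputation: share of urgency declarations the recipient agreed with."""
    agreed: int = 0
    judged: int = 0

    def record(self, recipient_agrees: bool) -> None:
        # The recipient judges whether the sender's declared urgency was accurate.
        self.judged += 1
        if recipient_agrees:
            self.agreed += 1

    @property
    def score(self) -> float:
        # Laplace smoothing (an assumption): an unknown sender starts at 0.5.
        return (self.agreed + 1) / (self.judged + 2)


def effective_urgency(declared: float, reputation: SenderReputation) -> float:
    """Interpret the declared urgency (0..1) through the sender's reputation."""
    return declared * reputation.score


def delivery_decision(declared: float, reputation: SenderReputation,
                      immediate_threshold: float = 0.5) -> str:
    """Hypothetical recipient policy: interrupt now, or hold for a later batch."""
    if effective_urgency(declared, reputation) >= immediate_threshold:
        return "deliver_now"
    return "hold_for_batch"
```

Under these assumptions, a sender whose urgency declarations are repeatedly judged inaccurate sees even "fully urgent" messages held for batch delivery, while a sender with a good track record can still interrupt the recipient.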

    Learning techniques for automatic email message tagging

    Automatic organization of email messages is still a challenge in machine learning. The problem of “email overload”, a term coined in 1998 by Whittaker et al., affects a growing number of users, especially enterprise and power users who rely on email as a communication and work tool. This thesis addresses automatic email organization by proposing a solution based on supervised learning algorithms that automatically labels email messages with tags. We approach tagging by treating previously created user folders as tags and by suggesting multiple tags per message from the top-N ranked classifier output. Learning techniques are reviewed and the different fields of an email message are analyzed for their suitability for classification. Special attention is given to the textual fields (subject and body), by studying and testing different representations, different feature selection methods and several classification algorithms. The participant fields are analyzed and evaluated using classification algorithms that work with the vector-space model and with a graph-based representation. The different email fields are combined for classification using the classifier combination technique of Majority Voting. Experiments are done on a subset of the Enron Corpus and on a private data set from the Institute for Systems and Technologies of Information, Control and Communication (INSTICC). The data sets are extensively analyzed in order to understand the characteristics of the data. The evaluation of the system, using accuracy, shows great promise, with the experimental results presenting a significant improvement over related works.
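
As a rough illustration of combining per-field classifiers by Majority Voting while still producing a top-N list of tags, consider the sketch below. It is not the thesis's code: the function name `majority_vote_top_n`, the one-vote-per-listed-tag scheme and the tie-break rule are assumptions made for the example.

```python
from collections import Counter


def majority_vote_top_n(field_predictions: dict[str, list[str]], n: int = 3) -> list[str]:
    """Combine per-field ranked tag lists into one top-n suggestion list.

    Each field classifier (e.g. subject, body, participants) casts one vote
    for every distinct tag in its own ranked list. Ties are broken by the best
    (lowest) rank the tag reached in any field, then alphabetically; this
    tie-break is an assumption of the sketch.
    """
    votes = Counter()
    best_rank = {}
    for ranked_tags in field_predictions.values():
        # dict.fromkeys drops repeated tags while preserving rank order.
        for rank, tag in enumerate(dict.fromkeys(ranked_tags)):
            votes[tag] += 1
            best_rank[tag] = min(rank, best_rank.get(tag, rank))
    ordered = sorted(votes, key=lambda t: (-votes[t], best_rank[t], t))
    return ordered[:n]


# Hypothetical outputs of three field classifiers for one message.
predictions = {
    "subject": ["projects", "travel"],
    "body": ["projects", "finance"],
    "participants": ["finance", "projects"],
}
suggested = majority_vote_top_n(predictions, n=2)
```

Here `suggested` holds the two tags with the most votes across fields; with a top-N evaluation, a message would count as correctly tagged if the user's actual folder appears anywhere in that list.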