
    Relation Discovery from Web Data for Competency Management

    This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC 2006.
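    As a hedged illustration of the scoring idea the abstract describes (structure-based term weights combined with term proximity), here is a minimal Python sketch. The region names, weights, `region_of` helper and decay formula are illustrative assumptions, not the paper's actual algorithms.

```python
# A minimal sketch of the scoring idea described above: weight a term by the
# document region it appears in, and discount person-term pairs by the distance
# between their mentions. The region names, weights and decay formula are
# illustrative assumptions, not the paper's actual algorithms.

STRUCTURE_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def association_score(person_positions, term_positions, region_of, max_distance=50):
    """Score the association between a person entity and an expertise term.

    person_positions / term_positions: token offsets of recognised mentions.
    region_of: callable mapping a token offset to its document region.
    """
    score = 0.0
    for p in person_positions:
        for t in term_positions:
            distance = abs(p - t)
            if distance > max_distance:
                continue  # mentions too far apart to count as evidence
            weight = STRUCTURE_WEIGHTS.get(region_of(t), 1.0)
            score += weight / (1.0 + distance)  # nearer mentions count more
    return score
```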

    Flow-based reputation: more than just ranking

    Recent years have seen a growing interest in collaborative systems, such as electronic marketplaces and P2P file-sharing systems, in which people interact with other people. Such systems, however, are subject to security and operational risks because of their open and distributed nature. Reputation systems provide a mechanism to reduce such risks by building trust relationships among entities and identifying malicious entities. A popular reputation model is the so-called flow-based model. Most existing reputation systems based on such a model provide only a ranking, without absolute reputation values; this makes it difficult to determine whether entities are actually trustworthy or untrustworthy. In addition, those systems ignore a significant part of the available information; as a consequence, reputation values may not be accurate. In this paper, we present a flow-based reputation metric that gives absolute values instead of merely a ranking. Our metric makes use of all the available information. We study, both analytically and numerically, the properties of the proposed metric and the effect of attacks on reputation values.
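    For context, here is a minimal sketch of the generic flow-based model the paper starts from (power iteration over a normalised trust matrix, in the style of EigenTrust/PageRank); the paper's own absolute-valued metric is not reproduced here.

```python
import numpy as np

# A minimal sketch of the generic flow-based model (power iteration over a
# normalised trust matrix, in the style of EigenTrust/PageRank). This is the
# baseline the paper criticises, not its proposed absolute-valued metric.

def flow_reputation(T, alpha=0.85, tol=1e-9, max_iter=1000):
    """T[i, j] >= 0 encodes how much entity i trusts entity j."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    row_sums = T.sum(axis=1, keepdims=True)
    # Row-normalise; entities with no outgoing trust spread it uniformly.
    M = np.divide(T, row_sums, out=np.full_like(T, 1.0 / n), where=row_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1 - alpha) / n + alpha * (M.T @ r)
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r
```

    Because the returned vector is normalised to sum to one, its entries only induce a ranking; that is exactly the limitation the abstract says the proposed metric overcomes.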

    Ontology (Science)

    Increasingly, in data-intensive areas of the life sciences, experimental results are being described in algorithmically useful ways with the help of ontologies. Such ontologies are authored and maintained by scientists to support the retrieval, integration and analysis of their data. The proposition to be defended here is that ontologies of this type – the Gene Ontology (GO) being the most conspicuous example – are a _part of science_. Initial evidence for the truth of this proposition (which some will find self-evident) is the increasing recognition of the importance of empirically-based methods of evaluation to the ontology development work being undertaken in support of scientific research. Ontologies created by scientists must, of course, be associated with implementations satisfying the requirements of software engineering. But the ontologies are not themselves engineering artifacts, and to conceive them as such brings grievous consequences. Rather, ontologies such as the GO are in different respects comparable to scientific theories, to scientific databases, and to scientific journal publications. Such a view implies a new conception of what is involved in the authoring, maintenance and application of ontologies in scientific contexts, and therewith also a new approach to the evaluation of ontologies and to the training of ontologists.
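    To make concrete what "algorithmically useful" means here, the following Python sketch shows annotation propagation over is_a links, as in the GO's "true path rule"; the toy term hierarchy is invented for illustration and is not drawn from the paper.

```python
# A minimal sketch of one way such ontologies are "algorithmically useful":
# annotations to a term propagate to all of its ancestors over is_a links
# (the GO's "true path rule"), so a query for a general term also retrieves
# data annotated to its specialisations. The hierarchy below is a made-up toy.

IS_A = {  # child -> parents
    "glucose metabolic process": ["carbohydrate metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "metabolic process": [],
}

def ancestors(term, is_a=IS_A):
    """All terms reachable from `term` by repeatedly following is_a links."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# A gene annotated to a specific term is implicitly annotated to the general ones:
direct = {"glucose metabolic process"}
implicit = set().union(*(ancestors(t) | {t} for t in direct))
```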

    The evaluation of ontologies: Editorial review vs. democratic ranking

    Increasingly, the high-throughput technologies used by biomedical researchers are bringing about a situation in which large bodies of data are being described using controlled structured vocabularies, also known as ontologies, in order to support the integration and analysis of this data. Annotation of data by means of ontologies is already contributing in significant ways to the accumulation of scientific knowledge and, prospectively, to the applicability of cross-domain algorithmic reasoning in support of scientific advance. This very success, however, has led to a proliferation of ontologies of varying scope and quality. We define one strategy for achieving quality assurance of ontologies: a plan of action already adopted by a large community of collaborating ontologists, which consists in subjecting ontologies to a process of peer review analogous to that which is applied to scientific journal articles.

    Trusting (and Verifying) Online Intermediaries' Policing

    All is not well in the land of online self-regulation. However competently internet intermediaries police their sites, nagging questions will remain about their fairness and objectivity in doing so. Is Comcast blocking BitTorrent to stop infringement, to manage traffic, or to decrease access to content that competes with its own for viewers? How much digital due process does Google need to give a site it accuses of harboring malware? If Facebook censors a video of war carnage, is that a token of respect for the wounded or one more reflexive effort of a major company to ingratiate itself with the Washington establishment? Questions like these will persist, and erode the legitimacy of intermediary self-policing, as long as key operations of leading companies are shrouded in secrecy. Administrators must develop an institutional competence for continually monitoring rapidly changing business practices. A trusted advisory council charged with assisting the Federal Trade Commission (FTC) and Federal Communications Commission (FCC) could help courts and agencies adjudicate controversies concerning intermediary practices. An Internet Intermediary Regulatory Council (IIRC) would spur the development of expertise necessary to understand whether companies' controversial decisions are socially responsible or purely self-interested. Monitoring is a prerequisite for assuring a level playing field online.

    Living Knowledge

    Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project treats diversity as an asset rather than a problem. Within the project, foundational ideas emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies, and flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, which provide users with better-structured information while coping with Web-scale complexity. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social-sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and of the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in what follows.

    Social Search with Missing Data: Which Ranking Algorithm?

    Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services that perform naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who best match a user's search requirements specified in a term-based query, even in the absence of stored user profiles. We deploy and compare five statistical measures, namely our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, together with two TF/IDF-based baseline methods, to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods.
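    As a rough illustration, the Python sketch below computes three of the standard measures named above (pointwise MI, phi-squared and the Z score) from a 2x2 co-occurrence table; the formulas follow their textbook definitions, while CORDER and improved MI are the paper's own measures and are not reproduced here.

```python
import math

# A rough sketch of three of the standard measures named above, computed from a
# 2x2 co-occurrence table over n_docs documents (the "inferred profiles").
# CORDER and improved MI are the paper's own measures and are not reproduced here.

def association_measures(n_both, n_user, n_term, n_docs):
    """n_both: docs mentioning both user and term; n_user, n_term: marginals."""
    # Pointwise mutual information: log of observed over expected co-occurrence.
    expected = n_user * n_term / n_docs
    mi = math.log(n_both / expected) if n_both > 0 else float("-inf")
    # Phi-squared: chi-squared association for a 2x2 table, normalised by n_docs.
    a, b, c = n_both, n_user - n_both, n_term - n_both
    d = n_docs - n_user - n_term + n_both
    denom = n_user * n_term * (b + d) * (c + d)
    phi2 = (a * d - b * c) ** 2 / denom if denom else 0.0
    # Z score: standard deviations of observed co-occurrence above expectation.
    p = n_user / n_docs
    z = (n_both - n_term * p) / math.sqrt(n_term * p * (1 - p)) if n_term and 0 < p < 1 else 0.0
    return mi, phi2, z
```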