88 research outputs found

    USING SOCIAL ANNOTATIONS TO IMPROVE WEB SEARCH

    Get PDF
    Web-based tagging systems, which include social bookmarking systems such as Delicious, have become increasingly popular. These systems allow participants to annotate or tag web resources. This research examined the use of social annotations to improve the quality of web searches. The research involved three components. First, social annotations were used to index resources. Two annotation-based indexing methods were proposed: annotation based indexing and full text with annotation indexing. Second, social annotations were used to improve search result ranking. Six annotation based ranking methods were proposed: Popularity Count, Propagate Popularity Count, Query Weighted Popularity Count, Query Weighted Propagate Popularity Count, Match Tag Count and Normalized Match Tag Count. Third, social annotations were used to both index and rank resources. The result from the first experiment suggested that both static feature and similarity feature should be considered when using social annotations to re-rank search result. The result of the second experiment showed that using only annotation as an index of resources may not be a good idea. Since social Annotations could be viewed as a high level concept of the content, combining them to the content of resource could add some more important concepts to the resources. Last but not least, the result from the third experiment confirmed that the combination of using social annotations to rank the search result and using social annotations as resource index augmentation provided a promising rank of search results. It showed that social annotations could benefit web search

    A flexible framework for evaluating user and item fairness in recommender systems

    Full text link
    This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11257-020-09285-1One common characteristic of research works focused on fairness evaluation (in machine learning) is that they call for some form of parity (equality) either in treatment—meaning they ignore the information about users’ memberships in protected classes during training—or in impact—by enforcing proportional beneficial outcomes to users in different protected classes. In the recommender systems community, fairness has been studied with respect to both users’ and items’ memberships in protected classes defined by some sensitive attributes (e.g., gender or race for users, revenue in a multi-stakeholder setting for items). Again here, the concept has been commonly interpreted as some form of equality—i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this work, we propose a probabilistic framework based on generalized cross entropy (GCE) to measure fairness of a given recommendation model. The framework comes with a suite of advantages: first, it allows the system designer to define and measure fairness for both users and items and can be applied to any classification task; second, it can incorporate various notions of fairness as it does not rely on specific and predefined probability distributions and they can be defined at design time; finally, in its design it uses a gain factor, which can be flexibly defined to contemplate different accuracy-related metrics to measure fairness upon decision-support metrics (e.g., precision, recall) or rank-based measures (e.g., NDCG, MAP). An experimental evaluation on four real-world datasets shows the nuances captured by our proposed metric regarding fairness on different user and item attributes, where nearest-neighbor recommenders tend to obtain good results under equality constraints. We observed that when the users are clustered based on both their interaction with the system and other sensitive attributes, such as age or gender, algorithms with similar performance values get different behaviors with respect to user fairness due to the different way they process data for each user clusterThe authors thank the reviewers for their thoughtful comments and suggestions. This work was supported in part by the Ministerio de Ciencia, Innovacion y Universidades (Reference: 123496 Y. Deldjoo et al. PID2019-108965GB-I00) and in part by the Center for Intelligent Information Retrieval. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor

    Joint Models for Answer Verification in Question Answering Systems

    Full text link
    This paper studies joint models for selecting correct answer sentences among the top kk provided by answer sentence selection (AS2) modules, which are core components of retrieval-based Question Answering (QA) systems. Our work shows that a critical step to effectively exploit an answer set regards modeling the interrelated information between pair of answers. For this purpose, we build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one. More specifically, our neural architecture integrates a state-of-the-art AS2 model with the multi-classifier, and a joint layer connecting all components. We tested our models on WikiQA, TREC-QA, and a real-world dataset. The results show that our models obtain the new state of the art in AS2

    Leveraging Semantic Annotations for Event-focused Search & Summarization

    Get PDF
    Today in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy has made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems: ‱ We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt. ‱ We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event. ‱ To estimate temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models. Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.Im heutigen Big Data Zeitalters existieren ĂŒberwĂ€ltigende Mengen an Textinformationen, die ĂŒber mehrere Quellen verteilt sind und ein hohes Maß an Redundanz haben. Durch diese Gegebenheiten ist eine Retroperspektive auf vergangene Ereignisse fĂŒr Konsumenten nur schwer möglich. Eine plausible Lösung ist die VerknĂŒpfung semantisch Ă€hnlicher, aber ĂŒber mehrere Quellen verteilter Informationen, um dadurch eine Struktur zu erzwingen, die mehrere Zugriffspfade auf relevante Informationen, bietet. Vor diesem Hintergrund benutzt diese Dissertation Wikipedia und Onlinenachrichten als zwei prominente, aber dennoch grundverschiedene Informationsquellen, um die folgenden drei Probleme anzusprechen: ‱ Wir adressieren ein VerknĂŒpfungsproblem, um Wikipedia-AuszĂŒge mit Nachrichtenartikeln zu verbinden und das Problem in eine Information-Retrieval-Aufgabe umzuwandeln. Unser neuartiger Ansatz integriert Zeit- und GeobezĂŒge sowie EntitĂ€ten mit Text, um relevante Dokumente, die mit einem gegebenen Auszug verknĂŒpft werden können, zu identifizieren. ‱ Wir befassen uns mit einer unĂŒberwachten Extraktionsmethode zur automatischen Zusammenfassung von Texten aus mehreren Dokumenten um Ereigniszusammenfassungen mit fester LĂ€nge zu generieren, was eine effiziente Aufnahme von Informationen aus großen Dokumentenmassen ermöglicht. Unser neuartiger Ansatz schlĂ€gt eine ganzzahlige lineare Optimierungslösung vor, die globale Inferenzen ĂŒber Text, Zeit, Geolokationen und mit Ereignis-verbundenen EntitĂ€ten zieht. ‱ Um den zeitlichen Fokus kurzer Ereignisbeschreibungen abzuschĂ€tzen, stellen wir einen semi-ĂŒberwachten Ansatz vor, der die Redundanz innerhalb einer langzeitigen Dokumentensammlung ausnutzt, um genaue probabilistische Zeitmodelle abzuschĂ€tzen. Umfangreiche experimentelle Auswertungen zeigen die Wirksamkeit und TragfĂ€higkeit unserer vorgeschlagenen AnsĂ€tze zur Erreichung des grĂ¶ĂŸeren Ziels

    Explainable AI and Interpretable Computer Vision:From Oversight to Insight

    Get PDF
    The increasing availability of big data and computational power has facilitated unprecedented progress in Artificial Intelligence (AI) and Machine Learning (ML). However, complex model architectures have resulted in high-performing yet uninterpretable ‘black boxes’. This prevents users from verifying that the reasoning process aligns with expectations and intentions. This thesis posits that the sole focus on predictive performance is an unsustainable trajectory, since a model can make right predictions for the wrong reasons. The research field of Explainable AI (XAI) addresses the black-box nature of AI by generating explanations that present (aspects of) a model's behaviour in human-understandable terms. This thesis supports the transition from oversight to insight, and shows that explainability can give users more insight into every part of the machine learning pipeline: from the training data to the prediction model and the resulting explanations. When relying on explanations for judging a model's reasoning process, it is important that the explanations are truthful, relevant and understandable. Part I of this thesis reflects upon explanation quality and identifies 12 desirable properties, including compactness, completeness and correctness. Additionally, it provides an extensive collection of quantitative XAI evaluation methods, and analyses their availabilities in open-source toolkits. As alternative to common post-model explainability that reverse-engineers an already trained prediction model, Part II of this thesis presents in-model explainability for interpretable computer vision. These image classifiers learn prototypical parts, which are used in an interpretable decision tree or scoring sheet. The models are explainable by design since their reasoning depends on the extent to which an image patch “looks like” a learned part-prototype. Part III of this thesis shows that ML can also explain characteristics of a dataset. Because of a model's ability to analyse large amounts of data in little time, extracting hidden patterns can contribute to the validation and potential discovery of domain knowledge, and allows to detect sources of bias and shortcuts early on. Concluding, neither the prediction model nor the data nor the explanation method should be handled as a black box. The way forward? AI with a human touch: developing powerful models that learn interpretable features, and using these meaningful features in a decision process that users can understand, validate and adapt. This in-model explainability, such as the part-prototype models from Part II, opens up the opportunity to ‘re-educate’ models with our desired norms, values and reasoning. Enabling human decision-makers to detect and correct undesired model behaviour will contribute towards an effective but also reliable and responsible usage of AI

    Seventh Biennial Report : June 2003 - March 2005

    No full text

    No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

    Full text link
    Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works

    Utilizing external resources for enriching information retrieval

    Get PDF
    Information retrieval (IR) seeks to support users in finding information relevant to their information needs. One obstacle for many IR algorithms to achieve better results in many IR tasks is that there is insufficient information available to enable relevant content to be identified. For example, users typically enter very short queries, in text-based image retrieval where textual annotations often describe the content of the images inadequately, or there is insufficient user log data for personalization of the search process. This thesis explores the problem of inadequate data in IR tasks. We propose methods for Enriching Information Retrieval (ENIR) which address various challenges relating to insufficient data in IR. Applying standard methods to address these problems can face unexpected challenges. For example, standard query expansion methods assume that the target collection contains sufficient data to be able to identify relevant terms to add to the original query to improve retrieval effectiveness. In the case of short documents, this assumption is not valid. One strategy to address this problem is document side expansion which has been largely overlooked in the past research. Similarly, topic modeling in personalized search often lacks the knowledge required to form adequate models leading to mismatch problems when trying to apply these models improve search. This thesis focuses on methods of ENIR for tasks affected by problems of insufficient data. To achieve ENIR, our overall solution is to include external resources for ENIR. This research focuses on developing methods for two typical ENIR tasks: text-based image retrieval and personalized web data search. In this research, the main relevant areas within existing IR research are relevance feedback and personalized modeling. ENIR is shown to be effective to augment existing knowledge in these classical areas. The areas of relevance feedback and personalized modeling are strongly correlated since user modeling and document modeling in personalized retrieval enrich the data from both sides of the query and document, which is similar to query and document expansion in relevance feedback. Enriching IR is the key challenge in these areas for IR. By addressing these two research areas, this thesis provides a prototype for an external resource based search solution. The experimental results show external resources can play a key role in enriching IR
    • 

    corecore