53 research outputs found

    Effective searching of RDF knowledge bases

    RDF data has become a vital source of information for many applications. In this thesis, we present a set of models and algorithms to effectively search large RDF knowledge bases. These knowledge bases contain a large set of subject-predicate-object (SPO) triples, where subjects and objects are entities and predicates express relationships between them. Searching such knowledge bases can be done using the W3C-endorsed SPARQL language or by similarly designed triple-pattern search. However, the exact-match semantics of triple-pattern search might fall short of satisfying the user's needs by returning too many or too few results. Thus, IR-style searching and ranking techniques are crucial. This thesis develops models and algorithms to enhance triple-pattern search. We propose a keyword extension to triple-pattern search that allows users to augment triple-pattern queries with keyword conditions. To improve the recall of triple-pattern search, we present a framework that automatically reformulates triple-pattern queries in such a way that the intention of the original user query is preserved while a sufficient number of ranked results is returned. For efficient query processing, we present a set of top-k query processing algorithms, and for ease of use, we develop methods for plain keyword search over RDF knowledge bases. Finally, we propose a set of techniques to diversify query results, and we present several methods that allow users to interactively explore RDF knowledge bases to find additional contextual information about their query results.
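    To make the triple-pattern setting concrete, the sketch below runs a plain triple pattern and a keyword-augmented variant over a toy RDF graph using rdflib. The toy graph, the ex: namespace, and the use of a plain SPARQL CONTAINS filter are illustrative assumptions; the thesis defines its own keyword extension with ranking, not a simple filter.

    ```python
    # Minimal sketch of triple-pattern vs. keyword-augmented search (assumed toy data).
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.GodfatherII, EX.directedBy, EX.Coppola))
    g.add((EX.GodfatherII, EX.abstract, Literal("A mafia saga spanning decades.")))
    g.add((EX.LostInTranslation, EX.directedBy, EX.SofiaCoppola))
    g.add((EX.LostInTranslation, EX.abstract, Literal("Two strangers meet in Tokyo.")))

    # Plain triple pattern: exact-match semantics only.
    q_exact = """
    PREFIX ex: <http://example.org/>
    SELECT ?film WHERE { ?film ex:directedBy ex:Coppola . }
    """

    # The same pattern augmented with a keyword condition on associated text,
    # approximating the keyword extension with a standard SPARQL filter.
    q_keyword = """
    PREFIX ex: <http://example.org/>
    SELECT ?film WHERE {
      ?film ex:directedBy ?director .
      ?film ex:abstract ?text .
      FILTER(CONTAINS(LCASE(STR(?text)), "mafia"))
    }
    """

    for row in g.query(q_keyword):
        print(row.film)  # -> http://example.org/GodfatherII
    ```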

    Exploring Fairness of Ranking in Online Job Marketplaces

    We study fairness of ranking in online job marketplaces. We focus on group fairness and aim to algorithmically explore how a scoring function, through which individuals are ranked for jobs, treats different demographic groups. Previous work on group-level fairness has focused on the case where groups are pre-defined or where they are defined using a single protected attribute (e.g., Caucasian vs. Asian). In this paper, we argue for the need to examine fairness for groups of people defined by any combination of protected attributes. To do this, we formulate an optimization problem to find the partitioning of individuals on their protected attributes that exhibits the highest unfairness with respect to the scoring function. The scoring function yields one histogram of score distributions per partition, and we rely on the Earth Mover's Distance, a measure commonly used to compare histograms, to quantify unfairness. Since the number of ways to partition individuals is exponential in the number of their protected attribute values, we propose two heuristic algorithms to navigate the space of all possible partitionings and identify the one with the highest unfairness. We evaluate our algorithms using a simulation of a crowdsourcing platform and show that they can effectively quantify the unfairness of various scoring functions
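    As a rough sketch of the unfairness measure, the snippet below computes pairwise Earth Mover's Distances between per-group score distributions with SciPy. Aggregating the pairwise distances by taking the maximum is an assumption for illustration; the abstract only states that EMD is used.

    ```python
    from itertools import combinations

    import numpy as np
    from scipy.stats import wasserstein_distance

    def partition_unfairness(scores, groups):
        """Unfairness of a scoring function for a given partitioning.

        scores : per-individual scores produced by the scoring function
        groups : parallel labels, one partition per combination of
                 protected attributes (e.g. "F-Asian", "M-Caucasian")

        Returns the largest pairwise Earth Mover's Distance between the
        per-partition score distributions (max aggregation is assumed).
        """
        scores = np.asarray(scores, dtype=float)
        groups = np.asarray(groups)
        per_group = {g: scores[groups == g] for g in np.unique(groups)}
        return max(
            wasserstein_distance(per_group[a], per_group[b])
            for a, b in combinations(per_group, 2)
        )

    # Toy example: two groups defined by combinations of protected attributes.
    scores = [0.90, 0.85, 0.80, 0.40, 0.35, 0.30]
    groups = ["F-Asian"] * 3 + ["M-Caucasian"] * 3
    print(partition_unfairness(scores, groups))  # 0.5: a large gap, high unfairness
    ```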

    Methodical evaluation of Arabic word embeddings

    Many unsupervised learning techniques have been proposed to obtain meaningful representations of words from text. In this study, we evaluate these various techniques when used to generate Arabic word embeddings. We first build a benchmark for the Arabic language that can be utilized to perform intrinsic evaluation of different word embeddings. We then perform additional extrinsic evaluations of the embeddings based on two NLP tasks. © 2017 Association for Computational Linguistics. This work was made possible by NPRP grant 6-716-1-138 from the Qatar National Research Fund (a member of Qatar Foundation)
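    A typical intrinsic evaluation of the kind such a benchmark enables is sketched below: the Spearman correlation between human similarity ratings and the cosine similarities of the corresponding embedding pairs. The function names, benchmark format, and toy data are illustrative assumptions, not the paper's actual setup.

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def word_similarity_score(embeddings, benchmark):
        """Spearman correlation between human ratings and embedding similarities.

        embeddings : dict mapping word -> np.ndarray vector
        benchmark  : iterable of (word1, word2, human_rating) triples;
                     pairs with out-of-vocabulary words are skipped
        """
        human, model = [], []
        for w1, w2, rating in benchmark:
            if w1 in embeddings and w2 in embeddings:
                human.append(rating)
                model.append(cosine(embeddings[w1], embeddings[w2]))
        rho, _ = spearmanr(human, model)
        return rho

    # Toy usage with random vectors; a real benchmark would hold Arabic word pairs.
    rng = np.random.default_rng(0)
    emb = {w: rng.normal(size=50) for w in ["qamar", "hilal", "kitab"]}
    pairs = [("qamar", "hilal", 9.0), ("qamar", "kitab", 1.5), ("hilal", "kitab", 1.0)]
    print(word_similarity_score(emb, pairs))
    ```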

    Capturing children food exposure using wearable cameras and deep learning

    Children’s dietary habits are influenced by complex factors within their home, school and neighborhood environments. Identifying such influencers and assessing their effects is traditionally based on self-reported data, which can be prone to recall bias. We developed a culturally acceptable machine-learning-based data-collection system to objectively capture school-children’s exposure to food (including food items, food advertisements, and food outlets) in two urban Arab centers: Greater Beirut, in Lebanon, and Greater Tunis, in Tunisia. Our machine-learning-based system consists of 1) a wearable camera that captures continuous footage of children’s environment during a typical school day, 2) a machine learning model that automatically identifies images related to food from the collected data and discards any other footage, 3) a second machine learning model that classifies food-related images into images that contain actual food items, images that contain food advertisements, and images that contain food outlets, and 4) a third machine learning model that classifies images containing food items into two classes, according to whether the food items are being consumed by the child wearing the camera or by others. This manuscript reports on a user-centered design study to assess the acceptability of using wearable cameras to capture food exposure among school children in Greater Beirut and Greater Tunis. We then describe how we trained our first machine learning model to detect food exposure images using data collected from the Web and utilizing the latest trends in deep learning for computer vision. Next, we describe how we trained our other machine learning models to classify food-related images into their respective categories using a combination of public data and data acquired via crowdsourcing. Finally, we describe how the different components of our system were packaged together and deployed in a real-world case study, and we report on its performance
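    The cascade of the three models might be wired together as in the sketch below. The callables stand in for the trained deep models; their names and signatures are assumptions for illustration, not the paper's actual code.

    ```python
    from dataclasses import dataclass
    from typing import Callable, List, Optional, Tuple

    @dataclass
    class Exposure:
        image_id: str
        category: str                       # "food_item" | "advertisement" | "outlet"
        consumed_by_wearer: Optional[bool]  # set only for food items

    def run_pipeline(
        images: List[Tuple[str, object]],
        is_food_related: Callable,   # stage 1: keep food footage, discard the rest
        categorize: Callable,        # stage 2: item vs. advertisement vs. outlet
        wearer_consumes: Callable,   # stage 3: wearer vs. others (food items only)
    ) -> List[Exposure]:
        exposures = []
        for image_id, img in images:
            if not is_food_related(img):          # stage 1: filter non-food footage
                continue
            category = categorize(img)            # stage 2: food-related category
            consumed = wearer_consumes(img) if category == "food_item" else None
            exposures.append(Exposure(image_id, category, consumed))
        return exposures
    ```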

    DeepNOVA : a deep learning NOVA classifier for food images

    Assessing the healthiness of food items in images has gained attention in both the computer vision and nutrition fields. However, this task is generally difficult, as food images are captured in various settings and are thus usually non-homogeneous. Moreover, assessing how healthy a food item is requires nutritional expertise and knowledge of the constituents of the food item and of how it is processed. In this manuscript, we propose an end-to-end deep learning approach that detects and localizes various food items in a given food image using a customized object detection model. Our approach then assesses how healthy each detected food item is by classifying it into one or more of the four NOVA groups (Unprocessed Food, Processed Culinary Ingredients, Processed Food, and Ultra-processed Food). To train our food item detection model, we used two public datasets and a custom one we created ourselves, which contains images of food taken with wearable cameras. To train the NOVA food classifier, we used the same custom dataset, manually labeled by expert nutritionists. Our food item detection model achieved a mAP of 0.90, and the NOVA food classifier achieved an average F1-score of 0.86 on test data
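    Because a food item may belong to more than one NOVA group, the classification head is naturally multi-label: independent sigmoids per group rather than a softmax. A minimal PyTorch sketch follows; the ResNet-18 backbone and the 0.5 threshold are assumptions, not the paper's customized models.

    ```python
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    NOVA_GROUPS = ["Unprocessed Food", "Processed Culinary Ingredients",
                   "Processed Food", "Ultra-processed Food"]

    class NovaClassifier(nn.Module):
        """Multi-label head over the four NOVA groups (assumed backbone)."""

        def __init__(self):
            super().__init__()
            self.backbone = resnet18(weights=None)
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features,
                                         len(NOVA_GROUPS))

        def forward(self, x):
            # Independent per-group probabilities: an item can get several labels.
            return torch.sigmoid(self.backbone(x))

    model = NovaClassifier().eval()
    crop = torch.randn(1, 3, 224, 224)  # a detected food-item crop from stage 1
    with torch.no_grad():
        probs = model(crop)[0]
    labels = [g for g, p in zip(NOVA_GROUPS, probs) if p > 0.5]
    ```

    Training such a head would typically minimize a per-group binary cross-entropy loss (e.g., nn.BCELoss on these sigmoid outputs) against the nutritionist-provided labels.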

    Prototyping Across the Disciplines

    This article pursues the idea that within interdisciplinary teams in which researchers might find themselves participating, there are very different notions of research outcomes, as well as languages in which they are expressed. We explore the notion of the software prototype within the discussion of making and building in digital humanities. The backdrop for our discussion is a collaboration between project team members from computer science and literature that resulted in a tool named TopoText that was built to geocode locations within an unstructured text and to perform some basic Natural Language Processing (NLP) tasks about the context of those locations. In the interest of collaborating more effectively with increasingly larger and more multidisciplinary research communities, we move outward from that specific collaboration to explore one of the ways that such research is characterized in the domain of software engineering—the ISO/IEC 25010:2011 standard. Although not a perfect fit with discourses of value in the humanities, it provides a possible starting point for forging shared vocabularies within the research collaboratory. In particular, we focus on a subset of characteristics outlined by the standard and attempt to translate them into terms generative of further discussion in the digital humanities community

    The impact of digital technology on health of populations affected by humanitarian crises: Recent innovations and current gaps

    Digital technology is increasingly used in humanitarian action and promises to improve the health and social well-being of populations affected by both acute and protracted crises. We set out to (1) review the current landscape of digital technologies used by humanitarian actors and affected populations, (2) examine their impact on health and well-being of affected populations, and (3) consider the opportunities for and challenges faced by users of these technologies. Through a systematic search of academic databases and reports, we identified 50 digital technologies used by humanitarian actors, and/or populations affected by crises. We organized them according to the stage of the humanitarian cycle that they were used in, and the health outcomes or determinants of health they affected. Digital technologies were found to facilitate communication, coordination, and collection and analysis of data, enabling timely responses in humanitarian contexts. A lack of evaluation of these technologies, a paternalistic approach to their development, and issues of privacy and equity constituted major challenges. We highlight the need to create a space for dialogue between technology designers and populations affected by humanitarian crises

    Maternal and infant outcomes of Syrian and Palestinian refugees, Lebanese and migrant women giving birth in a tertiary public hospital in Lebanon : a secondary analysis of an obstetric database

    Acknowledgments: We would like to thank Rafik Hariri University Hospital for providing the data for this study. Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors

    A Learning Based Framework for Improving Querying on Web Interfaces of Curated Knowledge Bases

    Knowledge Bases (KBs) are widely used as one of the fundamental components of Semantic Web applications, as they provide facts and relationships that can be automatically understood by machines. Curated knowledge bases usually use the Resource Description Framework (RDF) as their data representation model. To query the RDF-represented knowledge in curated KBs, Web interfaces are built via SPARQL Endpoints. Currently, querying SPARQL Endpoints suffers from problems such as network instability and latency, which affect query efficiency. To address these issues, we propose a client-side caching framework, the SPARQL Endpoint Caching Framework (SECF), aimed at accelerating overall querying speed over SPARQL Endpoints. SECF identifies potential future queries by leveraging the querying patterns learned from clients’ historical queries, and prefetches and caches their results. In particular, we develop a distance function based on graph edit distance to measure the similarity of SPARQL queries. We propose a feature modelling method to transform SPARQL queries into vector representations that are fed into machine-learning algorithms. A time-aware smoothing-based method, Modified Simple Exponential Smoothing (MSES), is developed for cache replacement. Extensive experiments performed on real-world queries showcase the effectiveness of our approach, which outperforms the state of the art in terms of overall querying speed
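    A minimal sketch of smoothing-based cache replacement in the spirit of MSES is shown below. The abstract does not spell out the modified update rule, so plain simple exponential smoothing of per-entry hit signals is used here, with the lowest-scored entry evicted first.

    ```python
    from collections import defaultdict

    class SmoothedCache:
        """Query-result cache evicting the entry with the lowest exponentially
        smoothed hit score. Plain simple exponential smoothing is assumed; the
        exact MSES modification is not given in the abstract.
        """

        def __init__(self, capacity: int, alpha: float = 0.3):
            self.capacity, self.alpha = capacity, alpha
            self.store = {}                  # query string -> cached results
            self.score = defaultdict(float)  # query string -> smoothed hit score

        def _step(self, hit_key=None):
            # S_t = alpha * x_t + (1 - alpha) * S_{t-1}, where x_t is 1 for the
            # entry hit at this step and 0 for every other cached entry.
            for key in self.store:
                x = 1.0 if key == hit_key else 0.0
                self.score[key] = self.alpha * x + (1 - self.alpha) * self.score[key]

        def get(self, query):
            self._step(hit_key=query if query in self.store else None)
            return self.store.get(query)

        def put(self, query, results):
            if query not in self.store and len(self.store) >= self.capacity:
                victim = min(self.store, key=self.score.__getitem__)
                del self.store[victim]
            self.store[query] = results
    ```

    A prefetcher would call put with the results of queries predicted from the learned query patterns, so predictions that keep getting hit retain high scores while stale entries decay toward zero and are evicted first.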
    • 
