347 research outputs found

    Understanding egocentric human actions with temporal decision forests

    Get PDF
    Understanding human actions is a fundamental task in computer vision with a wide range of applications including pervasive health-care, robotics and game control. This thesis focuses on the problem of egocentric action recognition from RGB-D data, wherein the world is viewed through the eyes of the actor whose hands describe the actions. The main contributions of this work are its findings regarding egocentric actions as described by hands in two application scenarios and a proposal of a new technique that is based on temporal decision forests. The thesis first introduces a novel framework to recognise fingertip writing in mid-air in the context of human-computer interaction. This framework detects whether the user is writing and tracks the fingertip over time to generate spatio-temporal trajectories that are recognised by using a Hough forest variant that encourages temporal consistency in prediction. A problem with using such forest approach for action recognition is that the learning of temporal dynamics is limited to hand-crafted temporal features and temporal regression, which may break the temporal continuity and lead to inconsistent predictions. To overcome this limitation, the thesis proposes transition forests. Besides any temporal information that is encoded in the feature space, the forest automatically learns the temporal dynamics during training, and it is exploited in inference in an online and efficient manner achieving state-of-the-art results. The last contribution of this thesis is its introduction of the first RGB-D benchmark to allow for the study of egocentric hand-object actions with both hand and object pose annotations. This study conducts an extensive evaluation of different baselines, state-of-the art approaches and temporal decision forest models using colour, depth and hand pose features. Furthermore, it extends the transition forest model to incorporate data from different modalities and demonstrates the benefit of using hand pose features to recognise egocentric human actions. The thesis concludes by discussing and analysing the contributions and proposing a few ideas for future work.Open Acces

    Context-aware ranking : from search to dialogue

    Full text link
    Les systèmes de recherche d'information (RI) ou moteurs de recherche ont été largement utilisés pour trouver rapidement les informations pour les utilisateurs. Le classement est la fonction centrale de la RI, qui vise à ordonner les documents candidats dans une liste classée en fonction de leur pertinence par rapport à une requête de l'utilisateur. Alors que IR n'a considéré qu'une seule requête au début, les systèmes plus récents prennent en compte les informations de contexte. Par exemple, dans une session de recherche, le contexte de recherche tel que le requêtes et interactions précédentes avec l'utilisateur, est largement utilisé pour comprendre l'intention de la recherche de l'utilisateur et pour aider au classement des documents. En plus de la recherche ad-hoc traditionnelle, la RI a été étendue aux systèmes de dialogue (c'est-à-dire, le dialogue basé sur la recherche, par exemple, XiaoIce), où on suppose avoir un grand référentiel de dialogues et le but est de trouver la réponse pertinente à l'énoncé courant d'un utilisateur. Encore une fois, le contexte du dialogue est un élément clé pour déterminer la pertinence d'une réponse. L'utilisation des informations contextuelles a fait l'objet de nombreuses études, allant de l'extraction de mots-clés importants du contexte pour étendre la requête ou l'énoncé courant de dialogue, à la construction d'une représentation neuronale du contexte qui sera utilisée avec la requête ou l'énoncé de dialogue pour la recherche. Nous remarquons deux d'importantes insuffisances dans la littérature existante. (1) Pour apprendre à utiliser les informations contextuelles, on doit extraire des échantillons positifs et négatifs pour l'entraînement. On a généralement supposé qu'un échantillon positif est formé lorsqu'un utilisateur interagit avec (clique sur) un document dans un contexte, et un un échantillon négatif est formé lorsqu'aucune interaction n'est observée. En réalité, les interactions des utilisateurs sont éparses et bruitées, ce qui rend l'hypothèse ci-dessus irréaliste. Il est donc important de construire des exemples d'entraînement d'une manière plus appropriée. (2) Dans les systèmes de dialogue, en particulier les systèmes de bavardage (chitchat), on cherche à trouver ou générer les réponses sans faire référence à des connaissances externes, ce qui peut facilement provoquer des réponses non pertinentes ou des hallucinations. Une solution consiste à fonder le dialogue sur des documents ou graphe de connaissances externes, où les documents ou les graphes de connaissances peuvent être considérés comme de nouveaux types de contexte. Le dialogue fondé sur les documents et les connaissances a été largement étudié, mais les approches restent simplistes dans la mesure où le contenu du document ou les connaissances sont généralement concaténés à l'énoncé courant. En réalité, seules certaines parties du document ou du graphe de connaissances sont pertinentes, ce qui justifie un modèle spécifique pour leur sélection. Dans cette thèse, nous étudions le problème du classement de textes en tenant compte du contexte dans le cadre de RI ad-hoc et de dialogue basé sur la recherche. Nous nous concentrons sur les deux problèmes mentionnés ci-dessus. Spécifiquement, nous proposons des approches pour apprendre un modèle de classement pour la RI ad-hoc basée sur des exemples d'entraîenemt sélectionnés à partir d'interactions utilisateur bruitées (c'est-à-dire des logs de requêtes) et des approches à exploiter des connaissances externes pour la recherche de réponse pour le dialogue. La thèse est basée sur cinq articles publiés. Les deux premiers articles portent sur le classement contextuel des documents. Ils traitent le problème ovservé dans les études existantes, qui considèrent tous les clics dans les logs de recherche comme des échantillons positifs, et prélever des documents non cliqués comme échantillons négatifs. Dans ces deux articles, nous proposons d'abord une stratégie d'augmentation de données non supervisée pour simuler les variations potentielles du comportement de l'utilisateur pour tenir compte de la sparcité des comportements des utilisateurs. Ensuite, nous appliquons l'apprentissage contrastif pour identifier ces variations et à générer une représentation plus robuste du comportement de l'utilisateur. D'un autre côté, comprendre l'intention de recherche dans une session de recherche peut représentent différents niveaux de difficulté - certaines intentions sont faciles à comprendre tandis que d'autres sont plus difficiles et nuancées. Mélanger directement ces sessions dans le même batch d'entraînement perturbera l'optimisation du modèle. Par conséquent, nous proposons un cadre d'apprentissage par curriculum avec des examples allant de plus faciles à plus difficiles. Les deux méthodes proposées obtiennent de meilleurs résultats que les méthodes existantes sur deux jeux de données de logs de requêtes réels. Les trois derniers articles se concentrent sur les systèmes de dialogue fondé les documents/connaissances. Nous proposons d'abord un mécanisme de sélection de contenu pour le dialogue fondé sur des documents. Les expérimentations confirment que la sélection de contenu de document pertinent en fonction du contexte du dialogue peut réduire le bruit dans le document et ainsi améliorer la qualité du dialogue. Deuxièmement, nous explorons une nouvelle tâche de dialogue qui vise à générer des dialogues selon une description narrative. Nous avons collecté un nouveau jeu de données dans le domaine du cinéma pour nos expérimentations. Les connaissances sont définies par une narration qui décrit une partie du scénario du film (similaire aux dialogues). Le but est de créer des dialogues correspondant à la narration. À cette fin, nous concevons un nouveau modèle qui tient l'état de la couverture de la narration le long des dialogues et déterminer la partie non couverte pour le prochain tour. Troisièmement, nous explorons un modèle de dialogue proactif qui peut diriger de manière proactive le dialogue dans une direction pour couvrir les sujets requis. Nous concevons un module de prédiction explicite des connaissances pour sélectionner les connaissances pertinentes à utiliser. Pour entraîner le processus de sélection, nous générons des signaux de supervision par une méthode heuristique. Les trois articles examinent comment divers types de connaissances peuvent être intégrés dans le dialogue. Le contexte est un élément important dans la RI ad-hoc et le dialogue, mais nous soutenons que le contexte doit être compris au sens large. Dans cette thèse, nous incluons à la fois les interactions précédentes avec l'utilisateur, le document et les connaissances dans le contexte. Cette série d'études est un pas dans la direction de l'intégration d'informations contextuelles diverses dans la RI et le dialogue.Information retrieval (IR) or search systems have been widely used to quickly find desired information for users. Ranking is the central function of IR, which aims at ordering the candidate documents in a ranked list according to their relevance to a user query. While IR only considered a single query in the early stages, more recent systems take into account the context information. For example, in a search session, the search context, such as the previous queries and interactions with the user, is widely used to understand the user's search intent and to help document ranking. In addition to the traditional ad-hoc search, IR has been extended to dialogue systems (i.e., retrieval-based dialogue, e.g., XiaoIce), where one assumes a large repository of previous dialogues and the goal is to retrieve the most relevant response to a user's current utterance. Again, the dialogue context is a key element for determining the relevance of a response. The utilization of context information has been investigated in many studies, which range from extracting important keywords from the context to expand the query or current utterance, to building a neural context representation used with the query or current utterance for search. We notice two important insufficiencies in the existing literature. (1) To learn to use context information, one has to extract positive and negative samples for training. It has been generally assumed that a positive sample is formed when a user interacts with a document in a context, and a negative sample is formed when no interaction is observed. In reality, user interactions are scarce and noisy, making the above assumption unrealistic. It is thus important to build more appropriate training examples. (2) In dialogue systems, especially chitchat systems, responses are typically retrieved or generated without referring to external knowledge. This may easily lead to hallucinations. A solution is to ground dialogue on external documents or knowledge graphs, where the grounding document or knowledge can be seen as new types of context. Document- and knowledge-grounded dialogue have been extensively studied, but the approaches remain simplistic in that the document content or knowledge is typically concatenated to the current utterance. In reality, only parts of the grounding document or knowledge are relevant, which warrant a specific model for their selection. In this thesis, we study the problem of context-aware ranking for ad-hoc document ranking and retrieval-based dialogue. We focus on the two problems mentioned above. Specifically, we propose approaches to learning a ranking model for ad-hoc retrieval based on training examples selected from noisy user interactions (i.e., query logs), and approaches to exploit external knowledge for response retrieval in retrieval-based dialogue. The thesis is based on five published articles. The first two articles are about context-aware document ranking. They deal with the problem in the existing studies that consider all clicks in the search logs as positive samples, and sample unclicked documents as negative samples. In the first paper, we propose an unsupervised data augmentation strategy to simulate potential variations of user behavior sequences to take into account the scarcity of user behaviors. Then, we apply contrastive learning to identify these variations and generate a more robust representation for user behavior sequences. On the other hand, understanding the search intent of search sessions may represent different levels of difficulty -- some are easy to understand while others are more difficult. Directly mixing these search sessions in the same training batch will disturb the model optimization. Therefore, in the second paper, we propose a curriculum learning framework to learn the training samples in an easy-to-hard manner. Both proposed methods achieve better performance than the existing methods on two real search log datasets. The latter three articles focus on knowledge-grounded retrieval-based dialogue systems. We first propose a content selection mechanism for document-grounded dialogue and demonstrate that selecting relevant document content based on dialogue context can effectively reduce the noise in the document and increase dialogue quality. Second, we explore a new task of dialogue, which is required to generate dialogue according to a narrative description. We collect a new dataset in the movie domain to support our study. The knowledge is defined as a narrative that describes a part of a movie script (similar to dialogues). The goal is to create dialogues corresponding to the narrative. To this end, we design a new model that can track the coverage of the narrative along the dialogues and determine the uncovered part for the next turn. Third, we explore a proactive dialogue model that can proactively lead the dialogue to cover the required topics. We design an explicit knowledge prediction module to select relevant pieces of knowledge to use. To train the selection process, we generate weak-supervision signals using a heuristic method. All of the three papers investigate how various types of knowledge can be integrated into dialogue. Context is an important element in ad-hoc search and dialogue, but we argue that context should be understood in a broad sense. In this thesis, we include both previous interactions and the grounding document and knowledge as part of the context. This series of studies is one step in the direction of incorporating broad context information into search and dialogue

    Software Engineering Methods for the Internet of Things: A Comparative Review

    Get PDF
    Accessing different physical objects at any time from anywhere through wireless network heavily impacts the living style of societies worldwide nowadays. Thus, the Internet of Things has now become a hot emerging paradigm in computing environments. Issues like interoperability, software reusability, and platform independence of those physical objects are considered the main current challenges. This raises the need for appropriate software engineering approaches to develop effective and efficient IoT applications software. This paper studies the state of the art of design and development methodologies for IoT software. The aim is to study how proposed approaches have been solved issues of interoperability, reusability, and independence of the platform. A comparative study is presented for the different software engineering methods used for the Internet of Things. Finally, the key research gaps and open issues are highlighted as future directions

    Process-Oriented Information Logistics: Aligning Process Information with Business Processes

    Get PDF
    During the last decade, research in the field of business process management (BPM) has focused on the design, modeling, execution, monitoring, and optimization of business processes. What has been neglected, however, is the provision of knowledge workers and decision makers with needed information when performing knowledge-intensive business processes such as product engineering, customer support, or strategic management. Today, knowledge workers and decision makers are confronted with a massive load of data, making it difficult for them to discover the information relevant for performing their tasks. Particularly challenging in this context is the alignment of process-related information (process information for short), such as e-mails, office files, forms, checklists, guidelines, and best practices, with business processes and their tasks. In practice, process information is not only stored in large, distributed and heterogeneous sources, but usually managed separately from business processes. For example, shared drives, databases, enterprise portals, and enterprise information systems are used to store process information. In turn, business processes are managed using advanced process management technology. As a consequence, process information and business processes often need to be manually linked; i.e., process information is hard-wired to business processes, e.g., in enterprise portals associating specific process information with process tasks. This approach often fails due to high maintenance efforts and missing support for the individual demands of knowledge workers and decision makers. In response to this problem, this thesis introduces process-oriented information logistics(POIL) as new paradigm for delivering the right process information, in the right format and quality, at the right place and the right point in time, to the right people. In particular, POIL allows for the process-oriented, context-aware (i.e., personalized) delivery of process information to process participants. The goal is to no longer manually hard-wire process information to business processes, but to automatically identify and deliver relevant process information to knowledge workers and decision makers. The core component of POIL is a semantic information network (SIN), which comprises homogeneous information objects (e.g., e-mails, offce files, guidelines), process objects (e.g., tasks, events, roles), and relationships between them. In particular, a SIN allows discovering objects linked with each other in different ways, e.g., objects addressing the same topic or needed when performing a particular process task. The SIN not only enables an integrated formal representation of process information and business processes, but also allows determining the relevance of process information for a given work context based on novel techniques and algorithms. Note that this becomes crucial in order to achieve the aforementioned overall goal of this thesis

    Strategic Roadmaps and Implementation Actions for ICT in Construction

    Get PDF

    Probabilistic neural topic models for text understanding

    Get PDF
    Making sense of text is still one of the most fascinating and open challenges thanks and despite the vast amount of information continuously produced by recent technologies. Along with the growing size of textual data, automatic approaches have to deal with the wide variety of linguistic features across different domains and contexts: for example, user reviews might be characterised by colloquial idioms, slang or contractions; while clinical notes often contain technical jargon, with typical medical abbreviations and polysemous words whose meaning strictly depend on the particular context in which they were used. We propose to address these issues by combining topic modelling principles and models with distributional word representations. Topic models generate concise and expressive representations for high volumes of documents by clustering words into “topics”, which can be interpreted as document decompositions. They are focused on analysing the global context of words and their co-occurrences within the whole corpus. Distributional language representations, instead, encode the word syntactic and semantic properties by leveraging the word local contexts and can be conveniently pre-trained to facilitate the model training and the simultaneous encoding of external knowledge. Our work represents one step in bridging the gap between the recent advances in topic modelling and the increasingly richer distributional word representations, with the aim of addressing the aforementioned issues related to different linguistic features within different domains. In this thesis, we first propose a hierarchical neural model inspired by topic modelling, which leverages an attention mechanism along with a novel neural cell for fine-grained detection of sentiments and themes discussed in user reviews. Next, we present a neural topic model with adversarial training to distinguish topics based on their high-level semantics (e.g. opinions or factual descriptions). Then, we design a probabilistic topic model specialised for the extraction of biomedical phrases, whose inference process goes beyond the limitations of traditional topic models by seamlessly combining the word co-occurrences statistics with the information from word embeddings. Finally, inspired by the usage of entities in topic modelling [85], we design a novel masking strategy to fine-tune language models for biomedical question-answering. For each of the above models, we report experimental assessments supporting their efficacy across a wide variety of tasks and domains
    • …
    corecore