1,522 research outputs found

    Surveying human habit modeling and mining techniques in smart spaces

    A smart space is an environment, typically equipped with Internet-of-Things (IoT) technologies, that provides services to humans, helping them perform daily tasks by monitoring the space and autonomously executing actions, giving suggestions, and sending alarms. Approaches suggested in the literature differ in the facilities they require, their possible applications, the amount of human intervention needed, and their ability to support multiple users simultaneously while adapting to changing needs. In this paper, we propose a Systematic Literature Review (SLR) that classifies the most influential approaches in the area of smart spaces according to a set of dimensions identified by answering a set of research questions. These dimensions make it possible to choose a specific method or approach according to the available sensors, the amount of labeled data, the need for visual analysis, and requirements in terms of enactment and decision-making on the environment. Additionally, the paper identifies a set of challenges to be addressed by future research in the field.

    Natural Language Processing in-and-for Design Research

    We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals from 1991 to the present. We present the state of the art of NLP in-and-for design research by reviewing these articles according to the type of natural language text source: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. After summarizing these contributions and identifying their gaps, we utilise an existing design innovation framework to identify the applications that NLP currently supports. We then propose several methodological and theoretical directions for future NLP in-and-for design research.

    Ontology-based annotation using naive Bayes and decision trees

    The Cognitive Paradigm Ontology (CogPO) defines an ontological relationship between academic terms and experiments in the field of neuroscience. BrainMap (www.brainmap.org) is a database of literature describing these experiments, which are annotated by human experts based on the ontological framework defined in CogPO. We present a stochastic approach to automate this process. We begin with a gold-standard corpus of abstracts annotated by experts, model the annotations with a group of naive Bayes classifiers, and then explore the inherent relationships among the different components defined by the ontology using a probabilistic decision tree model. Our solution outperforms conventional text mining approaches by taking advantage of the ontology. We consider five essential ontological components in CogPO (Stimulus Modality, Stimulus Type, Response Modality, Response Type, and Instructions) and evaluate the probability of successfully categorizing a research paper on each component by training a basic multi-label naive Bayes classifier with a set of examples taken from the BrainMap database that have already been manually annotated by human experts. Based on the performance of these classifiers, we create a decision tree that labels the components sequentially at different levels. Each node of the decision tree is associated with a naive Bayes classifier built in a different subspace of the input universe. We first make decisions on the components whose labels are comparatively easy to predict, and then use these predetermined conditions to narrow down the input space along all tree paths, thereby boosting the performance of the naive Bayes classification on components whose labels are difficult to predict. To annotate a new instance, we use the classifiers associated with the nodes to find labels for each component, starting from the root and tracking down the tree, possibly along multiple paths. The annotation is complete when the bottom level is reached, and all labels produced along the paths are collected.
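    The sequential scheme the abstract describes can be sketched roughly as follows: classify the easier ontology component first, then feed its predicted label back in as an extra feature when classifying the harder component. This is a minimal illustration; the corpus, component values, and word features below are invented placeholders, not BrainMap data.

    ```python
    # Illustrative sketch: a decision-tree-of-naive-Bayes annotation scheme.
    from collections import Counter
    import math

    class NaiveBayes:
        """Multinomial naive Bayes over bag-of-words features, add-one smoothing."""
        def fit(self, docs, labels):
            self.labels = sorted(set(labels))
            self.prior = Counter(labels)
            self.word_counts = {c: Counter() for c in self.labels}
            for words, c in zip(docs, labels):
                self.word_counts[c].update(words)
            self.vocab = {w for c in self.labels for w in self.word_counts[c]}
            return self

        def predict(self, words):
            def score(c):
                s = math.log(self.prior[c])
                total = sum(self.word_counts[c].values()) + len(self.vocab)
                for w in words:
                    s += math.log((self.word_counts[c][w] + 1) / total)
                return s
            return max(self.labels, key=score)

    # Toy "abstracts" with gold labels for two components of different difficulty.
    docs = [["visual", "picture", "faces"], ["visual", "words", "reading"],
            ["auditory", "tones", "pitch"], ["auditory", "speech", "words"]]
    modality = ["visual", "visual", "auditory", "auditory"]   # easier component
    stim_type = ["faces", "words", "tones", "words"]          # harder component

    root = NaiveBayes().fit(docs, modality)                   # tree root
    # Child node: condition on the root's label by appending it as a feature,
    # narrowing the input space along the tree path as the abstract describes.
    cond_docs = [d + ["MOD=" + m] for d, m in zip(docs, modality)]
    leaf = NaiveBayes().fit(cond_docs, stim_type)

    new_doc = ["auditory", "speech"]
    m = root.predict(new_doc)
    t = leaf.predict(new_doc + ["MOD=" + m])
    print(m, t)  # → auditory words
    ```

    The point of the ordering is that a confident early decision shrinks the effective input space for later, harder decisions.
    
    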

    Advanced Data Mining Techniques for Compound Objects

    Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large data collections. The most important step within the KDD process is data mining, which is concerned with the extraction of the valid patterns. KDD is necessary to analyze the steadily growing amount of data caused by the enhanced performance of modern computer systems. However, as the amount of data grows, the complexity of the data objects increases as well. Modern KDD methods should therefore examine more complex objects than simple feature vectors in order to solve real-world KDD applications adequately. Multi-instance and multi-represented objects are two important types of representations for complex objects. A multi-instance object consists of a set of object representations that all belong to the same feature space. A multi-represented object is constructed as a tuple of feature representations, where each feature representation belongs to a different feature space. The contribution of this thesis is the development of new KDD methods for the classification and clustering of complex objects. To this end, the thesis introduces solutions for real-world applications that are based on multi-instance and multi-represented object representations. On the basis of these solutions, it is shown that a more general object representation often provides better results for many relevant KDD applications.
    The first part of the thesis is concerned with two KDD problems for which employing multi-instance objects provides efficient and effective solutions. The first is data mining in CAD parts, e.g. the use of hierarchical clustering for the automatic construction of product hierarchies. The introduced solution decomposes a single part into a set of feature vectors and compares parts using a metric on multi-instance objects. Furthermore, multi-step query processing with a novel filter step is employed, enabling the user to process similarity queries efficiently. On the basis of this similarity search system, several distance-based data mining algorithms, such as the hierarchical clustering algorithm OPTICS, can be applied to derive product hierarchies. The second important application is the classification of, and search for, complete websites in the World Wide Web (WWW). A website is a set of HTML documents that is published by the same person, group, or organization and usually serves a common purpose. To perform data mining on websites, the thesis presents several methods for classifying them. After introducing naive methods that model websites as single webpages, two more sophisticated approaches to website classification are introduced. The first uses a preprocessing step that maps the individual HTML documents within each website to so-called page classes. The second directly compares websites as sets of word vectors and uses nearest-neighbor classification. To search the WWW for new, relevant websites, a focused crawler is introduced that retrieves relevant websites efficiently, minimizing the number of HTML documents fetched while increasing the accuracy of website retrieval.
    The second part of the thesis is concerned with data mining in multi-represented objects. An important example application for this kind of complex object is proteins, which can be represented as a tuple of a protein sequence and a text annotation. To analyze multi-represented objects, a clustering method is introduced that is based on the density-based clustering algorithm DBSCAN. This method uses all available representations to find a global clustering of the given data objects. However, in many applications there already exists a sophisticated class ontology for the given data objects, e.g. for proteins. To map new objects into such an ontology, a new method for the hierarchical classification of multi-represented objects is described. The system employs the hierarchical structure of the ontology to classify new proteins efficiently, using support vector machines.
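    The core idea of comparing multi-instance objects can be sketched with a set-level distance: each object (e.g. a decomposed CAD part) is a set of feature vectors, and the distance is defined between whole sets rather than single vectors. The symmetric average-minimum distance below is one common choice for illustration only; the thesis's actual metric and its filter-step query processing are not reproduced here.

    ```python
    # Sketch of a multi-instance (set-of-vectors) distance for comparing
    # decomposed parts; data and metric choice are illustrative assumptions.
    import math

    def set_distance(X, Y):
        """Average, over both directions, of each vector's distance to its
        nearest neighbour in the other set."""
        d_xy = sum(min(math.dist(x, y) for y in Y) for x in X) / len(X)
        d_yx = sum(min(math.dist(x, y) for x in X) for y in Y) / len(Y)
        return (d_xy + d_yx) / 2

    # Three "parts", each decomposed into a set of 2-D feature vectors.
    part_a = [(0.0, 0.0), (1.0, 0.0)]
    part_b = [(0.1, 0.0), (1.1, 0.1)]   # nearly identical to part_a
    part_c = [(5.0, 5.0)]               # very different

    print(set_distance(part_a, part_b) < set_distance(part_a, part_c))  # → True
    ```

    Any distance-based algorithm such as OPTICS can then run on top of such a set metric, since it only needs pairwise distances between objects.
    
    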

    A crowdsourcing recommendation model for image annotations in cultural heritage platforms

    Cultural heritage is one of many fields that has seen a significant digital transformation in the form of digitization and asset annotation for heritage preservation, inheritance, and dissemination. However, a lack of accurate and descriptive metadata in this field affects the usability and discoverability of digital content, giving visitors to cultural heritage platforms an unsatisfactory user experience and limiting the processing capabilities needed to add new functionality. Traditionally, cultural heritage institutions have been responsible for providing metadata for their collection items with the help of professionals, which is expensive and requires significant effort and time. In this sense, crowdsourcing can play a significant role in digital transformation and massive data processing, leveraging the crowd to enrich the metadata quality of digital cultural content. This paper focuses on a very important challenge faced by cultural heritage crowdsourcing platforms: how to attract users and make such activities enjoyable for them in order to achieve higher-quality annotations. One way to address this is to offer personalized, interesting items based on each user's preferences, rather than making the user experience random and demanding. Thus, we present an image annotation recommendation system for users of cultural heritage platforms. The recommendation system design incorporates various technologies intended to help users select the best-matching images for annotation based on their interests and characteristics. Different classification methods were implemented to validate the accuracy of our work on Egyptian heritage. Agencia Estatal de InvestigaciĂłn | Ref. TIN2017-87604-R. Xunta de Galicia | Ref. ED431B 2020/3.
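    The matching step at the heart of such a recommender can be sketched as ranking candidate images by the similarity between a user's interest profile and each image's existing tags. The cosine-over-tag-vectors measure and all user/item data below are illustrative assumptions; the paper's actual system combines several technologies beyond this.

    ```python
    # Sketch of content-based image recommendation via tag similarity.
    import math

    def cosine(a, b):
        """Cosine similarity between two weighted tag dicts."""
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Hypothetical user profile and candidate images awaiting annotation.
    user_profile = {"temple": 3, "hieroglyphs": 2, "statue": 1}
    candidates = {
        "img_001": {"temple": 1, "columns": 1},
        "img_002": {"pottery": 1, "museum": 1},
        "img_003": {"hieroglyphs": 2, "statue": 1},
    }
    ranked = sorted(candidates,
                    key=lambda i: cosine(user_profile, candidates[i]),
                    reverse=True)
    print(ranked[0])  # → img_003
    ```

    Offering the top-ranked images first is what replaces the "random and demanding" experience the abstract criticizes.
    
    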

    Entity-centric knowledge discovery for idiosyncratic domains

    Technical and scientific knowledge is produced at an ever-accelerating pace, leading to increasing issues when trying to automatically organize or process it, e.g., when searching for relevant prior work. Knowledge can today be produced in both unstructured (plain text) and structured (metadata or linked data) forms. However, unstructured content is still the most dominant form used to represent scientific knowledge. In order to facilitate the extraction and discovery of relevant content, new automated and scalable methods for processing, structuring and organizing scientific knowledge are called for. In this context, a number of applications are emerging, ranging from Named Entity Recognition (NER) and Entity Linking tools for scientific papers to platforms leveraging information extraction techniques to organize scientific knowledge. In this thesis, we tackle the tasks of Entity Recognition, Disambiguation and Linking in idiosyncratic domains, with an emphasis on scientific literature. Furthermore, we study the related task of co-reference resolution with a specific focus on named entities. We start by exploring Named Entity Recognition, a task that aims to identify the boundaries of named entities in textual content. We propose a new method to generate candidate named entities based on n-gram collocation statistics and design several entity recognition features to further classify them. In addition, we show how external knowledge bases (either domain-specific like DBLP or generic like DBpedia) can be leveraged to improve the effectiveness of NER in idiosyncratic domains. Subsequently, we move to Entity Disambiguation, which is typically performed after entity recognition in order to link an entity to a knowledge base. We propose novel semi-supervised methods for word disambiguation leveraging the structure of a community-based ontology of scientific concepts. 
    Our approach exploits the graph structure that connects different terms and their definitions to automatically identify the sense originally intended by the authors of a scientific publication. We then turn to co-reference resolution, a task aiming to identify entities that appear in various forms throughout a text. We propose an approach that types entities using an inverted index built on top of a knowledge base and subsequently re-assigns entities based on the semantic relatedness of the introduced types. Finally, we describe an application whose goal is to help researchers discover and manage scientific publications. In that context, we focus on the problem of selecting relevant tags to organize collections of research papers. We experimentally demonstrate that using a community-authored ontology together with information about the position of concepts in the documents significantly increases the precision of tag selection over standard methods.
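    The candidate-generation idea based on collocation statistics can be illustrated with pointwise mutual information (PMI), one standard collocation measure: score bigrams by how much more often they co-occur than chance predicts, and keep high-scoring ones as candidate named entities. The toy corpus, frequency cutoff, and PMI threshold below are placeholders for the thesis's richer pipeline.

    ```python
    # Sketch: n-gram collocation statistics (PMI) for candidate named entities.
    import math
    from collections import Counter

    corpus = ("support vector machines and support vector regression are kernel "
              "methods ; kernel methods rely on support vector formulations").split()

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    n = len(corpus)

    def pmi(w1, w2):
        """log2 of observed bigram probability over independence baseline."""
        p_xy = bigrams[(w1, w2)] / (n - 1)
        p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
        return math.log2(p_xy / (p_x * p_y))

    # Keep bigrams that recur and cohere well above chance.
    candidates = [bg for bg in bigrams if bigrams[bg] >= 2 and pmi(*bg) > 1.0]
    print(candidates)  # → [('support', 'vector'), ('kernel', 'methods')]
    ```

    In the thesis, such statistical candidates are then filtered further by trained entity-recognition features and external knowledge bases, which this sketch omits.
    
    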

    Performance Evaluation of Smart Decision Support Systems on Healthcare

    Medical practice demands responsibility not only in clinical knowledge and skill but also in the management of an enormous amount of information related to patient care. It is through the proper treatment of information that experts can consistently build a sound wellness policy. The primary objective of developing decision support systems (DSSs) is to provide information to specialists when and where it is needed. These systems provide information, models, and data manipulation tools to help experts make better decisions in a variety of situations. Most of the challenges that smart DSSs face come from the great difficulty of dealing with large volumes of information, which is continuously generated by the most diverse types of devices and equipment and requires high computational resources. This situation makes this type of system liable to fail to retrieve information quickly enough for decision making. As a result, information quality and the provision of an infrastructure capable of promoting integration and articulation among different health information systems (HIS) become promising research topics in the field of electronic health (e-health), and for this reason they are addressed in this research. The work described in this thesis is motivated by the need to propose novel approaches to deal with problems inherent to the acquisition, cleaning, integration, and aggregation of data obtained from different sources in e-health environments, as well as their analysis. To ensure the success of data integration and analysis in e-health environments, it is essential that machine-learning (ML) algorithms ensure system reliability. However, in this type of environment a fully reliable scenario cannot be guaranteed, which makes smart DSSs susceptible to predictive failures that severely compromise overall system performance.
    On the other hand, systems can also have their performance compromised by the information overload they must support. To address some of these problems, this thesis presents several proposals and studies on the impact of ML algorithms on the monitoring and management of hypertensive disorders in high-risk pregnancy. The primary goal of the proposals presented in this thesis is to improve the overall performance of health information systems. In particular, ML-based methods are exploited to improve prediction accuracy and to optimize the use of monitoring device resources. It is demonstrated that this type of strategy and methodology contributes to a significant increase in the performance of smart DSSs, not only in precision but also in reducing the computational cost of the classification process. The observed results contribute to advancing the state of the art in AI-based methods and strategies that aim to overcome challenges arising from the integration and performance of smart DSSs. With AI-based algorithms, it is possible to analyze larger volumes of complex data quickly and automatically and to focus on more accurate results, providing high-value predictions for better real-time decision making without human intervention.
