56 research outputs found

    Current Challenges in the Application of Algorithms in Multi-institutional Clinical Settings

    The coronavirus disease pandemic has highlighted the importance of artificial intelligence in multi-institutional clinical settings. Particularly in situations where the healthcare system is overloaded and large amounts of data are generated, artificial intelligence has great potential to provide automated solutions and to unlock the untapped potential of acquired data, including in the areas of care, logistics, and diagnosis. For example, automated decision-support applications could tremendously help physicians in their daily clinical routine. Especially in radiology and oncology, the exponential growth of imaging data, triggered by a rising number of patients, leads to a permanent overload of the healthcare system, making the use of artificial intelligence inevitable. However, the efficient and advantageous application of artificial intelligence in multi-institutional clinical settings faces several challenges, such as accountability and regulation hurdles, implementation challenges, and fairness considerations. This work focuses on the implementation challenges, which include the following questions: How can well-curated and standardized data be ensured? How do algorithms from other domains perform on multi-institutional medical datasets? And how can more robust and generalizable models be trained? Questions of how to interpret results, and whether correlations exist between the performance of the models and the characteristics of the underlying data, are also part of the work. Therefore, besides presenting a technical solution for manual data annotation and tagging of medical images, a real-world federated learning implementation for image segmentation is introduced. Experiments on a multi-institutional prostate magnetic resonance imaging dataset show that models trained by federated learning can achieve performance similar to training on pooled data.
    Furthermore, natural language processing algorithms for semantic textual similarity, text classification, and text summarization are applied to multi-institutional, structured and free-text oncology reports. The results show that performance gains are achieved by customizing state-of-the-art algorithms to the peculiarities of medical datasets, such as the occurrence of medications, numbers, or dates. In addition, performance is observed to depend on characteristics of the data, such as lexical complexity. The generated results, human baselines, and retrospective human evaluations demonstrate that artificial intelligence algorithms have great potential for use in clinical settings. However, due to the difficulty of processing domain-specific data, a performance gap still exists between the algorithms and medical experts. In the future, it is therefore essential to improve the interoperability and standardization of data, as well as to continue working on algorithms that perform well on medical, and possibly domain-shifted, data from multiple clinical centers.
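The federated training described in this abstract can be illustrated by the core aggregation step of federated averaging, in which each institution trains locally and only model parameters are shared. This is a minimal NumPy sketch of that idea, not the thesis implementation; the hospital names, layer shapes, and dataset sizes are invented for illustration:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average model parameters from several institutions,
    weighting each client by its local dataset size (FedAvg)."""
    total = sum(client_sizes)
    # Each client's parameters are a list of NumPy arrays (one per layer).
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Two hypothetical hospitals with different amounts of local data.
hospital_a = [np.array([1.0, 1.0])]  # layer weights after local training
hospital_b = [np.array([3.0, 3.0])]
global_model = fedavg([hospital_a, hospital_b], client_sizes=[100, 300])
# Weighted average: 0.25 * 1.0 + 0.75 * 3.0 = 2.5 per parameter
```

The key property is that raw patient images never leave an institution; only the locally updated weights are aggregated.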

    SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis

    5G is the fifth-generation cellular network protocol. It is the state-of-the-art global wireless standard, enabling an advanced kind of network designed to connect virtually everyone and everything with increased speed and reduced latency. Therefore, its development, analysis, and security are critical. However, all approaches to 5G protocol development and security analysis, e.g., property extraction, protocol summarization, and semantic analysis of the protocol specifications and implementations, are completely manual. To reduce such manual effort, in this paper we curate SPEC5G, the first-ever public 5G dataset for NLP research. The dataset contains 3,547,586 sentences with 134 million words, drawn from 13,094 cellular network specifications and 13 online websites. By leveraging large-scale pre-trained language models that have achieved state-of-the-art results on NLP tasks, we use this dataset for security-related text classification and summarization. Security-related text classification can be used to extract relevant security properties for protocol testing, while summarization can help developers and practitioners understand the protocol at a high level, which is itself a daunting task. Our results show the value of our 5G-centric dataset for automating 5G protocol analysis. We believe that SPEC5G will enable a new research direction into automatic analyses of the 5G cellular network protocol and numerous related downstream tasks. Our data and code are publicly available.
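The paper uses large pre-trained language models for the security-related text classification task; as a self-contained stand-in for that setup, the task itself can be sketched with a tiny Naive Bayes classifier over specification-style sentences. The training sentences and labels below are invented toy examples in the spirit of the dataset, not SPEC5G data:

```python
import math
from collections import Counter, defaultdict

class TinyNB:
    """Minimal multinomial Naive Bayes for sentence classification
    (a stand-in for the large pre-trained models used in the paper)."""
    def fit(self, sentences, labels):
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for text, y in zip(sentences, labels):
            words = text.lower().split()
            self.counts[y].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        for y in self.priors:
            # Log prior plus Laplace-smoothed log likelihood of each word.
            lp = math.log(self.priors[y] / sum(self.priors.values()))
            total = sum(self.counts[y].values()) + len(self.vocab)
            for w in text.lower().split():
                lp += math.log((self.counts[y][w] + 1) / total)
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# Toy sentences mimicking the security-related classification task.
train = [
    ("the ue shall verify the integrity of the message", "security"),
    ("authentication of the serving network is mandatory", "security"),
    ("the gnb forwards the packet to the core network", "other"),
    ("the ue selects the cell with the strongest signal", "other"),
]
clf = TinyNB().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("the network shall verify the authentication response"))
```

Sentences labeled "security" would then be forwarded to downstream property extraction for protocol testing.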

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been achieved in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Head-Driven Phrase Structure Grammar

    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based, declarative approach to linguistic knowledge that analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature-value pairs, structure sharing, and relational constraints. In syntax, it assumes that expressions have a single, relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).
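The feature-value pairs and relational constraints central to HPSG can be illustrated with a toy unification routine over nested dictionaries. This is a deliberately simplified sketch: real HPSG feature structures are typed and support structure sharing via re-entrancy, both omitted here, and the features shown (CAT, AGR, NUM, PER) are just a conventional agreement example:

```python
def unify(a, b):
    """Unify two feature structures represented as nested dicts.
    Atomic values must match exactly; dicts are merged recursively.
    Returns None on unification failure (conflicting constraints)."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:
                    return None  # conflicting feature values
                result[feat] = merged
            else:
                result[feat] = val
        return result
    return a if a == b else None

# A verb's subject constraint unified with a candidate NP's features:
verb_subj = {"CAT": "NP", "AGR": {"NUM": "sg", "PER": "3"}}
candidate = {"CAT": "NP", "AGR": {"NUM": "sg"}}
print(unify(verb_subj, candidate))
# A plural NP fails to unify with the singular constraint:
print(unify(verb_subj, {"CAT": "NP", "AGR": {"NUM": "pl"}}))
```

Successful unification accumulates compatible constraints from both structures; a clash on any shared feature rejects the analysis.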

    B!SON: A Tool for Open Access Journal Recommendation

    Finding a suitable open access journal in which to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders' conditions, and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It was developed based on a systematic requirements analysis, is built on open data, gives publisher-independent recommendations, and works across domains. It suggests open access journals based on the title, abstract, and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
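The core idea of text-based journal recommendation can be sketched as ranking journals by the similarity between a manuscript's title and abstract and text representing each journal. This is a minimal bag-of-words sketch of that idea only; B!SON's actual system additionally uses references and open metadata, and the journal names and descriptions below are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(manuscript, journals, top_k=2):
    """Rank journals by textual similarity between the manuscript
    text and sample text representing each journal."""
    query = Counter(manuscript.lower().split())
    scored = [
        (name, cosine(query, Counter(text.lower().split())))
        for name, text in journals.items()
    ]
    return [name for name, _ in sorted(scored, key=lambda x: -x[1])[:top_k]]

# Hypothetical journals described by sample article vocabulary.
journals = {
    "J. Machine Learning Res.": "neural networks deep learning optimization models",
    "PLOS Biology": "gene expression cell protein biology organisms",
    "Semantic Web Journal": "ontology linked data knowledge graphs semantics",
}
print(recommend("deep neural networks for learning graph models", journals, top_k=1))
```

A production system would replace the raw word counts with TF-IDF or learned embeddings and index thousands of journals rather than three.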

    Event detection on streams of short texts for decision-making

    The objective of this thesis is to design an event detection system on social networks to assist people in charge of decision-making in industrial contexts. The system must be able to detect both targeted, domain-specific events and general events. In particular, we are interested in the application of this system to supply chains, and more specifically those related to raw materials. The challenge is to build such a detection system, but also to determine which events potentially influence raw materials supply chains. This synthesis summarizes the different stages of the research conducted to address these problems.

    Architecture of an event detection system. First, we introduce the different building blocks of an event detection system. These systems are classically composed of a data filtering and cleaning step, ensuring the quality of the data processed by the rest of the system. Then, the data are embedded in such a way that they can be clustered by similarity. Once these clusters are created, they are analyzed to determine whether the documents constituting them discuss an event or not. Finally, the evolution of these events over time is tracked. In this thesis, we have studied the problems specific to each of these steps.

    Textual representation of documents from social networks. We compared different text representation models in the context of our event detection system, and we compared the performance of our system to the First Story Detection (FSD) algorithm, an algorithm with the same objectives. We first demonstrated that our proposed system performs better than FSD, and also that recent neural network architectures (transformers) outperform TF-IDF in our context, contrary to what had been shown in the context of FSD. We then proposed combining different textual representations in order to jointly exploit their strengths.

    Event detection, tracking, and evaluation. We proposed approaches for the cluster analysis component as well as for tracking the evolution of events. In particular, we use the entropy and user diversity introduced in ... to evaluate the clusters. We then track their evolution over time by comparing clusters at different instants, in order to create chains of clusters. Finally, we studied how to evaluate event detection systems in contexts where only a small amount of human-annotated data is available, and proposed a method to automatically evaluate event detection systems by exploiting partially annotated data.

    Application to the raw materials context. In order to specify the types of events to monitor, we conducted a historical study of events that have impacted the price of raw materials. In particular, we focused on phosphate, a strategic raw material. We studied the different influencing factors and proposed a reproducible method that can be applied to other raw materials or other fields. Finally, we drew up a list of elements to monitor in order to enable experts to anticipate price variations.
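The pipeline described in this abstract (filter, embed, cluster by similarity, analyze clusters, track them over time) can be sketched with a toy incremental clustering step and a user-diversity measure. The thesis uses transformer embeddings and entropy-based cluster evaluation; token-set Jaccard similarity, the threshold value, and the sample posts here are illustrative stand-ins:

```python
def jaccard(a, b):
    """Similarity between two token sets."""
    return len(a & b) / len(a | b)

def cluster_stream(posts, threshold=0.3):
    """Incrementally group a stream of short texts: each post joins
    the most similar existing cluster, or starts a new one (the seed
    of a potential new event)."""
    clusters = []  # each cluster: list of (token_set, user) pairs
    for user, text in posts:
        tokens = set(text.lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            sim = max(jaccard(tokens, t) for t, _ in c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([(tokens, user)])
        else:
            best.append((tokens, user))
    return clusters

def user_diversity(cluster):
    """Fraction of distinct users in a cluster: event-like clusters
    tend to involve many users, spam-like ones only a few."""
    return len({u for _, u in cluster}) / len(cluster)

posts = [
    ("u1", "port strike halts phosphate shipments"),
    ("u2", "phosphate shipments halted by port strike"),
    ("u3", "my cat is sleeping again"),
    ("u4", "strike at the port halts phosphate exports"),
]
clusters = cluster_stream(posts)
```

Here the three strike-related posts group together with full user diversity, while the unrelated post forms its own singleton cluster; tracking would then chain such clusters across time windows.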

    A graph-based framework for data retrieved from criminal-related documents

    The digitalization of companies and services has enhanced the treatment and analysis of a growing volume of data from heterogeneous sources, with emerging challenges, namely those related to knowledge representation. Criminal police forces face similar challenges, considering the amount of unstructured data from police reports that is manually analyzed by criminal investigators, consuming time and resources. There is therefore a need to automatically extract and represent the unstructured data existing in criminal-related documents, reducing the manual analysis performed by criminal investigators. This is a challenge for computer science, which can propose a computational alternative for extracting and representing these data by adapting existing methods or proposing new ones. A broad set of computational methods has been applied to the criminal domain, such as the identification and classification of named entities (e.g., narcotics) or the extraction of relations between entities relevant to a criminal investigation. However, these methods have mainly been applied to the English language; in Portugal, research in this area has received little attention, making their application in the context of criminal investigation unfeasible.

    This thesis proposes an integrated solution for the representation of the unstructured data existing in documents, using a set of computational methods: a Preprocessing Criminal-Related Documents module, comprising an Extraction, Transformation, and Loading task adapted to criminal-related documents, followed by a Natural Language Processing pipeline applied to the Portuguese language for syntactic and semantic analysis of the textual data; a 5W1H Information Extraction Method, which combines Named-Entity Recognition, Semantic Role Labelling, and Criminal Terms Extraction tasks; and a Graph Database Population and Enrichment step, which represents the retrieved data in a Neo4j graph database. Globally, the integrated solution presents promising results, which were validated using prototypes developed for this purpose. The feasibility of extracting the unstructured data, interpreting it syntactically and semantically, and representing it in the graph database has also been demonstrated.
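The final graph population step can be sketched by turning one 5W1H extraction into a parameterized Cypher statement for Neo4j. The node labels, relationship types, and the example event below are invented for illustration; the thesis's actual graph schema may differ:

```python
def cypher_for_event(event):
    """Build a parameterized Cypher statement that stores one
    5W1H-style extraction (who / what / where / when) as nodes
    and relationships in a Neo4j graph."""
    query = (
        "MERGE (p:Person {name: $who}) "
        "MERGE (c:Crime {type: $what}) "
        "MERGE (l:Location {name: $where}) "
        "MERGE (p)-[:INVOLVED_IN {date: $when}]->(c) "
        "MERGE (c)-[:OCCURRED_AT]->(l)"
    )
    params = {k: event[k] for k in ("who", "what", "where", "when")}
    return query, params

# Hypothetical output of the 5W1H extraction step for one report.
event = {"who": "John Doe", "what": "narcotics trafficking",
         "where": "Lisbon", "when": "2021-03-14"}
query, params = cypher_for_event(event)
# With the official Neo4j Python driver this would be executed as
# session.run(query, params); MERGE keeps repeated mentions of the
# same person or location from creating duplicate nodes.
```

Using MERGE rather than CREATE means entities extracted from many reports accumulate onto shared nodes, which is what makes cross-document querying by investigators possible.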

    A Framework and Process Library for Human-Robot Collaboration in Creative Design and Fabrication

    In the last two decades, the increasing affordability of industrial robots, along with the growing maturity of computational design software, has led architects to integrate robots into their design process. Robots have exceptional capabilities that enable the fabrication of geometrically complicated components and the assembly of complex structures. However, the robot control and motion programming tools currently adopted by designers were all initially intended for engineering-based manufacturing industries. When using computer-controlled tools, designers cannot adapt their designs to the production process in real time. Current industrial robot control systems force the designer to envision and embed all of the required machining data in the digital model before the fabrication process begins. This requirement makes the design-to-fabrication process a unidirectional workflow. In pursuit of a solution, a growing body of research is exploring various human-robot collaboration methods for architectural practices. However, many of these studies are project-based, targeting the ad hoc needs of a particular robotic application or fabrication process. Consequently, this dissertation investigates a generalizable framework for human-robot collaboration that is rooted in the principles of distributed cognition. As an essential part of the research argument, the role of the tools of production in the formation of a designer's cognitive system is considered. This framework, defined for a bidirectional design and fabrication workflow, relies on and integrates material and fabrication feedback into the design process. The framework has three main components: interactive design, adaptive control, and a design and fabrication library. While different aspects of these components have been studied to various extents by other researchers, this dissertation is the first to define them in an integrated manner.
    Next, the requirements for each of these elements are introduced and discussed in detail. This dissertation focuses in more detail on the library component of the framework because, compared to the first two components, it is the least investigated to date. A structure for the library is proposed so that the tacit knowledge of makers can be structured, captured, and reused. At its core, the library is a process-centric database in which each process is supported by a set of tools, instructions, materials, and geometries required for the transformation of a part into its final form. Finally, this study demonstrates the generalizability of the library concept through a series of experiments developed for different material systems and with various robotic operations.
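The process-centric library described above (each process bundling the tools, instructions, materials, and geometries needed to transform a part) can be sketched as a small data model. The class names, fields, and the hotwire-cutting example are hypothetical illustrations of the concept, not the dissertation's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Process:
    """One entry in a process-centric fabrication library: the tools,
    materials, and instructions needed to transform a part."""
    name: str
    tools: list
    materials: list
    instructions: list  # ordered robot operations
    geometry: str       # reference to the part geometry it applies to

class ProcessLibrary:
    """Stores and retrieves fabrication processes so that makers'
    tacit knowledge can be captured and reused across projects."""
    def __init__(self):
        self._processes = {}

    def add(self, process):
        self._processes[process.name] = process

    def find_by_material(self, material):
        return [p for p in self._processes.values() if material in p.materials]

library = ProcessLibrary()
library.add(Process(
    name="hotwire_cut_foam",
    tools=["hotwire cutter end effector"],
    materials=["EPS foam"],
    instructions=["approach part", "cut along ruled surface", "retract"],
    geometry="ruled_surface_block",
))
matches = library.find_by_material("EPS foam")
```

Indexing processes by material and geometry is what lets a designer retrieve and adapt an existing process mid-fabrication instead of re-encoding it from scratch.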