33 research outputs found

    Biomedical ontology alignment: An approach based on representation learning

    Get PDF
    While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results

    The Emerging Internet of Things Marketplace From an Industrial Perspective: A Survey

    Get PDF
    The Internet of Things (IoT) is a dynamic global information network consisting of internet-connected objects, such as Radio-frequency identification (RFIDs), sensors, actuators, as well as other instruments and smart appliances that are becoming an integral component of the future internet. Over the last decade, we have seen a large number of the IoT solutions developed by start-ups, small and medium enterprises, large corporations, academic research institutes (such as universities), and private and public research organisations making their way into the market. In this paper, we survey over one hundred IoT smart solutions in the marketplace and examine them closely in order to identify the technologies used, functionalities, and applications. More importantly, we identify the trends, opportunities and open challenges in the industry-based the IoT solutions. Based on the application domain, we classify and discuss these solutions under five different categories: smart wearable, smart home, smart, city, smart environment, and smart enterprise. This survey is intended to serve as a guideline and conceptual framework for future research in the IoT and to motivate and inspire further developments. It also provides a systematic exploration of existing research and suggests a number of potentially significant research directions.Comment: IEEE Transactions on Emerging Topics in Computing 201

    Exploiting word embeddings for modeling bilexical relations

    Get PDF
    There has been an exponential surge of text data in the recent years. As a consequence, unsupervised methods that make use of this data have been steadily growing in the field of natural language processing (NLP). Word embeddings are low-dimensional vectors obtained using unsupervised techniques on the large unlabelled corpora, where words from the vocabulary are mapped to vectors of real numbers. Word embeddings aim to capture syntactic and semantic properties of words. In NLP, many tasks involve computing the compatibility between lexical items under some linguistic relation. We call this type of relation a bilexical relation. Our thesis defines statistical models for bilexical relations that centrally make use of word embeddings. Our principle aim is that the word embeddings will favor generalization to words not seen during the training of the model. The thesis is structured in four parts. In the first part of this thesis, we present a bilinear model over word embeddings that leverages a small supervised dataset for a binary linguistic relation. Our learning algorithm exploits low-rank bilinear forms and induces a low-dimensional embedding tailored for a target linguistic relation. This results in compressed task-specific embeddings. In the second part of our thesis, we extend our bilinear model to a ternary setting and propose a framework for resolving prepositional phrase attachment ambiguity using word embeddings. Our models perform competitively with state-of-the-art models. In addition, our method obtains significant improvements on out-of-domain tests by simply using word-embeddings induced from source and target domains. In the third part of this thesis, we further extend the bilinear models for expanding vocabulary in the context of statistical phrase-based machine translation. Our model obtains a probabilistic list of possible translations of target language words, given a word in the source language. We do this by projecting pre-trained embeddings into a common subspace using a log-bilinear model. We empirically notice a significant improvement on an out-of-domain test set. In the final part of our thesis, we propose a non-linear model that maps initial word embeddings to task-tuned word embeddings, in the context of a neural network dependency parser. We demonstrate its use for improved dependency parsing, especially for sentences with unseen words. We also show downstream improvements on a sentiment analysis task.En els darrers anys hi ha hagut un sorgiment notable de dades en format textual. Conseqüentment, en el camp del Processament del Llenguatge Natural (NLP, de l'anglès "Natural Language Processing") s'han desenvolupat mètodes no supervistats que fan ús d'aquestes dades. Els anomenats "word embeddings", o embeddings de paraules, són vectors de dimensionalitat baixa que s'obtenen mitjançant tècniques no supervisades aplicades a corpus textuals de grans volums. Com a resultat, cada paraula del diccionari es correspon amb un vector de nombres reals, el propòsit del qual és capturar propietats sintàctiques i semàntiques de la paraula corresponent. Moltes tasques de NLP involucren calcular la compatibilitat entre elements lèxics en l'àmbit d'una relació lingüística. D'aquest tipus de relació en diem relació bilèxica. Aquesta tesi proposa models estadístics per a relacions bilèxiques que fan ús central d'embeddings de paraules, amb l'objectiu de millorar la generalització del model lingüístic a paraules no vistes durant l'entrenament. La tesi s'estructura en quatre parts. A la primera part presentem un model bilineal sobre embeddings de paraules que explota un conjunt petit de dades anotades sobre una relaxió bilèxica. L'algorisme d'aprenentatge treballa amb formes bilineals de poc rang, i indueix embeddings de poca dimensionalitat que estan especialitzats per la relació bilèxica per la qual s'han entrenat. Com a resultat, obtenim embeddings de paraules que corresponen a compressions d'embeddings per a una relació determinada. A la segona part de la tesi proposem una extensió del model bilineal a trilineal, i amb això proposem un nou model per a resoldre ambigüitats de sintagmes preposicionals que usa només embeddings de paraules. En una sèrie d'avaluacións, els nostres models funcionen de manera similar a l'estat de l'art. A més, el nostre mètode obté millores significatives en avaluacions en textos de dominis diferents al d'entrenament, simplement usant embeddings induïts amb textos dels dominis d'entrenament i d'avaluació. A la tercera part d'aquesta tesi proposem una altra extensió dels models bilineals per ampliar la cobertura lèxica en el context de models estadístics de traducció automàtica. El nostre model probabilístic obté, donada una paraula en la llengua d'origen, una llista de possibles traduccions en la llengua de destí. Fem això mitjançant una projecció d'embeddings pre-entrenats a un sub-espai comú, usant un model log-bilineal. Empíricament, observem una millora significativa en avaluacions en dominis diferents al d'entrenament. Finalment, a la quarta part de la tesi proposem un model no lineal que indueix una correspondència entre embeddings inicials i embeddings especialitzats, en el context de tasques d'anàlisi sintàctica de dependències amb models neuronals. Mostrem que aquest mètode millora l'analisi de dependències, especialment en oracions amb paraules no vistes durant l'entrenament. També mostrem millores en un tasca d'anàlisi de sentiment

    BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters

    Full text link
    Serverless computing has gained popularity in edge computing due to its flexible features, including the pay-per-use pricing model, auto-scaling capabilities, and multi-tenancy support. Complex Serverless-based applications typically rely on Serverless workflows (also known as Serverless function orchestration) to express task execution logic, and numerous application- and system-level optimization techniques have been developed for Serverless workflow scheduling. However, there has been limited exploration of optimizing Serverless workflow scheduling in edge computing systems, particularly in high-density, resource-constrained environments such as system-on-chip clusters and single-board-computer clusters. In this work, we discover that existing Serverless workflow scheduling techniques typically assume models with limited expressiveness and cause significant resource contention. To address these issues, we propose modeling Serverless workflows using behavior trees, a novel and fundamentally different approach from existing directed-acyclic-graph- and state machine-based models. Behavior tree-based modeling allows for easy analysis without compromising workflow expressiveness. We further present observations derived from the inherent tree structure of behavior trees for contention-free function collections and awareness of exact and empirical concurrent function invocations. Based on these observations, we introduce BeeFlow, a behavior tree-based Serverless workflow system tailored for resource-constrained edge clusters. Experimental results demonstrate that BeeFlow achieves up to 3.2X speedup in a high-density, resource-constrained edge testbed and 2.5X speedup in a high-profile cloud testbed, compared with the state-of-the-art.Comment: Accepted by Journal of Systems Architectur

    Machine and deep learning techniques for detecting internet protocol version six attacks: a review

    Get PDF
    The rapid development of information and communication technologies has increased the demand for internet-facing devices that require publicly accessible internet protocol (IP) addresses, resulting in the depletion of internet protocol version 4 (IPv4) address space. As a result, internet protocol version 6 (IPv6) was designed to address this issue. However, IPv6 is still not widely used because of security concerns. An intrusion detection system (IDS) is one example of a security mechanism used to secure networks. Lately, the use of machine learning (ML) or deep learning (DL) detection models in IDSs is gaining popularity due to their ability to detect threats on IPv6 networks accurately. However, there is an apparent lack of studies that review ML and DL in IDS. Even the existing reviews of ML and DL fail to compare those techniques. Thus, this paper comprehensively elucidates ML and DL techniques and IPv6-based distributed denial of service (DDoS) attacks. Additionally, this paper includes a qualitative comparison with other related works. Moreover, this work also thoroughly reviews the existing ML and DL-based IDSs for detecting IPv6 and IPv4 attacks. Lastly, researchers could use this review as a guide in the future to improve their work on DL and ML-based IDS

    Special issue "digital twin technology in the AEC industry"

    Get PDF
    Sustainable building design has become a hot topic over the past decades. Many standards, databases, and tools have been developed for achieving a sustainable building. Not until recently have the importance of structural engineering and its contribution to sustainable building design been full recognised. However, due to the highly fragmented and diversity of knowledge across building and infrastructure domains, there is a lack of approach that can address all the sustainable issues within the structural design. This paper reviews the sustainable design from the perspective of structural engineering: (1) reviewing the current situation; (2) identifying the gaps and difficulties; and (3) making recommendations for future improvements. The strategies and indicators, as well as BIM-enabled methodology, for sustainable structural design (SSD) are also discussed in a holistic way. The results of this investigation show that most of the methods are not doing well in terms of delivering a successful sustainable structural design. It is expected that the future BIM could probably provide such a platform to address these issues

    Architecture and the Built Environment:

    Get PDF
    This publication provides an overview of TU Delft’s most significant research achievements in the field of architecture and the built environment during the years 2010–2012. It is the first presentation of the joint research portfolio of the Faculty of Architecture and OTB Research Institute since their integration into the Faculty of Architecture and the Built Environment. As such the portfolio holds a strong promise for the future. In a time when the economy seems to be finally picking up and in which such societal issues as energy, climate and ageing are more prominent than ever before, there are plenty of fields for us to explore in the next three years

    Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation

    Full text link
    Tesis Doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingenieria Informática. Fecha de lectura: 08-07-2021Since the emergence of the Internet and the spread of digital communications throughout the world, the amount of data stored on the Web has been growing exponentially. In this new digital era, a large number of companies have emerged with the purpose of ltering the information available on the web and provide users with interesting items. The algorithms and models used to recommend these items are called Recommender Systems. These systems are applied to a large number of domains, from music, books, or movies to dating or Point-of-Interest (POI), which is an increasingly popular domain where users receive recommendations of di erent places when they arrive to a city. In this thesis, we focus on exploiting the use of contextual information, especially temporal and sequential data, and apply it in novel ways in both traditional and Point-of-Interest recommendation. We believe that this type of information can be used not only for creating new recommendation models but also for developing new metrics for analyzing the quality of these recommendations. In one of our rst contributions we propose di erent metrics, some of them derived from previously existing frameworks, using this contextual information. Besides, we also propose an intuitive algorithm that is able to provide recommendations to a target user by exploiting the last common interactions with other similar users of the system. At the same time, we conduct a comprehensive review of the algorithms that have been proposed in the area of POI recommendation between 2011 and 2019, identifying the common characteristics and methodologies used. Once this classi cation of the algorithms proposed to date is completed, we design a mechanism to recommend complete routes (not only independent POIs) to users, making use of reranking techniques. In addition, due to the great di culty of making recommendations in the POI domain, we propose the use of data aggregation techniques to use information from di erent cities to generate POI recommendations in a given target city. In the experimental work we present our approaches on di erent datasets belonging to both classical and POI recommendation. The results obtained in these experiments con rm the usefulness of our recommendation proposals, in terms of ranking accuracy and other dimensions like novelty, diversity, and coverage, and the appropriateness of our metrics for analyzing temporal information and biases in the recommendations producedDesde la aparici on de Internet y la difusi on de las redes de comunicaciones en todo el mundo, la cantidad de datos almacenados en la red ha crecido exponencialmente. En esta nueva era digital, han surgido un gran n umero de empresas con el objetivo de ltrar la informaci on disponible en la red y ofrecer a los usuarios art culos interesantes. Los algoritmos y modelos utilizados para recomendar estos art culos reciben el nombre de Sistemas de Recomendaci on. Estos sistemas se aplican a un gran n umero de dominios, desde m usica, libros o pel culas hasta las citas o los Puntos de Inter es (POIs, en ingl es), un dominio cada vez m as popular en el que los usuarios reciben recomendaciones de diferentes lugares cuando llegan a una ciudad. En esta tesis, nos centramos en explotar el uso de la informaci on contextual, especialmente los datos temporales y secuenciales, y aplicarla de forma novedosa tanto en la recomendaci on cl asica como en la recomendaci on de POIs. Creemos que este tipo de informaci on puede utilizarse no s olo para crear nuevos modelos de recomendaci on, sino tambi en para desarrollar nuevas m etricas para analizar la calidad de estas recomendaciones. En una de nuestras primeras contribuciones proponemos diferentes m etricas, algunas derivadas de formulaciones previamente existentes, utilizando esta informaci on contextual. Adem as, proponemos un algoritmo intuitivo que es capaz de proporcionar recomendaciones a un usuario objetivo explotando las ultimas interacciones comunes con otros usuarios similares del sistema. Al mismo tiempo, realizamos una revisi on exhaustiva de los algoritmos que se han propuesto en el a mbito de la recomendaci o n de POIs entre 2011 y 2019, identi cando las caracter sticas comunes y las metodolog as utilizadas. Una vez realizada esta clasi caci on de los algoritmos propuestos hasta la fecha, dise~namos un mecanismo para recomendar rutas completas (no s olo POIs independientes) a los usuarios, haciendo uso de t ecnicas de reranking. Adem as, debido a la gran di cultad de realizar recomendaciones en el ambito de los POIs, proponemos el uso de t ecnicas de agregaci on de datos para utilizar la informaci on de diferentes ciudades y generar recomendaciones de POIs en una determinada ciudad objetivo. En el trabajo experimental presentamos nuestros m etodos en diferentes conjuntos de datos tanto de recomendaci on cl asica como de POIs. Los resultados obtenidos en estos experimentos con rman la utilidad de nuestras propuestas de recomendaci on en t erminos de precisi on de ranking y de otras dimensiones como la novedad, la diversidad y la cobertura, y c omo de apropiadas son nuestras m etricas para analizar la informaci on temporal y los sesgos en las recomendaciones producida

    Leveraging literals for knowledge graph embeddings

    Get PDF
    Wissensgraphen (Knowledge Graphs, KGs) repräsentieren strukturierte Fakten, die sich aus Entitäten und den zwischen diesen bestehenden Relationen zusammensetzen. Um die Effizienz von KG-Anwendungen zu maximieren, ist es von Vorteil, KGs in einen niedrigdimensionalen Vektorraum zu transformieren. KGs folgen dem Paradigma einer offenen Welt (Open World Assumption, OWA), d. h. fehlende Information wird als potenziell möglich angesehen, wodurch ihre Verwendung in realen Anwendungsszenarien oft eingeschränkt wird. Link-Vorhersage (Link Prediction, LP) zur Vervollständigung von KGs kommt daher eine hohe Bedeutung zu. LP kann in zwei unterschiedlichen Modi durchgeführt werden, transduktiv und induktiv, wobei die erste Möglichkeit voraussetzt, dass alle Entitäten der Testdaten in den Trainingsdaten vorhanden sind, während die zweite Möglichkeit auch zuvor nicht bekannte Entitäten in den Testdaten zulässt. Die vorliegende Arbeit untersucht die Verwendung von Literalen in der transduktiven und induktiven LP, da KGs zahlreiche numerische und textuelle Literale enthalten, die eine wesentliche Semantik aufweisen. Zur Evaluierung dieser LP Methoden werden spezielle Benchmark-Datensätze eingeführt. Insbesondere wird eine neuartige KG Embedding (KGE) Methode, RAILD, vorgeschlagen, die Textliterale zusammen mit kontextuellen Graphinformationen für die LP nutzt. Das Ziel von RAILD ist es, die bestehende Forschungslücke beim Lernen von Embeddings für beim Training ungesehene Relationen zu schließen. Dafür wird eine Architektur vorgeschlagen, die Sprachmodelle (Language Models, LMs) mit Netzwerkembeddings kombiniert. Hierzu erfolgt ein Feintuning von leistungsstarken vortrainierten LMs wie BERT zum Zweck der LP, wobei textuelle Beschreibungen von Entitäten und Relationen genutzt werden. Darüber hinaus wird ein neuer Algorithmus, WeiDNeR, eingeführt, um ein Relationsnetzwerk zu generieren, das zum Erlernen graphbasierter Embeddings von Relationen unter Verwendung eines Netzwerkembeddingsmodells dient. Die Vektorrepräsentationen dieser Relationen werden für die LP kombiniert. Zudem wird ein weiteres neuartiges Embeddingmodell, LitKGE, vorgestellt, das numerische Literale für die transduktive LP verwendet. Es zielt darauf ab, numerische Merkmale für Entitäten durch Graphtraversierung zu erzeugen. Hierfür wird ein weiterer Algorithmus, WeiDNeR_Extended, eingeführt, der ein Netzwerk aus Objekt- und Datentypproperties erzeugt. Aus den aus diesem Netzwerk extrahierten Propertypfaden werden dann numerische Merkmale von Entitäten generiert. Des Weiteren wird der Einsatz eines mehrsprachigen LM zur Kodierung von Entitätenbeschreibungen in verschiedenen natürlichen Sprachen zum Zweck der LP untersucht. Für die Evaluierung der KGE-Modelle wurden die Benchmark-Datensätze LiterallyWikidata und Wikidata68K erstellt. Die vielversprechenden Ergebnisse, die mit den vorgestellten Modellen erzielt wurden, eröffnen interessante Fragestellungen für die zukünftige Forschung auf dem Gebiet der KGEs und ihrer Folgeanwendungen

    Semantic and pragmatic characterization of learning objects

    Get PDF
    Tese de doutoramento. Engenharia Informática. Universidade do Porto. Faculdade de Engenharia. 201
    corecore