33 research outputs found
Biomedical ontology alignment: An approach based on representation learning
While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results
The Emerging Internet of Things Marketplace From an Industrial Perspective: A Survey
The Internet of Things (IoT) is a dynamic global information network
consisting of internet-connected objects, such as Radio-frequency
identification (RFIDs), sensors, actuators, as well as other instruments and
smart appliances that are becoming an integral component of the future
internet. Over the last decade, we have seen a large number of the IoT
solutions developed by start-ups, small and medium enterprises, large
corporations, academic research institutes (such as universities), and private
and public research organisations making their way into the market. In this
paper, we survey over one hundred IoT smart solutions in the marketplace and
examine them closely in order to identify the technologies used,
functionalities, and applications. More importantly, we identify the trends,
opportunities and open challenges in the industry-based the IoT solutions.
Based on the application domain, we classify and discuss these solutions under
five different categories: smart wearable, smart home, smart, city, smart
environment, and smart enterprise. This survey is intended to serve as a
guideline and conceptual framework for future research in the IoT and to
motivate and inspire further developments. It also provides a systematic
exploration of existing research and suggests a number of potentially
significant research directions.Comment: IEEE Transactions on Emerging Topics in Computing 201
Exploiting word embeddings for modeling bilexical relations
There has been an exponential surge of text data in the recent years. As a consequence, unsupervised methods that make use of this data have been steadily growing in the field of natural language processing (NLP). Word embeddings are low-dimensional vectors obtained using unsupervised techniques on the large unlabelled corpora, where words from the vocabulary are mapped to vectors of real numbers. Word embeddings aim to capture syntactic and semantic properties of words.
In NLP, many tasks involve computing the compatibility between lexical items under some linguistic relation. We call this type of relation a bilexical relation. Our thesis defines statistical models for bilexical relations
that centrally make use of word embeddings. Our principle aim is that the word embeddings will favor generalization to words not seen during the training of the model.
The thesis is structured in four parts. In the first part of this thesis, we present a bilinear model over word embeddings that leverages a small supervised dataset for a binary linguistic relation. Our learning algorithm exploits low-rank bilinear forms and induces a low-dimensional embedding tailored for a target linguistic relation. This results in compressed task-specific embeddings.
In the second part of our thesis, we extend our bilinear model to a ternary
setting and propose a framework for resolving prepositional phrase attachment ambiguity using word embeddings. Our models perform competitively with state-of-the-art models. In addition, our method obtains significant improvements on out-of-domain tests by simply using word-embeddings induced from source and target domains.
In the third part of this thesis, we further extend the bilinear models for expanding vocabulary in the context of statistical phrase-based machine translation. Our model obtains a probabilistic list of possible translations of target language words, given a word in the source language. We do this by projecting pre-trained embeddings into a common subspace using a log-bilinear model. We empirically notice a significant improvement on an out-of-domain test set.
In the final part of our thesis, we propose a non-linear model that maps initial word embeddings to task-tuned word embeddings, in the context of a neural network dependency parser. We demonstrate its use for improved dependency parsing, especially for sentences with unseen words. We also show downstream improvements on a sentiment analysis task.En els darrers anys hi ha hagut un sorgiment notable de dades en format textual. Conseqüentment, en el camp del Processament del Llenguatge Natural (NLP, de l'anglès "Natural Language Processing") s'han desenvolupat mètodes no supervistats que fan ús d'aquestes dades. Els anomenats "word embeddings", o embeddings de paraules, són vectors de dimensionalitat baixa que s'obtenen mitjançant tècniques no supervisades aplicades a corpus textuals de grans volums. Com a resultat, cada paraula del diccionari es correspon amb un vector de nombres reals, el propòsit del qual és capturar propietats sintàctiques i semàntiques de la paraula corresponent. Moltes tasques de NLP involucren calcular la compatibilitat entre elements lèxics en l'àmbit d'una relació lingüística. D'aquest tipus de relació en diem relació bilèxica. Aquesta tesi proposa models estadístics per a relacions bilèxiques que fan ús central d'embeddings de paraules, amb l'objectiu de millorar la generalització del model lingüístic a paraules no vistes durant l'entrenament. La tesi s'estructura en quatre parts. A la primera part presentem un model bilineal sobre embeddings de paraules que explota un conjunt petit de dades anotades sobre una relaxió bilèxica. L'algorisme d'aprenentatge treballa amb formes bilineals de poc rang, i indueix embeddings de poca dimensionalitat que estan especialitzats per la relació bilèxica per la qual s'han entrenat. Com a resultat, obtenim embeddings de paraules que corresponen a compressions d'embeddings per a una relació determinada. A la segona part de la tesi proposem una extensió del model bilineal a trilineal, i amb això proposem un nou model per a resoldre ambigüitats de sintagmes preposicionals que usa només embeddings de paraules. En una sèrie d'avaluacións, els nostres models funcionen de manera similar a l'estat de l'art. A més, el nostre mètode obté millores significatives en avaluacions en textos de dominis diferents al d'entrenament, simplement usant embeddings induïts amb textos dels dominis d'entrenament i d'avaluació. A la tercera part d'aquesta tesi proposem una altra extensió dels models bilineals per ampliar la cobertura lèxica en el context de models estadístics de traducció automàtica. El nostre model probabilístic obté, donada una paraula en la llengua d'origen, una llista de possibles traduccions en la llengua de destí. Fem això mitjançant una projecció d'embeddings pre-entrenats a un sub-espai comú, usant un model log-bilineal. Empíricament, observem una millora significativa en avaluacions en dominis diferents al d'entrenament. Finalment, a la quarta part de la tesi proposem un model no lineal que indueix una correspondència entre embeddings inicials i embeddings especialitzats, en el context de tasques d'anàlisi sintàctica de dependències amb models neuronals. Mostrem que aquest mètode millora l'analisi de dependències, especialment en oracions amb paraules no vistes durant l'entrenament. També mostrem millores en un tasca d'anàlisi de sentiment
BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters
Serverless computing has gained popularity in edge computing due to its
flexible features, including the pay-per-use pricing model, auto-scaling
capabilities, and multi-tenancy support. Complex Serverless-based applications
typically rely on Serverless workflows (also known as Serverless function
orchestration) to express task execution logic, and numerous application- and
system-level optimization techniques have been developed for Serverless
workflow scheduling. However, there has been limited exploration of optimizing
Serverless workflow scheduling in edge computing systems, particularly in
high-density, resource-constrained environments such as system-on-chip clusters
and single-board-computer clusters. In this work, we discover that existing
Serverless workflow scheduling techniques typically assume models with limited
expressiveness and cause significant resource contention. To address these
issues, we propose modeling Serverless workflows using behavior trees, a novel
and fundamentally different approach from existing directed-acyclic-graph- and
state machine-based models. Behavior tree-based modeling allows for easy
analysis without compromising workflow expressiveness. We further present
observations derived from the inherent tree structure of behavior trees for
contention-free function collections and awareness of exact and empirical
concurrent function invocations. Based on these observations, we introduce
BeeFlow, a behavior tree-based Serverless workflow system tailored for
resource-constrained edge clusters. Experimental results demonstrate that
BeeFlow achieves up to 3.2X speedup in a high-density, resource-constrained
edge testbed and 2.5X speedup in a high-profile cloud testbed, compared with
the state-of-the-art.Comment: Accepted by Journal of Systems Architectur
Machine and deep learning techniques for detecting internet protocol version six attacks: a review
The rapid development of information and communication technologies has increased the demand for internet-facing devices that require publicly accessible internet protocol (IP) addresses, resulting in the depletion of internet protocol version 4 (IPv4) address space. As a result, internet protocol version 6 (IPv6) was designed to address this issue. However, IPv6 is still not widely used because of security concerns. An intrusion detection system (IDS) is one example of a security mechanism used to secure networks. Lately, the use of machine learning (ML) or deep learning (DL) detection models in IDSs is gaining popularity due to their ability to detect threats on IPv6 networks accurately. However, there is an apparent lack of studies that review ML and DL in IDS. Even the existing reviews of ML and DL fail to compare those techniques. Thus, this paper comprehensively elucidates ML and DL techniques and IPv6-based distributed denial of service (DDoS) attacks. Additionally, this paper includes a qualitative comparison with other related works. Moreover, this work also thoroughly reviews the existing ML and DL-based IDSs for detecting IPv6 and IPv4 attacks. Lastly, researchers could use this review as a guide in the future to improve their work on DL and ML-based IDS
Special issue "digital twin technology in the AEC industry"
Sustainable building design has become a hot topic over the past decades. Many standards, databases, and tools have been developed for achieving a sustainable building. Not until recently have the importance of structural engineering and its contribution to sustainable building design been full recognised. However, due to the highly fragmented and diversity of knowledge across building and infrastructure domains, there is a lack of approach that can address all the sustainable issues within the structural design. This paper reviews the sustainable design from the perspective of structural engineering: (1) reviewing the current situation; (2) identifying the gaps and difficulties; and (3) making recommendations for future improvements. The strategies and indicators, as well as BIM-enabled methodology, for sustainable structural design (SSD) are also discussed in a holistic way. The results of this investigation show that most of the methods are not doing well in terms of delivering a successful sustainable structural design. It is expected that the future BIM could probably provide such a platform to address these issues
Architecture and the Built Environment:
This publication provides an overview of TU Delft’s most significant research achievements in the field of architecture and the built environment during the years 2010–2012. It is the first presentation of the joint research portfolio of the Faculty of Architecture and OTB Research Institute since their integration into the Faculty of Architecture and the Built Environment. As such the portfolio holds a strong promise for the future. In a time when the economy seems to be finally picking up and in which such societal issues as energy, climate and ageing are more prominent than ever before, there are plenty of fields for us to explore in the next three years
Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation
Tesis Doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingenieria Informática. Fecha de lectura: 08-07-2021Since the emergence of the Internet and the spread of digital communications
throughout the world, the amount of data stored on the Web has been
growing exponentially. In this new digital era, a large number of companies
have emerged with the purpose of ltering the information available on the
web and provide users with interesting items. The algorithms and models
used to recommend these items are called Recommender Systems. These
systems are applied to a large number of domains, from music, books, or
movies to dating or Point-of-Interest (POI), which is an increasingly popular
domain where users receive recommendations of di erent places when
they arrive to a city.
In this thesis, we focus on exploiting the use of contextual information, especially
temporal and sequential data, and apply it in novel ways in both
traditional and Point-of-Interest recommendation. We believe that this type
of information can be used not only for creating new recommendation models
but also for developing new metrics for analyzing the quality of these
recommendations. In one of our rst contributions we propose di erent
metrics, some of them derived from previously existing frameworks, using
this contextual information. Besides, we also propose an intuitive algorithm
that is able to provide recommendations to a target user by exploiting the
last common interactions with other similar users of the system.
At the same time, we conduct a comprehensive review of the algorithms
that have been proposed in the area of POI recommendation between 2011
and 2019, identifying the common characteristics and methodologies used.
Once this classi cation of the algorithms proposed to date is completed, we
design a mechanism to recommend complete routes (not only independent
POIs) to users, making use of reranking techniques. In addition, due to the
great di culty of making recommendations in the POI domain, we propose
the use of data aggregation techniques to use information from di erent
cities to generate POI recommendations in a given target city.
In the experimental work we present our approaches on di erent datasets
belonging to both classical and POI recommendation. The results obtained
in these experiments con rm the usefulness of our recommendation proposals,
in terms of ranking accuracy and other dimensions like novelty, diversity,
and coverage, and the appropriateness of our metrics for analyzing temporal
information and biases in the recommendations producedDesde la aparici on de Internet y la difusi on de las redes de comunicaciones
en todo el mundo, la cantidad de datos almacenados en la red ha crecido
exponencialmente. En esta nueva era digital, han surgido un gran n umero
de empresas con el objetivo de ltrar la informaci on disponible en la red
y ofrecer a los usuarios art culos interesantes. Los algoritmos y modelos
utilizados para recomendar estos art culos reciben el nombre de Sistemas de
Recomendaci on. Estos sistemas se aplican a un gran n umero de dominios,
desde m usica, libros o pel culas hasta las citas o los Puntos de Inter es (POIs,
en ingl es), un dominio cada vez m as popular en el que los usuarios reciben
recomendaciones de diferentes lugares cuando llegan a una ciudad.
En esta tesis, nos centramos en explotar el uso de la informaci on contextual,
especialmente los datos temporales y secuenciales, y aplicarla de forma novedosa
tanto en la recomendaci on cl asica como en la recomendaci on de POIs.
Creemos que este tipo de informaci on puede utilizarse no s olo para crear
nuevos modelos de recomendaci on, sino tambi en para desarrollar nuevas
m etricas para analizar la calidad de estas recomendaciones. En una de
nuestras primeras contribuciones proponemos diferentes m etricas, algunas
derivadas de formulaciones previamente existentes, utilizando esta informaci
on contextual. Adem as, proponemos un algoritmo intuitivo que es
capaz de proporcionar recomendaciones a un usuario objetivo explotando
las ultimas interacciones comunes con otros usuarios similares del sistema.
Al mismo tiempo, realizamos una revisi on exhaustiva de los algoritmos que
se han propuesto en el a mbito de la recomendaci o n de POIs entre 2011 y
2019, identi cando las caracter sticas comunes y las metodolog as utilizadas.
Una vez realizada esta clasi caci on de los algoritmos propuestos hasta la
fecha, dise~namos un mecanismo para recomendar rutas completas (no s olo
POIs independientes) a los usuarios, haciendo uso de t ecnicas de reranking.
Adem as, debido a la gran di cultad de realizar recomendaciones en el
ambito de los POIs, proponemos el uso de t ecnicas de agregaci on de datos
para utilizar la informaci on de diferentes ciudades y generar recomendaciones
de POIs en una determinada ciudad objetivo.
En el trabajo experimental presentamos nuestros m etodos en diferentes
conjuntos de datos tanto de recomendaci on cl asica como de POIs. Los
resultados obtenidos en estos experimentos con rman la utilidad de nuestras
propuestas de recomendaci on en t erminos de precisi on de ranking y de
otras dimensiones como la novedad, la diversidad y la cobertura, y c omo de
apropiadas son nuestras m etricas para analizar la informaci on temporal y
los sesgos en las recomendaciones producida
Leveraging literals for knowledge graph embeddings
Wissensgraphen (Knowledge Graphs, KGs) repräsentieren strukturierte Fakten, die sich aus Entitäten und den zwischen diesen bestehenden Relationen zusammensetzen. Um die Effizienz von KG-Anwendungen zu maximieren, ist es von Vorteil, KGs in einen niedrigdimensionalen Vektorraum zu transformieren. KGs folgen dem Paradigma einer offenen Welt (Open World Assumption, OWA), d. h. fehlende Information wird als potenziell möglich angesehen, wodurch ihre Verwendung in realen Anwendungsszenarien oft eingeschränkt wird. Link-Vorhersage (Link Prediction, LP) zur Vervollständigung von KGs kommt daher eine hohe Bedeutung zu. LP kann in zwei unterschiedlichen Modi durchgeführt werden, transduktiv und induktiv, wobei die erste Möglichkeit voraussetzt, dass alle Entitäten der Testdaten in den Trainingsdaten vorhanden sind, während die zweite Möglichkeit auch zuvor nicht bekannte Entitäten in den Testdaten zulässt. Die vorliegende Arbeit untersucht die Verwendung von Literalen in der transduktiven und induktiven LP, da KGs zahlreiche numerische und textuelle Literale enthalten, die eine wesentliche Semantik aufweisen. Zur Evaluierung dieser LP Methoden werden spezielle Benchmark-Datensätze eingeführt.
Insbesondere wird eine neuartige KG Embedding (KGE) Methode, RAILD, vorgeschlagen, die Textliterale zusammen mit kontextuellen Graphinformationen für die LP nutzt. Das Ziel von RAILD ist es, die bestehende Forschungslücke beim Lernen von Embeddings für beim Training ungesehene Relationen zu schließen. Dafür wird eine Architektur vorgeschlagen, die Sprachmodelle (Language Models, LMs) mit Netzwerkembeddings kombiniert. Hierzu erfolgt ein Feintuning von leistungsstarken vortrainierten LMs wie BERT zum Zweck der LP, wobei textuelle Beschreibungen von Entitäten und Relationen genutzt werden. Darüber hinaus wird ein neuer Algorithmus, WeiDNeR, eingeführt, um ein Relationsnetzwerk zu generieren, das zum Erlernen graphbasierter Embeddings von Relationen unter Verwendung eines Netzwerkembeddingsmodells dient. Die Vektorrepräsentationen dieser Relationen werden für die LP kombiniert. Zudem wird ein weiteres neuartiges Embeddingmodell, LitKGE, vorgestellt, das numerische Literale für die transduktive LP verwendet. Es zielt darauf ab, numerische Merkmale für Entitäten durch Graphtraversierung zu erzeugen. Hierfür wird ein weiterer Algorithmus, WeiDNeR_Extended, eingeführt, der ein Netzwerk aus Objekt- und Datentypproperties erzeugt. Aus den aus diesem Netzwerk extrahierten Propertypfaden werden dann numerische Merkmale von Entitäten generiert.
Des Weiteren wird der Einsatz eines mehrsprachigen LM zur Kodierung von Entitätenbeschreibungen in verschiedenen natürlichen Sprachen zum Zweck der LP untersucht. Für die Evaluierung der KGE-Modelle wurden die Benchmark-Datensätze LiterallyWikidata und Wikidata68K erstellt. Die vielversprechenden Ergebnisse, die mit den vorgestellten Modellen erzielt wurden, eröffnen interessante Fragestellungen für die zukünftige Forschung auf dem Gebiet der KGEs und ihrer Folgeanwendungen
Semantic and pragmatic characterization of learning objects
Tese de doutoramento. Engenharia Informática. Universidade do Porto. Faculdade de Engenharia. 201