Structure Regularization for Structured Prediction: Theories and Experiments
While there are many studies on weight regularization, studies on structure
regularization are rare. Many existing systems for structured prediction focus on
increasing the level of structural dependencies within the model. However, this
trend could have been misdirected, because our study suggests that complex
structures are actually harmful to generalization ability in structured
prediction. To control structure-based overfitting, we propose a structure
regularization framework via \emph{structure decomposition}, which decomposes
training samples into mini-samples with simpler structures, deriving a model
with better generalization power. We show both theoretically and empirically
that structure regularization can effectively control overfitting risk and lead
to better accuracy. As a by-product, the proposed method also substantially
accelerates training. The method and the theoretical results apply
to general graphical models with arbitrary structures. Experiments on
well-known tasks demonstrate that our method can easily beat the benchmark
systems on those highly competitive tasks, achieving state-of-the-art
accuracies with substantially faster training.
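As a minimal sketch of the structure decomposition idea described above, the following splits one tagged sequence into shorter mini-samples. The fixed-length, contiguous split, the function name `decompose`, the parameter `mini_len` and the tag set are all assumptions made for illustration; the abstract does not say how the cut points are chosen.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, tag) pair; the tag set is illustrative only


def decompose(sample: Sequence[Token], mini_len: int) -> List[List[Token]]:
    """Split one tagged sequence into consecutive mini-samples.

    Each mini-sample holds at most `mini_len` tokens, so dependencies that
    cross a cut point are no longer modelled during training.
    """
    if mini_len < 1:
        raise ValueError("mini_len must be at least 1")
    return [list(sample[i:i + mini_len]) for i in range(0, len(sample), mini_len)]


# Hypothetical training sample with BIO-style tags.
sentence = [("John", "B-PER"), ("lives", "O"), ("in", "O"),
            ("New", "B-LOC"), ("York", "I-LOC")]
mini_samples = decompose(sentence, mini_len=2)
print([len(m) for m in mini_samples])  # [2, 2, 1]
```

Note that the cut between "New" and "York" separates two tokens of the same entity, which shows the kind of structural dependency that is deliberately given up in exchange for simpler training structures.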
Building Lightweight Ontologies for Faceted Search with Named Entity Recognition: Case WarMemoirSampo
Peer reviewed
KALA: Knowledge-Augmented Language Model Adaptation
Pre-trained language models (PLMs) have achieved remarkable success on
various natural language understanding tasks. Simple fine-tuning of PLMs, on
the other hand, might be suboptimal for domain-specific tasks because they
cannot possibly cover knowledge from all domains. While adaptive pre-training
of PLMs can help them obtain domain-specific knowledge, it requires a large
training cost. Moreover, adaptive pre-training can harm the PLM's performance
on the downstream task by causing catastrophic forgetting of its general
knowledge. To overcome these limitations of adaptive pre-training for PLM
adaptation, we propose a novel domain adaptation framework for PLMs called
Knowledge-Augmented Language model Adaptation (KALA), which modulates the
intermediate hidden representations of PLMs with domain knowledge, consisting
of entities and their relational facts. We validate the performance of KALA
on question answering and named entity recognition tasks on multiple datasets
across various domains. The results show that, despite being computationally
efficient, KALA largely outperforms adaptive pre-training. Code is
available at: https://github.com/Nardien/KALA/.
Comment: NAACL 202
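To make "modulating a hidden representation with domain knowledge" concrete, here is a minimal sketch using a feature-wise scale and shift. The abstract does not give the exact form of the modulation, so this form is an assumption. The function `modulate` and the vectors `gamma` and `beta` are hypothetical stand-ins for values that would be derived from entity embeddings.

```python
from typing import List


def modulate(hidden: List[float], scale: List[float], shift: List[float]) -> List[float]:
    """Apply a feature-wise scale and shift to one hidden vector.

    In a knowledge-augmented model, `scale` and `shift` would be computed
    from the embedding of the entity that the token mentions (assumed here).
    """
    if not (len(hidden) == len(scale) == len(shift)):
        raise ValueError("all vectors must have the same dimension")
    return [g * h + b for h, g, b in zip(hidden, scale, shift)]


hidden_state = [0.5, -1.0, 2.0]
gamma = [1.0, 2.0, 0.5]   # hypothetical entity-derived scale
beta = [0.0, 0.5, -1.0]   # hypothetical entity-derived shift
modulated = modulate(hidden_state, gamma, beta)
print(modulated)  # [0.5, -1.5, 0.0]
```

With a scale of ones and a shift of zeros the hidden vector passes through unchanged, so tokens that mention no known entity can keep their original representation.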
Named Entity Recognition and Linking in Greek Legislation
We show how entity recognition in Greek legislation texts can be achieved with a
named entity recognizer (NER). Our work is the first of its kind to treat the Greek language in
such depth and one of the few that examine legal text. We apply grid search
over multiple neural network architectures and combinations of hyper-parameters to
maximize the effectiveness of our approach. Using a large legal corpus, we built
word/token-shape embeddings with Word2Vec and ultimately achieve 86% accuracy on
average in recognizing organizations, legal references, geographical landmarks, persons,
geo-political entities (GPEs) and public documents. The evaluation of our methodology is
based on precision, recall and F1-score per entity type for each neural network.
Finally, we measure the ratio of correctly predicted links for the interlinking of RDF datasets
produced by our approach with well-known public datasets, and show how new knowledge can
be inferred indirectly by our approach from DBpedia, ELI (European Legislation Identifier)
and GAG (Greek administrative geography) of Kallikratis.
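The per-entity-type evaluation mentioned above can be sketched as follows. This is an illustrative implementation and not the authors' evaluation code. It assumes exact span matching, and the type labels (`ORG`, `GPE`, `PERSON`, `LEG-REF`) and the function name `per_type_scores` are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Entity = Tuple[int, int, str]  # (first token, last token, entity type)


def per_type_scores(gold: Set[Entity], pred: Set[Entity]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall and F1 per entity type (exact span match)."""
    true_pos = Counter(e[2] for e in gold & pred)
    n_gold = Counter(e[2] for e in gold)
    n_pred = Counter(e[2] for e in pred)
    scores = {}
    for etype in sorted(set(n_gold) | set(n_pred)):
        p = true_pos[etype] / n_pred[etype] if n_pred[etype] else 0.0
        r = true_pos[etype] / n_gold[etype] if n_gold[etype] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = {"precision": p, "recall": r, "f1": f1}
    return scores


# Hypothetical annotations: the second entity is predicted with the wrong type.
gold = {(0, 1, "ORG"), (4, 5, "GPE"), (7, 9, "LEG-REF")}
pred = {(0, 1, "ORG"), (4, 5, "PERSON"), (7, 9, "LEG-REF")}
scores = per_type_scores(gold, pred)
for etype, s in scores.items():
    print(etype, s)
```

A mistyped span counts against both types: it lowers recall for the gold type and precision for the predicted type.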
Space-partitioning with cascade-connected ANN structures for positioning in mobile communication systems
The world around us is becoming more connected every day, with new portable
devices using wireless connections to various networks wherever one might be. Location-aware
computing has become an important part of telecommunication services and industry. For
this reason, research on new and improved localisation algorithms is ongoing.
Thus far, satellite positioning systems have achieved the highest popularity
and penetration for global position estimation. Despite the numerous investigations
aimed at enabling these systems to determine position equally well in indoor and outdoor
environments, this task remains to be completed.
The research presented herein aimed at improving the state-of-the-art positioning
techniques through the use of two highly popular mobile communication systems: WLAN and
public land mobile networks. These systems already have widely deployed network structures
(coverage) and a vast number of (inexpensive) mobile clients, so using them for additional,
positioning purposes is rational and logical.
First, positioning in WLAN systems was analysed and elaborated. The indoor test-bed,
used for verifying the models’ performance, covered an area of almost 10,000 m². It was chosen
carefully so that positioning could be thoroughly explored. The measurement campaigns
performed therein covered the whole test-bed environment and gave insight into the
location-dependent parameters available in WLAN networks. Further analysis of the data led to
the development of positioning models based on ANNs.
The best single ANN model obtained 9.26m average distance error and 7.75m median distance
error. The novel positioning model structure, consisting of cascade-connected ANNs, improved
those results to 8.14m and 4.57m, respectively. To adequately compare the proposed
techniques with other well-known techniques, the environment positioning error
parameter was introduced. This parameter makes it possible to take the size of the test
environment into account when comparing the accuracy of indoor positioning techniques.
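The average and median distance errors quoted above can be computed as in the following sketch. The coordinates are made up for illustration, planar Euclidean distance is assumed, and the function name `distance_errors` is hypothetical. The environment positioning error parameter is not reproduced here because the abstract does not define it.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres


def distance_errors(true_pos: List[Point], est_pos: List[Point]) -> List[float]:
    """Return the Euclidean distance between each true and estimated position."""
    if len(true_pos) != len(est_pos):
        raise ValueError("both lists must have the same length")
    return [math.dist(t, e) for t, e in zip(true_pos, est_pos)]


# Hypothetical test points and model estimates, in metres.
true_positions = [(0.0, 0.0), (10.0, 0.0), (0.0, 20.0)]
estimates = [(3.0, 4.0), (10.0, 6.0), (8.0, 5.0)]

errors = distance_errors(true_positions, estimates)
average_error = statistics.mean(errors)
median_error = statistics.median(errors)
print(f"average: {average_error:.2f} m, median: {median_error:.2f} m")
```

A single large outlier raises the average much more than the median, which is one reason both figures are usually reported together.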
Concerning PLMN positioning, an in-depth analysis of available system parameters and
signalling protocols produced a positioning algorithm capable of fusing the received
signal strength parameters from multiple systems and multiple operators. Since
most areas are covered by signals from more than one network operator, and often by
more than one system from a single operator, this novel algorithm has great practical
value. In addition, an extensive drive-test measurement campaign,
covering more than 600km in the central areas of Belgrade, was performed. Using this
algorithm and applying the single ANN models to the recorded measurements, a 59m average
distance error and 50m median distance error were obtained. Moreover, positioning in
the indoor environment was verified, and the degradation of performance due to
cross-environment model use was reported: 105m average distance error and 101m median
distance error.
When applying the new, cascade-connected ANN structure model, distance errors were
reduced to 26m and 2m, for the average and median distance errors, respectively.
The obtained positioning accuracy was shown to be good enough for implementing a
broad range of location-based services using the existing, already deployed and commonly
available infrastructure.
Deep Neural Architectures for End-to-End Relation Extraction
The rapid pace of scientific and technological advancements has led to a meteoric growth in knowledge, as evidenced by a sharp increase in the number of scholarly publications in recent years. PubMed, for example, archives more than 30 million biomedical articles across various domains and covers a wide range of topics including medicine, pharmacy, biology, and healthcare. Social media and digital journalism have similarly experienced their own accelerated growth in the age of big data. Hence, there is a compelling need for ways to organize and distill the vast, fragmented body of information (often unstructured in the form of natural human language) so that it can be assimilated, reasoned about, and ultimately harnessed. Relation extraction is an important natural language task toward that end. In relation extraction, semantic relationships are extracted from natural human language in the form of (subject, object, predicate) triples such that subject and object are mentions of discrete concepts and predicate indicates the type of relation between them. The difficulty of relation extraction becomes clear when we consider the myriad ways the same relation can be expressed in natural language. Much of the current work in relation extraction assumes that entities are known at extraction time, thus treating entity recognition as an entirely separate and independent task. However, recent studies have shown that entity recognition and relation extraction, when modeled together as interdependent tasks, can lead to overall improvements in extraction accuracy. When modeled in such a manner, the task is referred to as end-to-end relation extraction. In this work, we present four studies that introduce increasingly sophisticated architectures designed to tackle the task of end-to-end relation extraction. In the first study, we present a pipeline approach for extracting protein-protein interactions as affected by particular mutations.
The pipeline system makes use of recurrent neural networks for protein detection, lexicons for gene normalization, and convolutional neural networks for relation extraction. In the second study, we show that a multi-task learning framework, with parameter sharing, can achieve state-of-the-art results for drug-drug interaction extraction. At its core, the model uses graph convolutions, with a novel attention-gating mechanism, over dependency parse trees. In the third study, we present a more efficient and general-purpose end-to-end neural architecture designed around the table-filling paradigm: for an input sentence of length n, all entities and relations are extracted in a single pass of the network in an indirect fashion by populating the cells of a corresponding n by n table using metric-based features. We show that this approach excels in both the general English and biomedical domains, with extraction times that are up to an order of magnitude faster than the prior best. In the fourth and last study, we present an architecture for relation extraction that, in addition to being end-to-end, is able to handle cross-sentence and N-ary relations. Overall, our work contributes to the advancement of modern information extraction by exploring end-to-end solutions that are fast, accurate, and generalizable to many high-value domains.
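To illustrate the n by n table mentioned in the third study, the sketch below builds a target table from gold annotations for a four-token sentence. The layout is one common table-filling formulation and is assumed here: entity tags sit on the diagonal, and a relation label sits in the cell indexed by the last tokens of its two entities. The abstract does not specify the thesis's exact cell encoding. The labels `DRUG` and `INTERACTS` and the function `fill_table` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]      # (first token, last token, entity type)
Relation = Tuple[int, int, str]  # (head span index, tail span index, label)


def fill_table(n: int, spans: List[Span], relations: List[Relation]) -> List[List[str]]:
    """Build an n x n target table for a sentence of n tokens.

    Diagonal cells carry BIO entity tags ("O" for non-entity tokens);
    off-diagonal cells carry a relation label, or "-" when there is none.
    """
    table = [["O" if i == j else "-" for j in range(n)] for i in range(n)]
    for first, last, etype in spans:
        for i in range(first, last + 1):
            table[i][i] = ("B-" if i == first else "I-") + etype
    for head, tail, label in relations:
        # Index the relation cell by the last token of each entity (assumed).
        table[spans[head][1]][spans[tail][1]] = label
    return table


tokens = ["Aspirin", "interacts", "with", "warfarin"]
spans = [(0, 0, "DRUG"), (3, 3, "DRUG")]
relations = [(0, 1, "INTERACTS")]
table = fill_table(len(tokens), spans, relations)
for row in table:
    print(row)
```

Because every entity and relation corresponds to a cell, a model that predicts all cells at once recovers both in a single pass, which is the property the abstract highlights.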