
    Structure Regularization for Structured Prediction: Theories and Experiments

    While there are many studies on weight regularization, studies on structure regularization are rare. Many existing systems for structured prediction focus on increasing the level of structural dependencies within the model. However, this trend may be misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method also substantially accelerates training. The method and the theoretical results apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly competitive tasks, achieving state-of-the-art accuracies with substantially faster training speed.
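    A minimal sketch of the structure-decomposition idea described above, in which each training sequence is split into shorter, contiguous mini-samples so that the model is trained on weaker structural dependencies. The function name, the (token, tag) representation, and the decomposition rate alpha are illustrative assumptions, not the paper's notation.

```python
# Illustrative sketch of structure decomposition: split one structured
# training sample into roughly `alpha` contiguous mini-samples with
# simpler structure. Names and the (token, tag) representation are
# assumptions made for illustration only.
from typing import List, Tuple

Sample = List[Tuple[str, str]]  # a sequence of (token, tag) pairs


def decompose(sample: Sample, alpha: int) -> List[Sample]:
    """Split one structured sample into roughly `alpha` contiguous mini-samples."""
    if alpha <= 1 or len(sample) <= 1:
        return [sample]
    size = max(1, len(sample) // alpha)
    return [sample[i:i + size] for i in range(0, len(sample), size)]


if __name__ == "__main__":
    sentence = [("Smith", "B-PER"), ("visited", "O"), ("Athens", "B-LOC"),
                ("last", "O"), ("May", "O")]
    for mini in decompose(sentence, alpha=2):
        print(mini)
```

    Each mini-sample can then be treated as a simpler training instance, which is the intuition behind both the regularization effect and the reported training speed-up.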

    KALA: Knowledge-Augmented Language Model Adaptation

    Pre-trained language models (PLMs) have achieved remarkable success on various natural language understanding tasks. Simple fine-tuning of PLMs, on the other hand, might be suboptimal for domain-specific tasks because PLMs cannot cover knowledge from all domains. While adaptive pre-training of PLMs can help them obtain domain-specific knowledge, it incurs a large training cost. Moreover, adaptive pre-training can harm the PLM's performance on the downstream task by causing catastrophic forgetting of its general knowledge. To overcome these limitations of adaptive pre-training for PLM adaptation, we propose a novel domain adaptation framework for PLMs, coined Knowledge-Augmented Language model Adaptation (KALA), which modulates the intermediate hidden representations of PLMs with domain knowledge consisting of entities and their relational facts. We validate the performance of KALA on question answering and named entity recognition tasks on multiple datasets across various domains. The results show that, despite being computationally efficient, KALA largely outperforms adaptive pre-training. Code is available at: https://github.com/Nardien/KALA/. Comment: NAACL 202
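    A hedged sketch of the general idea of modulating a PLM's intermediate hidden representations with entity knowledge. The layer below is an illustrative feature-wise scale-and-shift over token-aligned entity embeddings; it is not KALA's exact formulation (see the linked repository for the authors' implementation), and all names and dimensions are assumptions.

```python
# Illustrative only: modulate PLM hidden states with token-aligned entity
# knowledge via a feature-wise scale-and-shift. Not KALA's exact method.
import torch
import torch.nn as nn


class KnowledgeModulation(nn.Module):
    def __init__(self, hidden_dim: int, entity_dim: int):
        super().__init__()
        self.scale = nn.Linear(entity_dim, hidden_dim)
        self.shift = nn.Linear(entity_dim, hidden_dim)

    def forward(self, hidden, entity_emb):
        # hidden:     (batch, seq_len, hidden_dim) from an intermediate PLM layer
        # entity_emb: (batch, seq_len, entity_dim), zeros where no entity is linked
        gamma = torch.tanh(self.scale(entity_emb))
        beta = self.shift(entity_emb)
        return hidden * (1 + gamma) + beta


if __name__ == "__main__":
    layer = KnowledgeModulation(hidden_dim=768, entity_dim=128)
    h = torch.randn(2, 16, 768)
    e = torch.randn(2, 16, 128)
    print(layer(h, e).shape)  # torch.Size([2, 16, 768])
```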

    Named Entity Recognition and Linking in Greek Legislation

    We show how entity recognition in Greek legislation texts can be achieved by utilizing a named entity recognizer (NER). Our work is the first of its kind for the Greek language in such an extended form and one of the few that examine legal text. We apply grid search over multiple neural network architectures and combinations of hyper-parameters to maximize the efficiency of our approach. We show that, utilizing a large legal corpus, we built word/token-shape embeddings using Word2Vec, and we finally achieve 86% accuracy on average in the recognition of organizations, legal references, geographical landmarks, persons, geo-political entities (GPEs) and public documents. The evaluation of our methodology is based on the metrics of precision, recall and F1-score per entity type for each neural network. Finally, we measure the ratio of correctly guessed links for the interlinking of RDF datasets produced by our approach with well-known publicly released datasets, and show how new knowledge can be inferred indirectly by our approach from DBpedia, ELI (European Legislation Identifier) and GAG (Greek administrative geography, Ελληνική διοικητική γεωγραφία) of Kallikratis.
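    As a rough illustration of the grid search over architectures and hyper-parameter combinations mentioned above, the sketch below enumerates a small search space and keeps the best configuration by F1. The search space and the train_and_evaluate stub are placeholder assumptions; in the actual work each configuration corresponds to training a neural NER model and scoring it per entity type.

```python
# Illustrative grid search over NER hyper-parameter combinations.
# The search space and the evaluation stub are placeholders.
from itertools import product

search_space = {
    "architecture": ["BiLSTM", "BiLSTM-CRF"],
    "hidden_size": [100, 200],
    "dropout": [0.3, 0.5],
    "learning_rate": [1e-3, 1e-4],
}


def train_and_evaluate(config: dict) -> float:
    """Placeholder: train an NER model with `config` and return its F1-score."""
    return 0.0  # replaced by real training and per-entity-type evaluation


best_config, best_f1 = None, -1.0
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    f1 = train_and_evaluate(config)
    if f1 > best_f1:
        best_config, best_f1 = config, f1

print(best_config, best_f1)
```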

    Space-partitioning with cascade-connected ANN structures for positioning in mobile communication systems

    The world around us is getting more connected with each passing day – new portable devices employ wireless connections to various networks wherever one might be. Location-aware computing has become an important part of telecommunication services and industry. For this reason, research efforts on new and improved localisation algorithms are constantly being performed. Thus far, satellite positioning systems have achieved the highest popularity and penetration for global position estimation. Despite the numerous investigations aimed at enabling these systems to procure the position equally well in both indoor and outdoor environments, this remains an open task. The research work presented herein aimed at improving state-of-the-art positioning techniques through the use of two highly popular mobile communication systems: WLAN and public land mobile networks (PLMN). These systems already have widely deployed network structures (coverage) and a vast number of (inexpensive) mobile clients, so using them for additional, positioning purposes is rational and logical.

    First, positioning in WLAN systems was analysed and elaborated. The indoor test-bed used for verifying the models’ performance covered an area of almost 10,000 m². It was chosen carefully so that positioning could be thoroughly explored. The measurement campaigns performed therein covered the whole test-bed environment and gave insight into the location-dependent parameters available in WLAN networks. Further analysis of the data led to the development of positioning models based on ANNs. The best single-ANN model obtained a 9.26 m average distance error and a 7.75 m median distance error. The novel positioning model structure, consisting of cascade-connected ANNs, improved those results to 8.14 m and 4.57 m, respectively. To adequately compare the proposed techniques with other well-known research techniques, the environment positioning error parameter was introduced. This parameter allows the size of the test environment to be taken into account when comparing the accuracy of indoor positioning techniques.

    Concerning PLMN positioning, an in-depth analysis of the available system parameters and signalling protocols produced a positioning algorithm capable of fusing received-signal-strength parameters from multiple systems and multiple operators. Knowing that most areas are covered by signals from more than one network operator, and often by more than one system from a single operator, the great practical value of this novel algorithm becomes clear. In addition, an extensive drive-test measurement campaign, covering more than 600 km in the central areas of Belgrade, was performed. Using this algorithm and applying the single-ANN models to the recorded measurements, a 59 m average distance error and a 50 m median distance error were obtained. Moreover, positioning in an indoor environment was verified and the degradation of performance due to cross-environment model use was reported: a 105 m average distance error and a 101 m median distance error. When applying the new cascade-connected ANN structure model, the distance errors were reduced to 26 m and 2 m, for the average and median distance errors, respectively. The obtained positioning accuracy was shown to be good enough for the implementation of a broad scope of location-based services using the existing, deployed and commonly available infrastructure.
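    A hedged sketch of the space-partitioning / cascade idea described above: a first ANN classifies a received-signal-strength fingerprint into a spatial partition, a partition-specific ANN then regresses the coordinates, and accuracy is summarised by the average and median distance errors. The synthetic data and the scikit-learn MLPs are illustrative assumptions, not the thesis's actual network structures or measurements.

```python
# Illustrative cascade: partition classifier -> per-partition coordinate
# regressor, evaluated with average and median distance errors.
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 6))              # synthetic RSS fingerprints (6 APs)
pos = rng.uniform(0, 100, size=(n, 2))   # true (x, y) positions in metres
region = (pos[:, 0] > 50).astype(int)    # two spatial partitions

# Stage 1: coarse partition classifier.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, region)

# Stage 2: one fine-grained coordinate regressor per partition.
regressors = {}
for r in (0, 1):
    idx = region == r
    reg = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    reg.fit(X[idx], pos[idx])
    regressors[r] = reg

# Cascade prediction and the distance-error metrics used above.
pred_region = clf.predict(X)
pred_pos = np.vstack([regressors[r].predict(x[None, :])[0]
                      for x, r in zip(X, pred_region)])
errors = np.linalg.norm(pred_pos - pos, axis=1)
print(f"average distance error: {errors.mean():.2f} m, "
      f"median distance error: {np.median(errors):.2f} m")
```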

    Deep Neural Architectures for End-to-End Relation Extraction

    The rapid pace of scientific and technological advancements has led to a meteoric growth in knowledge, as evidenced by a sharp increase in the number of scholarly publications in recent years. PubMed, for example, archives more than 30 million biomedical articles across various domains and covers a wide range of topics including medicine, pharmacy, biology, and healthcare. Social media and digital journalism have similarly experienced their own accelerated growth in the age of big data. Hence, there is a compelling need for ways to organize and distill the vast, fragmented body of information (often unstructured, in the form of natural human language) so that it can be assimilated, reasoned about, and ultimately harnessed. Relation extraction is an important natural language task toward that end. In relation extraction, semantic relationships are extracted from natural human language in the form of (subject, object, predicate) triples such that subject and object are mentions of discrete concepts and predicate indicates the type of relation between them. The difficulty of relation extraction becomes clear when we consider the myriad ways the same relation can be expressed in natural language. Much of the current work in relation extraction assumes that entities are known at extraction time, thus treating entity recognition as an entirely separate and independent task. However, recent studies have shown that entity recognition and relation extraction, when modeled together as interdependent tasks, can lead to overall improvements in extraction accuracy. When modeled in such a manner, the task is referred to as end-to-end relation extraction.

    In this work, we present four studies that introduce incrementally sophisticated architectures designed to tackle the task of end-to-end relation extraction. In the first study, we present a pipeline approach for extracting protein-protein interactions as affected by particular mutations. The pipeline system makes use of recurrent neural networks for protein detection, lexicons for gene normalization, and convolutional neural networks for relation extraction. In the second study, we show that a multi-task learning framework with parameter sharing can achieve state-of-the-art results for drug-drug interaction extraction. At its core, the model uses graph convolutions, with a novel attention-gating mechanism, over dependency parse trees. In the third study, we present a more efficient and general-purpose end-to-end neural architecture designed around the table-filling paradigm; for an input sentence of length n, all entities and relations are extracted in a single pass of the network by indirectly populating the cells of a corresponding n-by-n table using metric-based features. We show that this approach excels in both the general English and biomedical domains, with extraction times up to an order of magnitude faster than the prior best. In the fourth and last study, we present an architecture for relation extraction that, in addition to being end-to-end, is able to handle cross-sentence and N-ary relations. Overall, our work contributes to the advancement of modern information extraction by exploring end-to-end solutions that are fast, accurate, and generalizable to many high-value domains.
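    A minimal sketch of the table-filling formulation referenced above: for a sentence of n tokens, an n-by-n table is populated so that diagonal cells carry entity decisions and off-diagonal cells carry the relation, if any, between token pairs. The example sentence, labels, and filling rule are illustrative assumptions rather than the exact scheme used in the study.

```python
# Illustrative table filling for joint entity and relation extraction:
# diagonal cells hold entity tags, off-diagonal cells hold relation labels.
import numpy as np

tokens = ["Aspirin", "inhibits", "COX-1"]
entities = {0: "DRUG", 2: "PROTEIN"}   # token index -> entity tag
relations = {(0, 2): "inhibits"}       # (head idx, tail idx) -> relation label

n = len(tokens)
table = np.full((n, n), "O", dtype=object)
for i, tag in entities.items():
    table[i, i] = tag                  # diagonal: entity decisions
for (i, j), rel in relations.items():
    table[i, j] = rel                  # off-diagonal: relation decisions

for row in table:
    print("\t".join(row))
```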