5,141 research outputs found

Academic freedom: a research bibliography

Author: Karran Terence
Publication venue: Unpublished monograph
Publication date: 22/01/2009

A generic bibliography of more than 1000 references (books, journal articles, monographs, conference papers, reports, etc.) on academic freedom in higher education, drawn from various countries and regions (including Australia, Canada, Europe, the UK and the USA). The references cover, among other topics, the genesis and history of the concept, intellectual and artistic freedom, the legal interpretation of academic freedom, individual and institutional academic freedom, and the responsibilities and duties associated with academic freedom.

University of Lincoln Institutional Repository

The structure of the photon and its interactions

Author: Surrow Bernd
Publication date: 01/01/1998

The OPAL experiment has performed a variety of measurements of photon-photon and electron-photon scattering at the electron-positron collider LEP to gain deeper insight into the structure of the photon and its interactions. This review presents a summary of these results. Comment: 18 pages with 20 eps files. Invited talk given at the Workshop on Photon Interactions and the Photon Structure, Lund, Sweden, September 10-13, 1998; to be published in the proceedings.

arXiv.org e-Print Archive

CERN Document Server

Efficient Algorithms for Fast Integration on Large Data Sets from Multiple Sources

Author: Aseltine Robert H.; Mi Tian; Rajasekaran Sanguthevar
Publication venue: OpenCommons@UConn
Publication date: 28/06/2012

Background: Recent large-scale deployments of health information technology have created opportunities to integrate patient medical records with disparate public health, human service, and educational databases to provide comprehensive information related to health and development. Data integration techniques, which identify records belonging to the same individual that reside in multiple data sets, are essential to these efforts. Several algorithms proposed in the literature are adept at integrating records from two different datasets. Our algorithms aim to integrate multiple (in particular, more than two) datasets efficiently. Methods: Hierarchical clustering based solutions are used to integrate multiple datasets. Edit distance is used as the basic distance calculation, while distance calculation of common input errors is also studied. Several techniques have been applied to improve the algorithms in terms of both time and space: 1) Partial Construction of the Dendrogram (PCD), which ignores the levels above the threshold; 2) Ignoring the Dendrogram Structure (IDS); 3) Faster Computation of the Edit Distance (FCED), which predicts the distance relative to the threshold using upper bounds on the edit distance; and 4) a pre-processing blocking phase that limits dynamic computation within each block. Results: We have experimentally validated our algorithms on large simulated as well as real data. Accuracy and completeness are defined stringently to show the performance of our algorithms. In addition, we employ a four-category analysis. Comparison with FEBRL shows the robustness of our approach. Conclusions: In the experiments we conducted, the accuracy we observed exceeded 90% for the simulated data in most cases. Accuracies of 97.7% and 98.1% were achieved for the constant and proportional threshold, respectively, in a real dataset of 1,083,878 records.
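
The abstract above uses edit distance compared against a threshold as its basic matching test. The following is a minimal Python sketch of one common way to make that test cheaper. It is not the authors' FCED procedure, whose details the abstract does not give. The function name `within_edit_distance` and both pruning rules (the length difference and the row minimum, each a lower bound on the distance) are assumptions chosen for illustration.

```python
def within_edit_distance(a: str, b: str, threshold: int) -> bool:
    """Return True if the Levenshtein distance between a and b is <= threshold.

    Illustrative sketch only; not the FCED technique from the abstract.
    """
    # The difference in lengths is a lower bound on the edit distance,
    # so a large difference lets us reject without any table at all.
    if abs(len(a) - len(b)) > threshold:
        return False

    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        # Row minima never decrease from one row to the next, so once every
        # cell in a row exceeds the threshold the final distance must too.
        if min(current) > threshold:
            return False
        previous = current
    return previous[-1] <= threshold


# Hypothetical record fields, for illustration only.
print(within_edit_distance("Jon Smith", "John Smith", 1))
print(within_edit_distance("kitten", "sitting", 2))
```

In a record-linkage setting, a predicate like this would typically be applied only to pairs that fall in the same block, which is the role of the pre-processing blocking phase the abstract mentions.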

Crossref

Springer - Publisher Connector

PubMed Central

DigitalCommons@UConn

OpenCommons at University of Connecticut

Seventh Biennial Report : June 2003 - March 2005

Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2005

MPG.PuRe

On the efficiency of the minimization approach to query optimization

Author: N. Mendkovich (Н. А. Мендкович)
Publication venue: P.G. Demidov Yaroslavl State University
Publication date: 20/04/2016

A standard problem in the use of DBMSs is low efficiency and the high cost of access to stored data. An acceptable level of system performance may be achieved by query optimization techniques that determine the most efficient way to execute a given query by modifying it and considering possible query execution plans. The goal of this paper is to prove the efficiency of query minimization algorithms that minimize the query restriction by eliminating redundant conditions. The paper presents minimization algorithms based on mathematical transformations that detect and remove redundant conditions from the query restriction in order to simplify it. These include algorithms based on “condition absorption”, prime implicants, and minimization of sets of linear inequalities. The paper also gives a theoretical justification of the efficiency of the minimization approach to query optimization based on restriction simplification. We also examine experimental results of implementing these optimization techniques and their influence on query processing speed. Finally, we present an overview of the impact of query minimization on the whole optimization process.
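
The abstract does not define “condition absorption”. One plausible reading is the Boolean absorption law, A OR (A AND B) = A, applied to a query restriction written in disjunctive normal form. The Python sketch below works under that assumption. The function name `absorb`, the representation of a restriction as a list of sets of condition strings, and the sample conditions are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def absorb(dnf: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Remove conjunctions made redundant by the absorption law.

    `dnf` is a disjunction of conjunctions: each inner collection lists the
    atomic conditions that are ANDed together, and the outer collection is
    ORed. A conjunction is redundant when another conjunction uses a strict
    subset of its conditions, because A OR (A AND B) is equivalent to A.
    """
    # Converting to a set of frozensets also drops exact duplicates.
    terms = {frozenset(term) for term in dnf}
    kept = [t for t in terms if not any(other < t for other in terms)]
    # Sort for a deterministic, readable result.
    return sorted(kept, key=lambda t: (len(t), sorted(t)))


# Hypothetical restriction of a WHERE clause, for illustration only.
restriction = [
    {"age > 30"},
    {"age > 30", "city = 'Oslo'"},        # absorbed by {"age > 30"}
    {"dept = 'IT'", "salary > 100"},
]
for term in absorb(restriction):
    print(" AND ".join(sorted(term)))
```

This handles only syntactically identical conditions. Detecting that one inequality implies another, as the linear-inequality technique named in the abstract would require, is outside the scope of this sketch.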

Modeling and Analysis of Information Systems / Моделирование и анализ информационных систем (МАИС)

SAT Competition 2020

Author: Froleyks Nils; Heule Marijn; Iser Markus; Järvisalo Matti; Suda Martin
Publication venue: Elsevier
Publication date: 31/08/2021

The SAT Competitions constitute a well-established series of yearly open international algorithm implementation competitions, focusing on the Boolean satisfiability (or propositional satisfiability, SAT) problem. In this article, we provide a detailed account of the 2020 instantiation of the SAT Competition, including the new competition tracks and benchmark selection procedures, an overview of solving strategies implemented in top-performing solvers, and a detailed analysis of the empirical data obtained from running the competition.

KITopen

SAT Competition 2020

Author: Froleyks Nils; Heule Marijn; Iser Markus; Järvisalo Matti; Suda Martin
Publication date: 31/08/2021

KITopen

Helsingin yliopiston digitaalinen arkisto

MWAND: A New Early Termination Algorithm for Fast and Efficient Query Evaluation

Author: Lougmiri Zekri; Mansouria Zemani Imene; Mohamed Senouci
Publication venue: Universidad Internacional de La Rioja
Publication date: 17/03/2022

Current information systems are very large and maintain huge amounts of data, processing millions of documents and millions of queries at any time. To select the most important responses from this amount of data, it is useful to apply so-called early termination algorithms. These attempt to extract the top-K documents according to a specified increasing monotone function. The principal idea is to reach and score only the smallest possible number of the most significant documents, thereby avoiding full processing of the whole collection. The WAND algorithm is the state of the art in this area. Although it is efficient, it lacks effectiveness and precision. In this paper, we propose two contributions. The principal one is a new early termination algorithm based on the WAND approach, which we call MWAND (Modified WAND). It is faster and more precise than WAND and is able to avoid unnecessary WAND steps. In this work, we integrate a tree structure as an index into WAND and add new levels to query processing. In the second contribution, we define new fine-grained metrics to improve the evaluation of the retrieved information. Experimental results on real datasets show that MWAND is more efficient than the WAND approach.
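
To make the early-termination idea concrete, here is a simplified Python sketch of baseline WAND-style top-K retrieval. It does not implement MWAND: the tree index and extra processing levels described in the abstract are not reproduced. The function name `wand_top_k`, the posting-list layout, and the sample scores are assumptions for illustration, and scores are assumed to be non-negative and summed across terms.

```python
import heapq
from bisect import bisect_left


def wand_top_k(postings, k):
    """Return the k best (score, doc_id) pairs, highest score first.

    `postings` maps each query term to a list of (doc_id, score) pairs
    sorted by doc_id. A document's score is the sum of its term scores.
    """
    if k <= 0:
        return []
    # Per-term upper bound: no document can gain more than this from the term.
    upper = {t: max(s for _, s in pl) for t, pl in postings.items() if pl}
    cursor = {t: 0 for t in upper}
    heap = []  # min-heap holding the current top-k as (score, doc_id)

    while True:
        # Terms that still have postings, ordered by their current doc_id.
        active = sorted((postings[t][cursor[t]][0], t)
                        for t in cursor if cursor[t] < len(postings[t]))
        if not active:
            break
        theta = heap[0][0] if len(heap) == k else float("-inf")

        # Pivot: first doc_id whose accumulated upper bounds can beat theta.
        bound, pivot_doc = 0.0, None
        for doc_id, term in active:
            bound += upper[term]
            if bound > theta:
                pivot_doc = doc_id
                break
        if pivot_doc is None:
            break  # no remaining document can enter the top-k

        if active[0][0] == pivot_doc:
            # Every term positioned on the pivot contributes; score it fully.
            score = 0.0
            for doc_id, term in active:
                if doc_id != pivot_doc:
                    break
                score += postings[term][cursor[term]][1]
                cursor[term] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before the pivot cannot beat theta: skip them unscored.
            for doc_id, term in active:
                if doc_id >= pivot_doc:
                    break
                cursor[term] = bisect_left(postings[term], (pivot_doc,), cursor[term])

    return sorted(heap, reverse=True)


# Hypothetical posting lists, for illustration only.
postings = {
    "fast":  [(1, 1.0), (3, 2.0), (7, 0.5)],
    "query": [(2, 0.5), (3, 1.5), (9, 0.25)],
    "index": [(3, 0.25), (5, 0.125), (7, 0.75)],
}
print(wand_top_k(postings, 2))
```

With these sample lists and k = 2, document 5 is never scored: once two results are held, the upper bound of the only term positioned on it cannot exceed the current threshold, so its cursor jumps straight to document 7.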

Re-UNIR

Data-efficient methods for information extraction

Author: Yaseen Usama
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 15/02/2023

Structured knowledge representation systems such as knowledge bases or knowledge graphs provide insights into entities and the relationships among them in the real world. Such systems can be employed in various natural language processing applications such as semantic search, question answering and text summarization. It is infeasible and inefficient to populate these knowledge representation systems manually. In this work, we develop methods to automatically extract named entities and the relationships among them from plain text; our methods can therefore be used either to complete existing incomplete knowledge representation systems or to create a new structured knowledge representation system from scratch. Unlike mainstream supervised methods for information extraction, our methods focus on the low-data scenario and do not require a large amount of annotated data. In the first part of the thesis, we focus on the problem of named entity recognition. We participated in the Bacteria Biotope 2019 shared task, which consists of recognizing and normalizing biomedical entity mentions. Our linguistically informed named entity recognition system consists of a deep learning based model that can extract both nested and flat entities; the model employs several linguistic features and auxiliary training objectives to enable efficient learning in data-scarce scenarios. Our entity normalization system employs string matching, fuzzy search and semantic search to link the extracted named entities to biomedical databases. Our named entity recognition and entity normalization system achieved the lowest slot error rate, 0.715, and ranked first in the shared task. We also participated in two further shared tasks, Adverse Drug Effect Span Detection (English) and Profession Span Detection (Spanish); both tasks collect data from the social media platform Twitter. We developed a named entity recognition model that improves its input representation by stacking heterogeneous embeddings from diverse domains; our empirical results demonstrate complementary learning from these heterogeneous embeddings. Our submission ranked 3rd in both shared tasks. In the second part of the thesis, we explore synthetic data augmentation strategies to address low-resource information extraction in specialized domains. Specifically, we adapt backtranslation to the token-level task of named entity recognition and the sentence-level task of relation extraction. We demonstrate that backtranslation can generate linguistically diverse and grammatically coherent synthetic sentences and serves as a competitive augmentation strategy for named entity recognition and relation extraction. In most real-world relation extraction tasks, annotated data is not available, but a large unannotated text corpus often is. Bootstrapping methods for relation extraction can operate on such a corpus because they require only a handful of seed instances. However, bootstrapping methods tend to accumulate noise over time (known as semantic drift), and this phenomenon has a drastic negative impact on the final precision of the extractions. We develop two methods that constrain the bootstrapping process to minimise semantic drift in relation extraction; the methods leverage graph theory and pre-trained language models to explicitly identify and remove noisy extraction patterns. We report experimental results on the TACRED dataset for four relations. In the last part of the thesis, we demonstrate the application of domain adaptation to the challenging task of multilingual acronym extraction. Our experiments demonstrate that domain adaptation can improve acronym extraction within scientific and legal domains in 6 languages, including low-resource languages such as Persian and Vietnamese.

Digitale Hochschulschriften der LMU