Search CORE

10 research outputs found

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

Author: Bhat Riyaz
Bornea Mihaela
Fadnis Kshitij
Florian Radu
Franz Martin
Iyer Bhavani
Kumar Vishwajeet
Li Yulong
McCarley Scott
Rosenthal Sara
Roukos Salim
Sen Jaydeep
Sil Avirup
Sultan Md Arafat
Zhang Rong
Publication venue
Publication date: 25/01/2023
Field of study

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop and open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PRIMEQA is available at: https://github.com/primeqa

arXiv.org e-Print Archive

Techniques and Systems for Continuous and Distributed Processing in Databases

Author: Bornea Mihaela-Ancuta
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 01/01/2010
Field of study

Modern information processing often requires processing large amounts of data on a daily basis, as a result of the increasing monitoring of environmental parameters, market transactions, web activity, and other sources of valuable information. The availability of this data enables new business and application scenarios with increasing performance requirements. Current database engines are expected to analyze the data while sustaining high query throughput. To accommodate user needs, the role of several database components must be reconsidered in a new light; in particular, the state of the art must advance in query processing, transaction processing and index management.
Advances in the query processing module of the database engine are driven by the need to accommodate new scenarios. While traditional query processing assumes that data is always available, modern database systems must be ready to process various types of incoming, streamed data, which often arrives at uncontrollable rates or with variable delays. New query processing algorithms, and join algorithms in particular, are needed to retrieve valuable information from such input as soon as data becomes available. This thesis addresses two common operations: the join of two (or more) data inputs streamed in by autonomous sources, and the join of a data stream with a disk-resident relation.
Transaction processing now faces new challenges as a result of data replication. Compared with traditional centralized management, replication offers the potential for higher availability and better performance by sharing the database load among several engines. However, maintaining consistency among the replicas in the presence of updates is far from trivial, so transaction processing must be governed by novel concurrency control mechanisms suited to replication. This thesis shows how to provide the transactional properties of atomicity, consistency, isolation and durability in a replicated database in an efficient and scalable manner.
Selecting indexes that suit the workload is an important task for the database, since indexes directly affect query processing performance. It is also an interesting optimization problem, as an index may help some parts of the workload while imposing maintenance overhead whenever the data is updated. As database applications grow more complex, index tuning becomes increasingly challenging for database administrators. This thesis introduces an online technique that captures the relevant changes in the workload and manages the indexes in parallel with query processing.

Hellenic National Archive of Doctoral Dissertations
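The first operation named in the thesis abstract, joining two inputs streamed in by autonomous sources and producing results as soon as data is available, is often explained with the textbook symmetric (pipelined) hash join. The Python sketch below shows only that general technique, assuming unbounded memory; it is not the algorithm developed in the thesis, and the class name `SymmetricHashJoin`, its `insert` method, and the row layout are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


class SymmetricHashJoin:
    """Equi-join of two streams that emits matches as soon as a tuple arrives.

    Hypothetical illustration: each side keeps an in-memory hash table keyed
    on its join attribute; memory is assumed to be unbounded.
    """

    def __init__(self, left_key: int, right_key: int) -> None:
        # Position of the join attribute in left ("L") and right ("R") rows.
        self.keys = {"L": left_key, "R": right_key}
        self.tables: Dict[str, Dict[Any, List[Row]]] = {
            "L": defaultdict(list),
            "R": defaultdict(list),
        }

    def insert(self, side: str, row: Row) -> List[Tuple[Row, Row]]:
        """Probe the opposite table, then store the row; return new (left, right) results."""
        other = "R" if side == "L" else "L"
        key = row[self.keys[side]]
        # .get() does not create empty buckets in the defaultdict.
        matches = self.tables[other].get(key, [])
        if side == "L":
            results = [(row, m) for m in matches]
        else:
            results = [(m, row) for m in matches]
        self.tables[side][key].append(row)
        return results
```

For example, with both join keys at position 0, inserting `("L", (1, "a"))` returns no results, and a subsequent `insert("R", (1, "x"))` returns `[((1, "a"), (1, "x"))]`. Because every arriving tuple probes before it is stored, each matching pair is emitted exactly once, by whichever tuple arrives second.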

Double Index NEsted-loop Reactive Join for Result Rate Optimization

Author: Bornea Mihaela A.
Vassalos Vasilis
Kotidis Yannis
Deligiannakis Antonios
Publication venue: IEEE Comput. Soc
Publication date: 01/01/2009
Field of study

Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join techniques is that they can start producing join results as soon as the first input tuples are available, thus improving pipelining by smoothing join result production and by masking source or network delays. In this paper we propose Double Index NEsted-loops Reactive join (DINER), a new adaptive join algorithm for result rate maximization. DINER combines two key elements: an intuitive flushing policy that aims to increase the productivity of in-memory tuples in producing results during the online phase of the join, and a novel re-entrant join technique that allows the algorithm to rapidly switch between processing in-memory and disk-resident tuples, thus better exploiting temporary delays when new data is not available. Our experiments using real and synthetic data sets demonstrate that DINER outperforms previous adaptive join algorithms in producing result tuples at a significantly higher rate, while making better use of the available memory

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens
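The DINER abstract combines an online join phase, a flushing policy for when memory fills, and processing of disk-resident tuples. The Python sketch below illustrates that general structure in a deliberately simplified form; it is not DINER. Its eviction rule (flush the tuple that has produced the fewest results so far), the tuple-count memory limit, the list standing in for disk, and all names (`BoundedPipelinedJoin`, `insert`, `cleanup`) are hypothetical assumptions. Arrival and flush timestamps let the final pass skip pairs that were already emitted online.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]
Pair = Tuple[Row, Row]


class Entry:
    """Bookkeeping for one input tuple (hypothetical structure)."""

    def __init__(self, row: Row, arrival: int) -> None:
        self.row = row
        self.arrival = arrival                 # global step at which the tuple arrived
        self.flushed_at: Optional[int] = None  # step at which it was moved to "disk"
        self.matches = 0                       # results it has produced so far


class BoundedPipelinedJoin:
    """Memory-bounded pipelined equi-join with a productivity-based flushing policy.

    Simplified illustration only, not DINER: memory is a count of tuples,
    "disk" is a Python list, probing is a plain nested loop, and flushed
    tuples are joined in a single cleanup pass after both inputs end.
    """

    def __init__(self, key: int, memory_limit: int) -> None:
        self.key = key  # position of the join attribute in each row
        self.memory_limit = memory_limit
        self.step = 0
        self.memory: Dict[str, List[Entry]] = {"L": [], "R": []}
        self.disk: Dict[str, List[Entry]] = {"L": [], "R": []}

    def insert(self, side: str, row: Row) -> List[Pair]:
        """Online phase: probe the other side's in-memory tuples, then store the row."""
        self.step += 1
        new = Entry(row, self.step)
        other = "R" if side == "L" else "L"
        results: List[Pair] = []
        for old in self.memory[other]:
            if old.row[self.key] == row[self.key]:
                old.matches += 1
                new.matches += 1
                results.append((row, old.row) if side == "L" else (old.row, row))
        self.memory[side].append(new)
        self._flush_if_needed()
        return results

    def _flush_if_needed(self) -> None:
        # Evict the tuple that has produced the fewest results (oldest on ties).
        while len(self.memory["L"]) + len(self.memory["R"]) > self.memory_limit:
            side, victim = min(
                ((s, e) for s in ("L", "R") for e in self.memory[s]),
                key=lambda se: (se[1].matches, se[1].arrival),
            )
            self.memory[side].remove(victim)
            victim.flushed_at = self.step
            self.disk[side].append(victim)

    @staticmethod
    def _produced_online(a: Entry, b: Entry) -> bool:
        # A pair was emitted online iff the earlier tuple was still in memory
        # when the later one arrived and probed.
        earlier, later = (a, b) if a.arrival < b.arrival else (b, a)
        return earlier.flushed_at is None or earlier.flushed_at >= later.arrival

    def cleanup(self) -> List[Pair]:
        """Final phase: emit every matching pair the online phase missed."""
        lefts = self.memory["L"] + self.disk["L"]
        rights = self.memory["R"] + self.disk["R"]
        return [
            (l.row, r.row)
            for l in lefts
            for r in rights
            if l.row[self.key] == r.row[self.key] and not self._produced_online(l, r)
        ]
```

DINER's re-entrant technique switches to disk-resident tuples while the inputs are temporarily stalled; doing that mid-stream would require additional bookkeeping to avoid duplicate results, which this sketch sidesteps by calling `cleanup` only once, after both inputs have ended.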

Double Index NEsted-Loop Reactive Join for Result Rate Optimization

Author: Antonios Deligiannakis
Mihaela A. Bornea
Vasilis Vassalos
Yannis Kotidis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join techniques is that they can start producing join results as soon as the first input tuples are available, thus improving pipelining by smoothing join result production and by masking source or network delays. In this paper we propose Double Index NEsted-loops Reactive join (DINER), a new adaptive join algorithm for result rate maximization. DINER combines two key elements: an intuitive flushing policy that aims to increase the productivity of in-memory tuples in producing results during the online phase of the join, and a novel re-entrant join technique that allows the algorithm to rapidly switch between processing in-memory and disk-resident tuples, thus better exploiting temporary delays when new data is not available. Our experiments using real and synthetic data sets demonstrate that DINER outperforms previous adaptive join algorithms in producing result tuples at a significantly higher rate, while making better use of the available memory.

CiteSeerX

Crossref

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Author: Bornea Mihaela
Florian Radu
Pan Lin
Rosenthal Sara
Sil Avirup
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 10/12/2020
Field of study

Prior work on multilingual question answering has mostly focused on using large multilingual pre-trained language models (LMs) to perform zero-shot language-wise learning: train a QA model on English and test it on other languages. In this work, we explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space. Our first strategy augments the original English training data with machine translation-generated data. This results in a corpus of multilingual silver-labeled QA pairs that is 14 times larger than the original training set. In addition, we propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance and result in LM embeddings that are less language-variant. Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

GAAMA 2.0: An Integrated System That Answers Boolean and Extractive Questions

Author: Bornea Mihaela
Ferritto Anthony
Florian Radu
McCarley Scott
Rosenthal Sara
Sil Avirup
Sultan Md Arafat
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 06/09/2023
Field of study

Recent machine reading comprehension datasets include extractive and boolean questions, but current approaches do not offer integrated support for answering both question types. We present a front-end demo to a multilingual machine reading comprehension system that handles boolean and extractive questions. For boolean questions, it provides a yes/no answer and highlights the supporting evidence; for extractive questions, it provides an answer and highlights it in the passage. Our system, GAAMA 2.0, achieved first place on the TyDi QA leaderboard at the time of submission. We contrast two implementations of our approach: one that includes multiple transformer models for easy deployment, and one that uses a shared transformer model with adapters to reduce the GPU memory footprint in resource-constrained environments.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Money laundering regulatory risk evaluation using Bitmap Index-based Decision Tree

Author: Bornea Mihaela A.
Castellón González Pamela
Eldin Helmy Tamer Hossam
Flores Denys A.
Jayasree Vikas
Laxmaiah M.
Luo Xingrong
Möser Malte
Nikoloska Svetlana
Phua Clifton
Pulakkazhy Sreekumar
Roberto Cortinas
Suresh C.H.
Weibing Peng
Zareapoo Masoumeh
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref