Search CORE

11 research outputs found

Performance Comparison Analysis of ArangoDB, MySQL, and Neo4j: An Experimental Study of Querying Connected Data

Author: Asplund Einar
Ayele Workneh Yilma
Duneld Martin
Sandell Johan
Publication venue
Publication date: 03/01/2024
Field of study

ScholarSpace at University of Hawai'i at Manoa

Performance Comparison Analysis of ArangoDB, MySQL, and Neo4j: An Experimental Study of Querying Connected Data

Author: Asplund Einar
Ayele Workneh Yilma
Duneld Martin
Sandell Johan
Publication venue
Publication date: 30/01/2024
Field of study

Choosing and developing performant database solutions helps organizations optimize their operational practices and decision-making. Since graph data is becoming more common, it is crucial to develop and use them in big data with complex relationships with high and consistent performance. However, legacy database technologies such as MySQL are tailored to store relational databases and need to perform more complex queries to retrieve graph data. Previous research has dealt with performance aspects such as CPU and memory usage. In contrast, energy usage and temperature of the servers are lacking. Thus, this paper evaluates and compares state-of-the-art graphs and relational databases from the performance aspects to allow a more informed selection of technologies. Graph-based big data applications benefit from informed selection database technologies for data retrieval and analytics problems. The results show that Neo4j performs faster in querying connected data than MySQL and ArangoDB, and energy, CPU, and memory usage performances are reported in this paper.Comment: https://hdl.handle.net/10125/10731

arXiv.org e-Print Archive

Negation Scope Delimitation in Clinical Text Using Three Approaches: NegEx, PyConTextNLP and SynNeg

Author: Hercules Dalianis
Hideyuki Tanushi
Maria Kvist
Maria Skeppstedt
Martin Duneld
Sumithra Velupillai
Publication venue
Publication date: 24/04/2020
Field of study

ABSTRACT Negation detection is a key component in clinical information extraction systems, as health record text contains reasonings in which the physician excludes different diagnoses by negating them. Many systems for negation detection rely on negation cues (e.g. not), but only few studies have investigated if the syntactic structure of the sentences can be used for determining the scope of these cues. We have in this paper compared three different systems for negation detection in Swedish clinical text (NegEx, PyConTextNLP and SynNeg), which have different approaches for determining the scope of negation cues. NegEx uses the distance between the cue and the disease, PyConTextNLP relies on a list of conjunctions limiting the scope of a cue, and in SynNeg the boundaries of the sentence units, provided by a syntactic parser, limit the scope of the cues. The three systems produced similar results, detecting negation with an F-score of around 80%, but using a parser had advantages when handling longer, complex sentences or short sentences with contradictory statements

CiteSeerX

Beyond Benchmarks: Spotting Key Topical Sentences While Improving Automated Essay Scoring Performance with Topic-Aware BERT

Author: Aron Henriksson
Jalal Nouri
Martin Duneld
Xiu Li
Yongchao Wu
Publication venue: 'MDPI AG'
Publication date: 01/12/2022
Field of study

Automated Essay Scoring (AES) automatically allocates scores to essays at scale and may help teachers reduce the heavy burden during grading activities. Recently, researchers have deployed neural-based AES approaches to improve upon the state-of-the-art AES performance. These neural-based AES methods mainly take student essays as the sole input and focus on learning the relationship between student essays and essay scores through deep neural networks. However, their only product, the predicted holistic score, is far from providing adequate pedagogical information, such as automated writing evaluation (AWE). In this work, we propose Topic-aware BERT, a new method of learning relations among scores, student essays, as well as topical information in essay instructions. Beyond improving the AES benchmark performance, Topic-aware BERT can automatically retrieve key topical sentences in student essays by probing self-attention maps in intermediate layers. We evaluate the performance of Topic-aware BERT of different variants to (i) perform AES and (ii) retrieve key topical sentences using the open dataset Automated Student Assessment Prize and a manually annotated dataset. Our experiments show that Topic-aware BERT achieves a strong AES performance compared with the previous best neural-based AES methods and demonstrates effectiveness in identifying key topical sentences in argumentative essays

Publikationer från Stockholms universitet

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Evaluating Embeddings from Pre-Trained Language Models and Knowledge Graphs for Educational Content Recommendation

Author: Duneld Martin
Henriksson Aron
Li Xiu
Nouri Jalal
Wu Yongchao
Publication venue: Stockholms universitet, Institutionen för data- och systemvetenskap
Publication date: 01/01/2024
Field of study

Educational content recommendation is a cornerstone of AI-enhanced learning. In particular, to facilitate navigating the diverse learning resources available on learning platforms, methods are needed for automatically linking learning materials, e.g. in order to recommend textbook content based on exercises. Such methods are typically based on semantic textual similarity (STS) and the use of embeddings for text representation. However, it remains unclear what types of embeddings should be used for this task. In this study, we carry out an extensive empirical evaluation of embeddings derived from three different types of models: (i) static embeddings trained using a concept-based knowledge graph, (ii) contextual embeddings from a pre-trained language model, and (iii) contextual embeddings from a large language model (LLM). In addition to evaluating the models individually, various ensembles are explored based on different strategies for combining two models in an early vs. late fusion fashion. The evaluation is carried out using digital textbooks in Swedish for three different subjects and two types of exercises. The results show that using contextual embeddings from an LLM leads to superior performance compared to the other models, and that there is no significant improvement when combining these with static embeddings trained using a knowledge graph. When using embeddings derived from a smaller language model, however, it helps to combine them with knowledge graph embeddings. The performance of the best-performing model is high for both types of exercises, resulting in a mean Recall@3 of 0.96 and 0.95 and a mean MRR of 0.87 and 0.86 for quizzes and study questions, respectively, demonstrating the feasibility of using STS based on text embeddings for educational content recommendation. The ability to link digital learning materials in an unsupervised manner -- relying only on readily available pre-trained models -- facilitates the development of AI-enhanced learning

Publikationer från Stockholms universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Synonym Extraction and Abbreviation Expansion with Ensembles of Semantic Spaces

Author: Daudaravicius Vidas
Duneld Martin
Henriksson Aron
Moen Hans
Skeppstedt Maria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Background: Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. Results: A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. Conclusions: This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks

Springer - Publisher Connector

NORA - Norwegian Open Research Archives

Uncertainty Detection as Approximate Max-Margin Sequence Labelling

Author: Dalianis Hercules
Duneld Martin
Eriksson Gunnar
Karlgren Jussi
Täckström Oscar
Velupillai Sumithra Ulrika
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010
Field of study

This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence level weasel detection in the Wikipedia domain, we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO-encoding. In addition to surface level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 for the biological domain are 85.2 F1-score, for the Wikipedia set 55.4 F1-score. For Task 2, our official results are 2.1 for the entire task with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6 and cues: 78.5

Publikationer från Stockholms universitet

Uncertainty Detection as Approximate Max-Margin Sequence Labelling

Author: Dalianis Hercules
Duneld Martin
Eriksson Gunnar
Karlgren Jussi
Täckström Oscar
Velupillai Sumithra Ulrika
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010
Field of study

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Negation Scope Delimitation in Clinical Text Using Three Approaches : NegEx, PyConTextNLP and SynNeg

Author: Dalianis Hercules
Duneld Martin
Kvist Maria
Skeppstedt Maria
Tanushi Hideyuki
Velupillai Sumithra
Publication venue: Linköping : Linköping University Electronic Press
Publication date: 01/01/2013
Field of study

Negation detection is a key component in clinical information extraction systems, as health record text contains reasonings in which the physician excludes different diagnoses by negating them. Many systems for negation detection rely on negation cues (e.g. not), but only few studies have investigated if the syntactic structure of the sentences can be used for determining the scope of these cues. We have in this paper compared three different systems for negation detection in Swedish clinical text (NegEx, PyConTextNLP and SynNeg), which have different approaches for determining the scope of negation cues. NegEx uses the distance between the cue and the disease, PyConTextNLP relies on a list of conjunctions limiting the scope of a cue, and in SynNeg the boundaries of the sentence units, provided by a syntactic parser, limit the scope of the cues. The three systems produced similar results, detecting negation with an F-score of around 80%, but using a parser had advantages when handling longer, complex sentences or short sentences with contradictory statements

Publikationer från Stockholms universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line