Search CORE

12 research outputs found

Detecting Autism by Analyzing a Simulated Social Interaction

Author: Baskow Irina
Behnia Behnoush
Berlingerio Michele
Bonchi Francesco
Drimalla Hanna
Dziobek Isabel
Gärtner Thomas
Hurley Neil
Ifrim Georgiana
Landwehr Niels
Roepke Stefan
Scheffer Tobias
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Drimalla H, Landwehr N, Baskow I, et al. Detecting Autism by Analyzing a Simulated Social Interaction. In: Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G, eds. Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I. Lecture Notes in Computer Science. Vol 11051. Cham: Springer International Publishing; 2019: 193-208

Publications at Bielefeld University

An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition

Author: Ye Ziqi
Publication date: 26/11/2023

The use of artificial intelligence technology in education is growing rapidly, and researchers are paying increasing attention to handwritten mathematical expression recognition (HMER). However, many existing methods for HMER may fail to accurately read formulas with complex structures, as the attention results can be inaccurate due to illegible handwriting or large variations in writing styles. Our proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques. Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects. The detection results are then integrated into the bidirectional gated recurrent unit (BiGRU) and the baseline symbol relationship tree (BSRT) to determine the relationships between symbols and numbers. The experiments demonstrate that the proposed method outperforms encoder-decoder networks in recognizing complex handwritten mathematical expressions, owing to the precise detection of symbols and numbers. Our research has the potential to make valuable contributions to the field of HMER and could be applied in various practical scenarios, such as assignment grading in schools and information entry of paper documents.

Comment: 6 pages, 5 figures, 31st International Conference on Computers in Educatio

arXiv.org e-Print Archive

Ordinal Label Proportions

Author: BY Sun
C Bishop
E Frank
JC Huhn
N Cristianini
N Quadrianto
P McCullagh
PA Gutiérrez
R Santos-Rodríguez
TG Dietterich
W Chu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Crossref

Ghent University Academic Bibliography

Explore Bristol Research

Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, proceedings, part I

Author: Berlingerio Michele
Bonchi Francesco
Gärtner Thomas
Hurley Neil
Ifrim Georgiana
Publication venue: Springer International Publishing AG
Publication date: 01/01/2019

CERN Document Server

XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making

Author: Chen Jianda
Chen Zichen
Gaidhani Mitali
Singh Ambuj
Sra Misha
Publication date: 14/11/2023

Large Language Models (LLMs) have recently made impressive strides in natural language understanding tasks. Despite their remarkable performance, understanding their decision-making process remains a big challenge. In this paper, we look into bringing some transparency to this process by introducing a new explanation dataset for question answering (QA) tasks that integrates knowledge graphs (KGs) in a novel way. Our dataset includes 12,102 question-answer-explanation (QAE) triples. Each explanation in the dataset links the LLM's reasoning to entities and relations in the KGs. The explanation component includes a why-choose explanation, a why-not-choose explanation, and a set of reason-elements that underlie the LLM's decision. We leverage KGs and graph attention networks (GAT) to find the reason-elements and transform them into why-choose and why-not-choose explanations that are comprehensible to humans. Through quantitative and qualitative evaluations, we demonstrate the potential of our dataset to improve the in-context learning of LLMs, and enhance their interpretability and explainability. Our work contributes to the field of explainable AI by enabling a deeper understanding of the LLMs' decision-making process, making them more transparent and thereby potentially more reliable to researchers and practitioners alike. Our dataset is available at: https://github.com/chen-zichen/XplainLLM_dataset.git

Comment: 17 pages, 6 figures, 7 tables.

arXiv.org e-Print Archive

The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies

Author: Heaven Rachel E.
McCormick Tim
Publication venue: Elsevier
Publication date: 01/12/2023

Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-useable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999 the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has been added to subsequently. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. The SKOS informal ontology built on it is stored in a triplestore and the triples are updated nightly by extracting from a relational database where the ontology is maintained. Bulk downloads and version history are available on github. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK. 
Comparing the RCS with the other published lithology schemes, all are broadly similar but show characteristics that reveal the interests and requirements of the groups that developed them, in terms of their level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.

Directory of Open Access Journals

NERC Open Research Archive

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models

Author: Chien Eli
Kreačić Eleonora
Li Pan
Potluru Vamsi K.
Wang Haoyu
Wei Rongzhe
Yin Haoteng
Publication date: 24/10/2023

Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.

arXiv.org e-Print Archive

Founder Success in Norwegian Startups: A Machine Learning Approach : A study on the use of machine learning and personality traits to predict startup performance from a pre-seed perspective

Author: Otterlei Håkon
Wik Alexander Hogstad
Publication date: 01/01/2023

This thesis investigates founder characteristics in the Norwegian startup ecosystem and whether machine learning can help venture capital firms identify successful founders at a startup's earliest stages, when information is greatly limited. The authors collected and refined data from multiple sources, resulting in a unique dataset of 1918 tech-driven, scalable startups and 2700 unique founders. A distinctive feature of the dataset is the inclusion of personality traits estimated through the use of artificial intelligence. Four supervised machine learning models were employed to classify the founders into two constructed success categories, low success and high success. The two tree-based methods, Extreme Gradient Boosting and Random Forest, performed best on the evaluation metrics, reaching a classification accuracy of over 62%, while Logistic Regression and K-Nearest Neighbours followed closely behind. The thesis finds significant evidence that the Number of Founders of a company and the personality trait Conscientiousness are strong predictors of success in the Norwegian startup landscape. Both predictors correlate positively with startup performance, meaning that entrepreneurs who score high on Conscientiousness and are part of founding teams are more likely to succeed as entrepreneurs in Norway. The research has two use cases: first, to narrow the research gap on founders in Norwegian startups, and second, to motivate venture capital firms in Norway to adopt and implement machine learning models to help with decision-making, despite the challenges of limited data. The authors encourage others to continue research in this area, such as investigating the validity of personality traits obtained through artificial intelligence and extending the research to other companies in Norway and other Scandinavian countries. The thesis recognizes the potential ethical considerations that arise when collecting public data on private individuals. The weaknesses of this research are also discussed, which include the chosen data structure and biases in the data.

NHH Brage