Detecting Autism by Analyzing a Simulated Social Interaction
Drimalla H, Landwehr N, Baskow I, et al. Detecting Autism by Analyzing a Simulated Social Interaction. In: Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G, eds. Machine Learning and Knowledge Discovery in Databases. European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I. Lecture Notes in Computer Science. Vol 11051. Cham: Springer International Publishing; 2019: 193-208
An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition
The use of artificial intelligence technology in education is growing
rapidly, with increasing attention being paid to handwritten mathematical
expression recognition (HMER) by researchers. However, many existing methods
for HMER may fail to accurately read formulas with complex structures, as the
attention results can be inaccurate due to illegible handwriting or large
variations in writing styles. Our proposed Intelligent-Detection Network (IDN)
for HMER differs from traditional encoder-decoder methods by utilizing object
detection techniques. Specifically, we have developed an enhanced YOLOv7
network that can accurately detect both digits and symbols. The
detection results are then integrated into the bidirectional gated recurrent
unit (BiGRU) and the baseline symbol relationship tree (BSRT) to determine the
relationships between symbols and numbers. The experiments demonstrate that the
proposed method outperforms encoder-decoder networks in recognizing
complex handwritten mathematical expressions, owing to the precise
detection of symbols and numbers. Beyond contributing to HMER research, the
method could be applied in practical scenarios such as grading assignments
in schools and entering information from paper documents.
Comment: 6 pages, 5 figures, 31st International Conference on Computers in
Education
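The abstract says that detected symbols are passed to a BiGRU and a baseline symbol relationship tree (BSRT) to recover how symbols relate. The sketch below does not reproduce the learned IDN model. It assumes a hypothetical, purely geometric rule that labels each detected box as continuing the current baseline ("right") or as a superscript or subscript of the last baseline symbol, which is the kind of relation a baseline tree encodes. The `Box` format, the `shift_ratio` threshold, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Axis-aligned detection box in image coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def center_y(self) -> float:
        return (self.y0 + self.y1) / 2


def relation(base: Box, other: Box, shift_ratio: float = 0.35) -> str:
    """Hypothetical heuristic: a vertical centre offset larger than
    shift_ratio * base height marks a superscript or subscript;
    otherwise `other` continues the same baseline."""
    dy = other.center_y - base.center_y
    limit = shift_ratio * base.height
    if dy < -limit:
        return "superscript"
    if dy > limit:
        return "subscript"
    return "right"


def baseline_tree(boxes: List[Box], shift_ratio: float = 0.35) -> List[Tuple[str, str, str]]:
    """Build (parent, relation, child) edges left to right.

    Each symbol attaches to the most recent baseline symbol (the anchor).
    Only "right" children become the new anchor, so after x^2 the next
    symbol is compared with x, not with the raised 2. Nested scripts are
    not handled in this sketch.
    """
    ordered = sorted(boxes, key=lambda b: b.x0)
    if not ordered:
        return []
    anchor = ordered[0]
    edges = []
    for box in ordered[1:]:
        rel = relation(anchor, box, shift_ratio)
        edges.append((anchor.label, rel, box.label))
        if rel == "right":
            anchor = box
    return edges
```

For boxes detected for `x^2 + 1`, this yields the edges `x -superscript-> 2`, `x -right-> +` and `+ -right-> 1`. A learned model such as the BiGRU described in the abstract would replace the fixed threshold, which breaks down for unusual handwriting.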
XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making
Large Language Models (LLMs) have recently made impressive strides in natural
language understanding tasks. Despite their remarkable performance,
understanding their decision-making process remains a major challenge. In this
paper, we aim to bring transparency to this process by introducing a
new explanation dataset for question answering (QA) tasks that integrates
knowledge graphs (KGs) in a novel way. Our dataset includes 12,102
question-answer-explanation (QAE) triples. Each explanation in the dataset
links the LLM's reasoning to entities and relations in the KGs. The explanation
component includes a why-choose explanation, a why-not-choose explanation, and
a set of reason-elements that underlie the LLM's decision. We leverage KGs and
graph attention networks (GAT) to find the reason-elements and transform them
into why-choose and why-not-choose explanations that are comprehensible to
humans. Through quantitative and qualitative evaluations, we demonstrate the
potential of our dataset to improve the in-context learning of LLMs and to
enhance their interpretability and explainability. Our work contributes to the
field of explainable AI by enabling a deeper understanding of LLMs'
decision-making processes, making them more transparent and thereby
potentially more reliable to researchers and practitioners alike. Our dataset
is available at: https://github.com/chen-zichen/XplainLLM_dataset.git
Comment: 17 pages, 6 figures, 7 tables.
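To make the structure of a question-answer-explanation (QAE) triple concrete, here is a minimal in-memory sketch. The class and field names are hypothetical and are not the dataset's actual schema. The sketch also assumes that reason-elements are ranked by softmax-normalised attention logits, as a single GAT attention head normalises scores over a node's neighbours. It is not the authors' pipeline.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasonElement:
    """A knowledge-graph entity or relation linked to the model's decision."""
    name: str
    kind: str  # "entity" or "relation"
    weight: float = 0.0  # normalised attention weight


@dataclass
class QAETriple:
    """Hypothetical layout of one question-answer-explanation record."""
    question: str
    answer: str
    why_choose: str = ""
    why_not_choose: str = ""
    reason_elements: List[ReasonElement] = field(default_factory=list)

    def top_reasons(self, k: int = 3) -> List[ReasonElement]:
        """Return the k reason-elements with the largest weights."""
        return sorted(self.reason_elements, key=lambda r: r.weight, reverse=True)[:k]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, as used to normalise GAT attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attach_weights(elements: List[ReasonElement], logits: List[float]) -> None:
    """Store softmax-normalised attention weights on each reason-element."""
    if len(elements) != len(logits):
        raise ValueError("one logit per reason-element is required")
    for element, w in zip(elements, softmax(logits)):
        element.weight = w
```

The top-ranked reason-elements would then be the inputs from which why-choose and why-not-choose texts are generated. That generation step is outside this sketch.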
The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies
Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-usable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or “lithology”. Since 1999 the British Geological Survey (BGS) has used its own Rock Classification Scheme (RCS) in many of its workflows and products, including the national digital geological map. This scheme pre-dates others that have been published and is deeply embedded in BGS’ processes. By now publishing this classification scheme as a machine-readable Simple Knowledge Organisation System (SKOS) informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the possibility of later using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has since been extended. The classification is almost entirely mono-hierarchical and includes 3454 currently valid concepts in a hierarchy 11 levels deep. It covers igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. The SKOS informal ontology built on it is stored in a triplestore, and the triples are updated nightly by extraction from the relational database in which the ontology is maintained. Bulk downloads and version history are available on GitHub. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK.
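The nightly step described above, in which SKOS triples are extracted from a relational table, can be sketched as follows. This is a minimal illustration under assumptions: the `concept(id, label, parent_id)` table, the concept codes, and the `https://example.org/rcs/` namespace are hypothetical and are not the BGS schema or URI pattern. The sketch maps each row to a `skos:Concept` with `skos:prefLabel` and, for a mono-hierarchy, a single `skos:broader` link. It also measures hierarchy depth with a recursive SQL query.

```python
import sqlite3

BASE = "https://example.org/rcs/"  # hypothetical namespace, not the BGS URI pattern


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the maintained relational database (illustrative rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept (id TEXT PRIMARY KEY, label TEXT NOT NULL, "
        "parent_id TEXT REFERENCES concept(id))"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [("IGN", "Igneous rock", None), ("PLU", "Plutonic rock", "IGN"), ("GRAN", "Granite", "PLU")],
    )
    return conn


def export_skos(conn: sqlite3.Connection) -> str:
    """Serialise the concept table as a SKOS concept scheme in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix rcs: <{BASE}> .",
        "",
        "rcs:scheme a skos:ConceptScheme .",
    ]
    for cid, label, parent in conn.execute("SELECT id, label, parent_id FROM concept ORDER BY id"):
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')  # Turtle string escaping
        lines.append(f"rcs:{cid} a skos:Concept ;")
        lines.append("    skos:inScheme rcs:scheme ;")
        if parent is None:
            lines.append("    skos:topConceptOf rcs:scheme ;")
        else:
            lines.append(f"    skos:broader rcs:{parent} ;")  # one parent: mono-hierarchy
        lines.append(f'    skos:prefLabel "{escaped}"@en .')
    return "\n".join(lines) + "\n"


def max_depth(conn: sqlite3.Connection) -> int:
    """Depth of the deepest concept, counting top concepts as level 1."""
    query = """
        WITH RECURSIVE walk(id, depth) AS (
            SELECT id, 1 FROM concept WHERE parent_id IS NULL
            UNION ALL
            SELECT c.id, w.depth + 1 FROM concept c JOIN walk w ON c.parent_id = w.id
        )
        SELECT MAX(depth) FROM walk
    """
    return conn.execute(query).fetchone()[0]
```

Running `export_skos(demo_connection())` yields three concepts, with Granite linked by `skos:broader` to Plutonic rock. On the full scheme, `max_depth` would be the check corresponding to the 11-level hierarchy. Loading the output into a triplestore is a separate step not shown here.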
A comparison of the RCS with the other published lithology schemes shows that all are broadly similar, but each reflects the interests and requirements of the group that developed it in its level of detail, both overall and in its constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.
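Since the automated alignment mechanisms are left to future work, the following shows only a naive starting point, not the authors' planned method. It assumes each scheme is available as a mapping from concept URI to preferred label, and it proposes `skos:closeMatch` candidates wherever normalised labels coincide. All URIs are hypothetical. `closeMatch` rather than `exactMatch` is used because identical labels can hide different definitions across schemes, so each candidate still needs review by a curator.

```python
import re
from typing import Dict, List, Tuple

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def normalise(label: str) -> str:
    """Lower-case, turn hyphens and underscores into spaces, collapse whitespace."""
    label = re.sub(r"[-_]", " ", label.lower())
    return " ".join(label.split())


def candidate_matches(source: Dict[str, str], target: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair concept URIs whose normalised preferred labels are identical."""
    index: Dict[str, List[str]] = {}
    for uri, label in target.items():
        index.setdefault(normalise(label), []).append(uri)
    pairs = []
    for uri, label in source.items():
        for match in index.get(normalise(label), []):
            pairs.append((uri, match))
    return sorted(pairs)


def to_ntriples(pairs: List[Tuple[str, str]]) -> str:
    """Emit each candidate as an N-Triples skos:closeMatch statement."""
    return "\n".join(f"<{a}> <{SKOS_CLOSE_MATCH}> <{b}> ." for a, b in pairs)
```

Exact label matching misses synonyms and differences in granularity, which is where the differing levels of detail noted above would require richer methods.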
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Privacy concerns have led to a surge in the creation of synthetic datasets,
with diffusion models emerging as a promising avenue. Although prior studies
have performed empirical evaluations on these models, there has been a gap in
providing a mathematical characterization of their privacy-preserving
capabilities. To address this, we present a pioneering theoretical
exploration of the privacy preservation inherent in discrete diffusion models
(DDMs) for discrete dataset generation. Focusing on per-instance differential
privacy (pDP), our framework elucidates the potential privacy leakage for each
data point in a given training dataset, offering insights into data
preprocessing to reduce the privacy risks of synthetic dataset generation via
DDMs. Our bounds also show that training with $s$-sized data points leads to a
surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure
noise to the synthetic clean data phase, and a faster decay in diffusion
coefficients amplifies the privacy guarantee. Finally, we empirically verify
our theoretical findings on both synthetic and real-world datasets
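For intuition about why diffusion coefficients matter, here is a sketch of a discrete forward process with a uniform transition kernel (the D3PM-style kernel, assumed here and not necessarily the paper's exact setting). With $\bar\alpha_t = \prod_{s \le t} \alpha_s$, the noised marginal is $q(x_t \mid x_0) = \bar\alpha_t\, e_{x_0} + (1 - \bar\alpha_t)/K$. The total-variation distance between the noised versions of two different clean values is then exactly $\bar\alpha_t$. This distance is only a rough proxy for how much a single data point remains identifiable. It is not the paper's pDP bound, but it shows that faster-decaying coefficients make points indistinguishable sooner, and that distinguishability grows as the process moves from noise back towards clean data.

```python
from typing import List


def cumulative_alphas(alphas: List[float]) -> List[float]:
    """bar_alpha_t = product of alpha_1..alpha_t, for t = 1..T."""
    out, running = [], 1.0
    for a in alphas:
        running *= a
        out.append(running)
    return out


def forward_marginal(x0: int, num_classes: int, bar_alpha: float) -> List[float]:
    """q(x_t | x_0) under a uniform transition kernel over num_classes categories."""
    uniform = (1.0 - bar_alpha) / num_classes
    return [bar_alpha + uniform if k == x0 else uniform for k in range(num_classes)]


def tv_distance(p: List[float], q: List[float]) -> float:
    """Total-variation distance between two categorical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distinguishability(alphas: List[float], num_classes: int) -> List[float]:
    """TV distance, per step, between noised copies of clean values 0 and 1."""
    if num_classes < 2:
        raise ValueError("need at least two categories")
    return [
        tv_distance(forward_marginal(0, num_classes, b), forward_marginal(1, num_classes, b))
        for b in cumulative_alphas(alphas)
    ]
```

Comparing `distinguishability([0.95] * 10, 4)` with `distinguishability([0.8] * 10, 4)` shows the faster schedule shrinking the gap at every step.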
Founder Success in Norwegian Startups: A Machine Learning Approach : A study on the use of machine learning and personality traits to predict startup performance from a pre-seed perspective
This thesis investigates founder characteristics in the Norwegian startup ecosystem and whether
machine learning can help venture capital firms identify successful founders at a startup's earliest
stages, when information is very limited. The authors collected and refined data from multiple
sources, resulting in a unique dataset of 1918 tech-driven, scalable startups and 2700 unique
founders. A distinctive feature of the dataset is the inclusion of personality traits estimated
through artificial intelligence.
Four supervised machine learning models were used to classify the founders into two constructed
success categories: low success and high success. The two tree-based methods, Extreme Gradient
Boosting and Random Forest, performed best on the evaluation metrics, reaching a
classification accuracy of over 62%, while Logistic Regression and K-Nearest Neighbours followed
not far behind. The thesis finds significant evidence that the Number of Founders of a company
and the personality trait Conscientiousness are strong predictors of success in the Norwegian
startup landscape. Both predictors are positively correlated with startup performance,
meaning that entrepreneurs who score high on Conscientiousness and belong to larger founding teams
are more likely to succeed as entrepreneurs in Norway.
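To illustrate one of the four classifiers named above, here is a minimal K-Nearest Neighbours sketch on two features that mirror the reported predictors (number of founders and a Conscientiousness score). It is not the thesis's code or feature set. The features are z-scored on the training data so that team size and trait scores weigh comparably, and the predicted class is a majority vote among the k nearest founders. The toy rows used to exercise it are invented for illustration and are not data from the thesis.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fit_scaler(rows: List[Vector]) -> Callable[[Vector], List[float]]:
    """Return a function that z-scores a row using the training columns' mean and std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]

    def scale(row: Vector) -> List[float]:
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    return scale


def knn_predict(train_x: List[Vector], train_y: List[str], query: Vector, k: int = 3) -> str:
    """Majority vote among the k nearest training rows (Euclidean, standardised)."""
    scale = fit_scaler(train_x)
    q = scale(query)
    nearest = sorted((math.dist(scale(x), q), y) for x, y in zip(train_x, train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of predictions that match the true success category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

An odd `k` avoids most voting ties in the two-class setting. With limited pre-seed data, k would normally be chosen by cross-validation rather than fixed.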
The research serves two purposes: first, to narrow the research gap on founders in Norwegian
startups; and second, to motivate venture capital firms in Norway to adopt machine learning
models to support decision-making, despite the challenges of limited data. The authors
encourage further research in this area, such as investigating the validity of personality
traits obtained through artificial intelligence and extending the research to other
companies in Norway and to other Scandinavian countries.
The thesis acknowledges the ethical considerations that arise when collecting public data on
private individuals. It also discusses the weaknesses of the research, including the chosen
data structure and biases in the data.