63 research outputs found
A new unsupervised feature selection method for text clustering based on genetic algorithms
Nowadays a vast amount of textual information is collected and stored in various databases around the world, including the Internet as the largest database of all. This rapidly increasing growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, and consequently the nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. Text clustering is one of the most important areas in text mining, which includes text preprocessing, dimension reduction by selecting some terms (features) and finally clustering using the selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms to select proper terms from the corpus. However, up to now the valuation of terms in groups has not been investigated in reported works. In this paper a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. In addition, a new Modified Term Variance measuring method is proposed for evaluating groups of terms. Furthermore, a genetic based algorithm is designed and implemented for finding the most valuable groups of terms based on the new measure. These terms are then utilized to generate the final feature vector for the clustering process. In order to evaluate and justify our approach, the proposed method and a conventional term variance method are implemented and tested using the Reuters-21578 corpus collection. For a more accurate comparison, the methods have been tested on three corpora; for each corpus the clustering task has been run ten times and the results averaged.
Results of comparing these two methods are very promising and show that our method produces better average accuracy and F1-measure than the conventional term variance method
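The conventional term variance baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the common definition of term variance as the variance of a term's frequency across documents; the function names and the toy documents are hypothetical, and the paper's Modified Term Variance and genetic search are not reproduced because the abstract does not specify them.

```python
from collections import Counter


def term_variance(docs):
    """Return {term: variance of its frequency across documents}.

    docs is a list of token lists. This is the conventional per-term
    score, not the group-based measure proposed in the paper.
    """
    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*docs)
    n = len(docs)
    scores = {}
    for term in vocab:
        freqs = [c[term] for c in counts]  # Counter returns 0 if absent
        mean = sum(freqs) / n
        scores[term] = sum((f - mean) ** 2 for f in freqs) / n
    return scores


def select_top_terms(docs, k):
    """Keep the k terms with the highest variance (ties broken alphabetically)."""
    scores = term_variance(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: "price" occurs once in every document, so its
# variance is zero and it is the first term to be discarded.
toy_docs = [["oil", "oil", "price"], ["oil", "price"], ["wheat", "price"]]
selected = select_top_terms(toy_docs, 2)
```

A term that is spread evenly over all documents scores zero, which is why variance is used as a proxy for discriminating power when no class labels are available.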
A Metadata Schema for the Description of Language Resources (LRs)
This paper presents the metadata schema for describing language resources (LRs) currently under development for the needs of META-SHARE, an open distributed facility for the exchange and sharing of LRs. An essential ingredient in its setup is the existence of formal and standardized LR descriptions, cornerstone of the interoperability layer of any such initiative. The description of LRs is granular and abstractive, combining the taxonomy of LRs with an inventory of a structured set of descriptive elements, of which only a minimal subset is obligatory; the schema additionally proposes recommended and optional elements. Moreover, the schema includes a set of relations catering for the appropriate inter-linking of resources. The current paper presents the main principles and features of the metadata schema, focusing on the description of text corpora and lexical / conceptual resources
Hybrid teaching intelligence: Lessons learned from an embodied mathematics learning experience
As AI increasingly enters classrooms, educational designers have begun investigating students' learning processes vis-à-vis simultaneous feedback from active sources: AI and the teacher. Nevertheless, there is a need to delve into a more comprehensive understanding of the orchestration of interactions between teachers and AI systems in educational settings. The research objective of this paper is to identify the challenges and opportunities when AI intertwines with instruction and examine how this hybrid teaching intelligence is being perceived by the students. The insights of this paper are extracted by analysing a case study that utilizes an AI-driven system (MOVES-NL) in the context of learning integer arithmetic. MOVES-NL is an advanced interactive tool that deploys whole-body movement and immediate formative feedback in a room-scale environment designed to enhance students' learning of integer arithmetic. In this paper, we present an in-situ study where 29 students in grades 6–8 interacted individually with MOVES-NL for approximately 1 hour each with the support of a facilitator/instructor. Mixed-methods analyses of multimodal data sources enabled a systematic multifaceted account of students' cognitive–affective experiences as they engaged with MOVES-NL while receiving human support (e.g., by asking students to elaborate on their digital actions/decisions). Finally, we propose design insights for instructional and technology design in support of student hybrid learning. The findings of this research contribute to the ongoing discourse on the role of hybrid intelligence in supporting education by offering practical insights and recommendations for educators and designers seeking to optimize the integration of technology in classrooms.
Practitioner notes. What is already known about this topic: Students and teachers develop different relations with and through AI, beyond just interacting with it. AI can support and augment the teachers' capabilities. Hybrid intelligence (HI) has already demonstrated promising potential to advance current educational theories and practices. What this paper adds: This research identifies the important learning opportunities and adversities emerging when AI intertwines with instruction and examines how learners perceive those moments. The results show that the system and the facilitator's feedback were complementary to the success of the learning experience. AI enabled students to reflect upon and test their previous knowledge and guided teachers to work with students to consolidate challenging topics. Findings provide insights into how the teacher–AI collaboration could engage and motivate students to reflect conceptually upon mathematical rules. Implications for practice and/or policy: This study encourages practitioners and scholars to consider hybrid teaching intelligence when designing student-centred AI learning tools, focusing on supporting the development of effective teacher–AI collaborative technologies
Multimodal Machine Learning Prediction of 12‐Month Suicide Attempts in Bipolar Disorder
Introduction: Bipolar disorder (BD) patients present an increased risk of suicide attempts. Most current machine learning (ML) studies predicting suicide attempts are cross-sectional, do not employ time-dependent variables, and do not assess more than one modality. Therefore, we aimed to predict 12-month suicide attempts in a sample of BD patients, using clinical and brain imaging data. Methods: A sample of 163 BD patients was recruited and followed up for 12 months. Gray matter (GM) volumes and cortical thickness were extracted from the T1-weighted images. Based on previous literature, we extracted 56 clinical and demographic features from digital health records. Support Vector Machine was used to differentiate BD subjects who attempted suicide. First, we explored single modality prediction (clinical features, GM, and thickness). Second, we implemented a multimodal stacking-based data fusion framework. Results: During the 12 months, 6.13% of patients attempted suicide. The unimodal classifier based on clinical data reached an area under the curve (AUC) of 0.83 and balanced accuracy (BAC) of 72.7%. The model based on GM reached an AUC of 0.86 and BAC of 76.4%. The multimodal classifier (clinical + GM) reached an AUC of 0.88 and BAC of 83.4%, significantly increasing the sensitivity. The most important features were related to suicide attempts history, medications, comorbidities, and depressive polarity. In the GM model, the most relevant features mapped in the frontal, temporal, and cerebellar regions. Conclusions: By combining models, we increased the detection of suicide attempts, reaching a sensitivity of 80%. Combining more than one modality proved a valid method to overcome limitations from single-modality models and increase overall accuracy
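The balanced accuracy (BAC) figures reported above are easier to read with the metric spelled out. The sketch below uses the standard definition, the mean of sensitivity and specificity; the function name and the cohort in the usage example are illustrative assumptions (a 10 versus 153 split chosen to resemble the reported 6.13% rate), not the study's data or code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Illustrative cohort (assumed): 10 positives among 163 cases.
y_true = [1] * 10 + [0] * 153
always_negative = [0] * 163

plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / 163
bac = balanced_accuracy(y_true, always_negative)
```

On such an imbalanced sample, a classifier that never predicts an attempt still scores about 94% plain accuracy while its balanced accuracy is 0.5, which is the likely reason the abstract reports BAC and sensitivity rather than accuracy alone.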
Responsible Guidelines for Authorship Attribution Tasks in NLP
Authorship Attribution (AA) approaches in Natural Language Processing (NLP) are important in various domains, including forensic analysis and cybercrime. However, they pose Ethical, Legal, and Societal Implications/Aspects (ELSI/ELSA) challenges that remain underexplored. Inspired by foundational AI ethics guidelines and frameworks, this research introduces a comprehensive framework of responsible guidelines that focuses on AA tasks in NLP, which are tailored to different stakeholders and development phases. These guidelines are structured around four core principles: privacy and data protection, fairness and non-discrimination, transparency and explainability, and societal impact. Furthermore, to illustrate a practical application of our guidelines, we apply them to a recent AA study that targets identifying and linking potential human trafficking vendors. We believe the proposed guidelines can assist researchers and practitioners in justifying their decisions, assisting ethical committees in promoting responsible practices, and identifying ethical concerns related to NLP-based AA approaches. Our study aims to contribute to ensuring the responsible development and deployment of AA tools
Emotion Analysis and Dialogue Breakdown Detection in Dialogue of Chat Systems Based on Deep Neural Networks
In dialogues between robots or computers and humans, dialogue breakdown analysis is an important tool for achieving better chat dialogues. Conventional dialogue breakdown detection methods focus on semantic variance. Although these methods can detect dialogue breakdowns based on semantic gaps, they cannot always detect emotional breakdowns in dialogues. In chat dialogue systems, emotions are sometimes included in the utterances of the system when responding to the speaker. In this study, we detect emotions from utterances, analyze emotional changes, and use them as the dialogue breakdown feature. The proposed method estimates emotions by utterance unit and generates features by calculating the similarity of the emotions of the utterance and the emotions that have appeared in prior utterances. We employ deep neural networks using sentence distributed representation vectors as the feature. In an evaluation of experimental results, the proposed method achieved a higher dialogue breakdown detection rate than the method using sentence distributed representation vectors
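The emotion-similarity feature described in this abstract can be pictured with a small sketch. It assumes each utterance already has an emotion score vector and compares it with the mean of all earlier vectors using cosine similarity; the choice of cosine similarity, the averaging of prior utterances, and all names are assumptions for illustration, since the abstract does not state how the similarity is computed.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def emotion_shift_features(emotion_vectors):
    """One feature per utterance: similarity to the mean of prior emotions.

    The first utterance has no history, so it gets 1.0 (no shift) by
    convention in this sketch.
    """
    features = []
    for i, current in enumerate(emotion_vectors):
        if i == 0:
            features.append(1.0)
            continue
        prior = emotion_vectors[:i]
        mean_prior = [sum(col) / i for col in zip(*prior)]
        features.append(cosine(current, mean_prior))
    return features


# Hypothetical two-dimensional emotion scores, e.g. (joy, anger).
dialogue = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shift = emotion_shift_features(dialogue)
```

A sharp drop in the feature, as at the third utterance here, marks an abrupt emotional change that a purely semantic similarity measure could miss.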
The “curious case of contexts” in retrieval-augmented generation with a combination of labelled and unlabelled data
With the growing reliance on LLMs for a wide range of NLP tasks, optimizing the use of labeled and unlabeled data for effective context generation has become critical. This work explores the interplay between two prominent methodologies in few-shot learning: in-context learning (ICL), which utilizes labeled task-specific data, and retrieval-augmented generation (RAG), which leverages unlabeled external knowledge to augment generative models. Since each has its individual limitations, we propose a novel hybrid approach to obtain “the best of both worlds” by dynamically integrating both labeled and unlabeled data towards improving the downstream performance of LLMs. Our methodology, which we call LU-RAG (labeled and unlabeled RAG), recomputes the scores of top-k labeled instances and top-m unlabeled passages to refine context selection. Our experimental results demonstrate that LU-RAG consistently outperforms both standalone ICL and RAG across multiple benchmarks, showing significant gains in downstream performance. Furthermore, we show that LU-RAG performs better with a semantic neighborhood as compared to a lexical one, highlighting its ability to generalize effectively
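One way to picture combining top-k labeled instances and top-m unlabeled passages into a single context is sketched below. The abstract does not say how LU-RAG recomputes scores, so the min-max normalisation within each pool, the function names, and the example scores are all assumptions made for illustration, not the authors' method.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def merge_contexts(labeled, unlabeled, budget):
    """Pool two retrieval lists and keep the `budget` best items.

    labeled and unlabeled are lists of (text, raw_score). Raw scores from
    different retrievers are not comparable, so each pool is normalised
    separately before merging (an assumed rescoring rule).
    """
    pooled = []
    for source, items in (("labeled", labeled), ("unlabeled", unlabeled)):
        if not items:
            continue
        normalised = min_max([score for _, score in items])
        pooled.extend(
            (text, source, norm) for (text, _), norm in zip(items, normalised)
        )
    # sorted() is stable, so labeled items win ties with unlabeled ones.
    pooled = sorted(pooled, key=lambda item: -item[2])
    return pooled[:budget]


# Hypothetical retrieval results on two different score scales.
labeled_hits = [("example A", 0.9), ("example B", 0.5)]
unlabeled_hits = [("passage X", 12.0), ("passage Y", 9.0), ("passage Z", 6.0)]
context = merge_contexts(labeled_hits, unlabeled_hits, budget=3)
```

With these inputs the context mixes one labeled example with two unlabeled passages, showing how a shared score scale lets the two sources compete for the same slots.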
Artificial intelligence tools for engagement prediction in neuromotor disorder patients during rehabilitation
Background: Robot-Assisted Gait Rehabilitation (RAGR) is an established clinical practice to encourage neuroplasticity in patients with neuromotor disorders. Nevertheless, tasks repetition imposed by robots may induce boredom, affecting clinical outcomes. Thus, quantitative assessment of engagement towards rehabilitation using physiological data and subjective evaluations is increasingly becoming vital. This study aimed at methodologically exploring the performance of artificial intelligence (AI) algorithms applied to structured datasets made of heart rate variability (HRV) and electrodermal activity (EDA) features to predict the level of patient engagement during RAGR. Methods: The study recruited 46 subjects (38 underage, 10.3 +/- 4.0 years old; 8 adults, 43.0 +/- 19.0 years old) with neuromotor impairments, who underwent 15 to 20 RAGR sessions with Lokomat. During 2 or 3 of these sessions, ad hoc questionnaires were administered to both patients and therapists to investigate their perception of a patient's engagement state. Their outcomes were used to build two engagement classification targets: self-perceived and therapist-perceived, both composed of three levels: "Underchallenged", "Minimally Challenged", and "Challenged". Patient's HRV and EDA physiological signals were processed from raw data collected with the Empatica E4 wristband, and 33 features were extracted from the conditioned signals. Performance outcomes of five different AI classifiers were compared for both classification targets. Nested k-fold cross-validation was used to deal with model selection and optimization. Finally, the effects on classifiers performance of three dataset preparation techniques, such as unimodal or bimodal approach, feature reduction, and data augmentation, were also tested. Results: The study found that combining HRV and EDA features into a comprehensive dataset improved the synergistic representation of engagement compared to unimodal datasets. Additionally, feature reduction did not yield any advantages, while data augmentation consistently enhanced classifiers performance. Support Vector Machine and Extreme Gradient Boosting models were found to be the most effective architectures for predicting self-perceived engagement and therapist-perceived engagement, with a macro-averaged F1 score of 95.6% and 95.4%, respectively. Conclusion: The study displayed the effectiveness of psychophysiology-based AI models in predicting rehabilitation engagement, thus promoting their practical application for personalized care and improved clinical health outcomes
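The macro-averaged F1 score quoted in the results treats the three engagement levels equally regardless of how many samples each has. The sketch below follows the standard definition (per-class F1, then an unweighted mean); the function name, the short label codes, and the toy predictions are illustrative assumptions and not the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes found in y_true."""
    classes = sorted(set(y_true))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels: U = Underchallenged, M = Minimally Challenged,
# C = Challenged.
true_levels = ["U", "U", "M", "M", "C", "C"]
predicted_levels = ["U", "M", "M", "M", "C", "U"]
score = macro_f1(true_levels, predicted_levels)
```

Because each class contributes one third of the score here, poor performance on a rare engagement level lowers the macro average as much as poor performance on a common one.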
- …
