1,478 research outputs found

    Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes

    Many types of tumors exhibit chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample-specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered a more promising candidate to be biologically relevant to cancer. We sought an intuitive method to define such aberrations and prioritize them. We define V, the volume associated with an aberration, as the product of three factors: (a) the fraction of patients with the aberration, (b) the aberration's length and (c) its amplitude. Our algorithm compares the values of V derived from real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant and combined them with chromosomal arm status to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We apply the method to three different public array CGH datasets of Medulloblastoma and Neuroblastoma, and demonstrate its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of Medulloblastoma, which is analogous to Neuroblastoma type 1. Comment: 34 pages, 3 figures; to appear in Cancer Informatics
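
A minimal Python sketch of the volume statistic V and its permutation test, under stated assumptions. The abstract does not say how a sample is called aberrant or what exactly is permuted, so the `threshold` cutoff, the per-sample shuffling of probe positions, and the names `volume` and `permutation_p_value` are hypothetical choices made for illustration, not the authors' implementation.

```python
import random
from statistics import mean


def volume(profiles, start, end, threshold=0.3):
    """V = (fraction of aberrant samples) * (region length) * (mean amplitude).

    profiles: list of per-sample copy-number log-ratio lists (one value per probe).
    A sample counts as aberrant when its mean over [start, end) exceeds
    `threshold` (an assumed calling rule).
    """
    region_means = [mean(p[start:end]) for p in profiles]
    aberrant = [m for m in region_means if m > threshold]
    if not aberrant:
        return 0.0
    fraction = len(aberrant) / len(profiles)
    length = end - start
    amplitude = mean(aberrant)
    return fraction * length * amplitude


def permutation_p_value(profiles, start, end, n_perm=1000, threshold=0.3, seed=0):
    """Compare the observed V with a null built by shuffling probes within each sample."""
    rng = random.Random(seed)
    observed = volume(profiles, start, end, threshold)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for p in profiles:
            q = list(p)
            rng.shuffle(q)  # assumed null: probe order carries no information
            shuffled.append(q)
        if volume(shuffled, start, end, threshold) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: three of four samples carry a gain of amplitude 1.0 over probes 2..4.
gain = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
toy_profiles = [list(gain), list(gain), list(gain), [0.0] * 10]
```

For the toy data, the region covering probes 2 to 4 has fraction 3/4, length 3 and amplitude 1.0, so `volume(toy_profiles, 2, 5)` is 0.75 × 3 × 1.0 = 2.25.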

    Assessing learners' satisfaction in collaborative online courses through a big data approach

    Monitoring learners' satisfaction (LS) is vital for collecting precious information and designing valuable online collaborative learning (CL) experiences. Today's CL platforms allow students to perform many online activities, thus generating a huge mass of data that can be processed to provide insights about the level of satisfaction with contents, services, community interactions, and effort. Big Data is a suitable paradigm for real-time processing of large data sets concerning LS, with the final aim of providing valuable information that may improve the CL experience. Besides, the adoption of Big Data offers the opportunity to implement a non-intrusive and in-process evaluation strategy for online courses that complements the traditional and time-consuming ways of collecting feedback (e.g. questionnaires or surveys). Although the application of Big Data in the CL domain is a recently explored research area with limited applications, it may have an important role in the future of online education. By adopting the design science research methodology, this article describes a novel method and approach to analyse individual students' contributions in online learning activities and assess their level of satisfaction with the course. A software artefact is also presented, which leverages Learning Analytics in a Big Data context, with the goal of providing valuable real-time insights that people and systems can use to intervene properly in the program. The contribution of this paper can be of value for both researchers and practitioners: the former may be interested in the approach and method used for LS assessment; the latter may find of interest the system implemented and how it has been tested in a real online course. Authors: Elia, G.; Solazzo, G.; Lorenzo, G.; Passiante, G.

    Analysis and Prediction of Student Performance by Using A Hybrid Optimized BFO-ALO Based Approach: Student Performance Prediction using Hybrid Approach

    Data mining offers effective solutions for a variety of industries, including education. Research in the subject of education is expanding rapidly because of the large quantity of student data that can be utilized to uncover valuable learning behavior patterns. This research presents a method for forecasting the academic performance of students in Portuguese and math subjects, described using 33 attributes. Forecasting the educational attainment of students is one of the most popular fields of study in the modern period. Previous research has employed a variety of classification algorithms to forecast student performance. Educational data mining is a topic that needs a lot of research to improve the precision of the classification technique and predict how well students will do in school. In this study, we developed a method to predict how well a student will do that uses a mix of optimization techniques. The popular BFO and ALO optimization techniques were applied to the data set. Python was used to process all the files and conduct a performance comparison analysis. In this study, we compared our model's performance with various existing baseline models and examined the accuracy with which the hybrid algorithm predicted the student data set. To verify the expected classification accuracy, a calculation was performed. The experiment's findings indicate that the BFO-ALO based hybrid model, with a 94.5 percent success rate, is the preferred choice among all the methods.

    N-gram Based Text Categorization Method for Improved Data Mining

    Though naïve Bayes text classifiers are widely used because of their simplicity and effectiveness, techniques for improving the performance of these classifiers have rarely been studied. Naïve Bayes classifiers, which are widely used for text classification in machine learning, are based on the conditional probability of features belonging to a class, where the features are selected by feature selection methods. However, their performance is often imperfect because they do not model text well, because of inappropriate feature selection, and because of some disadvantages of Naive Bayes itself. Sentiment Classification or Text Classification is the act of taking a set of labeled text documents, learning a correlation between a document's contents and its corresponding labels, and then predicting the labels of a set of unlabeled test documents as well as possible. Text Classification is also sometimes called Text Categorization. Text classification has many applications in natural language processing tasks such as e-mail filtering, intrusion detection systems, news filtering, prediction of user preferences, and organization of documents. The Naive Bayes model makes strong assumptions about the data: it assumes that words in a document are independent. This assumption is clearly violated in natural language text: there are various types of dependences between words induced by the syntactic, semantic, pragmatic and conversational structure of a text. Also, the particular form of the probabilistic model makes assumptions about the distribution of words in documents that are violated in practice. We address this problem and show that it can be solved by modeling text data differently using N-Grams. N-gram Based Text Categorization is a simple method based on statistical information about the usage of sequences of words. We conducted an experiment to demonstrate that our simple modification is able to improve the performance of Naive Bayes for text classification significantly.
Keywords: Data Mining, Text Classification, Text Categorization, Naïve Bayes, N-Grams
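
To make the idea of N-gram features for Naive Bayes concrete, here is a small self-contained Python sketch. It is an illustration under assumptions, not the paper's method: the abstract does not specify the tokenizer, the N-gram order, or the smoothing, so whitespace tokenization, unigrams plus bigrams (`n_max=2`), Laplace smoothing (`alpha=1.0`), and the names `ngram_features` and `NgramNaiveBayes` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Return all word n-grams of order 1..n_max as space-joined strings."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NgramNaiveBayes:
    """Multinomial Naive Bayes over word n-gram counts with Laplace smoothing."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text, self.n_max)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        # N-grams never seen in training are ignored at prediction time.
        feats = [f for f in ngram_features(text, self.n_max) if f in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in feats:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (made up for illustration).
train_texts = ["good film", "good plot", "not good film", "not good plot"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NgramNaiveBayes(n_max=2).fit(train_texts, train_labels)
```

With this toy data, the bigram "not good" becomes a feature in its own right, so word order contributes evidence that a pure bag-of-words model would discard.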

    Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis

    Bias detection in text is imperative because biased text reinforces negative stereotypes, disseminates misinformation, and influences decisions. Current language models often fall short in generalizing beyond their training sets. In response, we introduce the Contextualized Bi-Directional Dual Transformer (CBDT) Classifier. This novel architecture utilizes two synergistic transformer networks, the Context Transformer and the Entity Transformer, aiming for enhanced bias detection. Our dataset preparation follows the FAIR principles, ensuring ethical data usage. Through rigorous testing on various datasets, CBDT showcases its ability to distinguish biased from neutral statements, while also pinpointing exact biased lexemes. Our approach outperforms existing methods, achieving a 2-4% increase over benchmark performances. This opens avenues for adapting the CBDT model across diverse linguistic and cultural landscapes. Comment: UNDER REVIEW

    Predicting Students Performance in Online Education through Deep Learning Model

    This epidemic has prompted the development of Education 4.0, virtual learning, and the demand to adapt educational practices to meet the needs of younger demographics. A rising epidemic has necessitated the shutdown of campuses where education programs are now being carried out online in educational institutions all over the globe. The report includes a study on the effectiveness and perceptions of students toward digital learning during the pandemic. A Convolutional Neural Network (CNN) and Particle swarm optimization model, which forecasts the student's learning rates, are used to tackle this issue. This study will categorize student performance into low, medium, and high grades to forecast student achievement. The Kaggle student's performance assessment database is utilized to gather the student information logs, which are then pre-processed to eliminate noise and redundant data. The CNN derives features based on the student's attention and arbitrary patterns sequencing by examining the pre-processed information. Then, utilizing the Minimum Redundancy Maximum Relevance (mRMR) approach, the retrieved characteristics are evaluated. The lowest one that treats each characteristic individually is chosen as the greatest feature by mRMR. CNN uses Stochastic Gradient Descent (SGD) to calculate the characteristic weights, which are then modified for improved extracting features. Finally, the CNN-WOA method forecasts the final academic achievement forecast outcome. Studies revealed that the suggested approach outperforms existing ones in terms of accuracy, precision, recall, and F-score while requiring less computing time.

    SOCIALQ&A: A NOVEL APPROACH TO NOTIFYING THE CORRECT USERS IN QUESTION AND ANSWERING SYSTEMS

    Question and Answering (Q&A) systems are currently in use by a large number of Internet users. Q&A systems play a vital role in our daily life as an important platform for information and knowledge sharing. Hence, much research has been devoted to improving the performance of Q&A systems, with a focus on improving the quality of answers provided by users, reducing the wait time for users who ask questions, using a knowledge base to provide answers via text mining, and directing questions to appropriate users. Due to the growing popularity of Q&A systems, the number of questions in the system can become very large; thus, it is unlikely for an answer provider to simply stumble upon a question that he/she can answer properly. The primary objective of this research is to improve the quality of answers and to decrease wait times by forwarding questions to users who exhibit an interest or expertise in the area to which the question belongs. To that end, this research studies how to leverage social networks to enhance the performance of Q&A systems. We have proposed SocialQ&A, a social network based Q&A system that identifies and notifies the users who are most likely to answer a question. SocialQ&A incorporates three major components: User Interest Analyzer, Question Categorizer, and Question-User Mapper. The User Interest Analyzer associates each user with a vector of interest categories. The Question Categorizer algorithm associates a vector of interest categories with each question. Then, based on user interest and user social connectedness, the Question-User Mapper identifies a list of potential answer providers for each question. We have also implemented a real-world prototype for SocialQ&A and analyzed the data from questions/answers obtained from the prototype. Results suggest that social networks can be leveraged to improve the quality of answers and reduce the wait time for answers.
Thus, this research provides a promising direction to improve the performance of Q&A systems.
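
The mapping step described above, matching a question's category vector against user interest vectors while taking social connectedness into account, can be sketched in Python as follows. This is a hedged illustration only: the abstract does not give the scoring formula, so the cosine similarity, the linear blend controlled by `weight`, the `closeness` scores, and the function names `cosine` and `rank_answerers` are assumptions, not SocialQ&A's actual algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length category vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_answerers(question_vec, user_interests, closeness, top_k=2, weight=0.7):
    """Rank candidate answerers for a question.

    question_vec: category vector of the question (Question Categorizer output).
    user_interests: {user: interest vector} (User Interest Analyzer output).
    closeness: {user: social closeness to the asker in [0, 1]} (assumed input).
    weight: assumed share of the score given to interest match.
    """
    scores = {}
    for user, interests in user_interests.items():
        interest_score = cosine(question_vec, interests)
        social_score = closeness.get(user, 0.0)
        scores[user] = weight * interest_score + (1 - weight) * social_score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy example with categories [sports, music, programming] (made up).
question = [0.0, 0.0, 1.0]
interests = {
    "alice": [0.0, 0.0, 1.0],
    "bob": [1.0, 0.0, 0.0],
    "carol": [0.0, 1.0, 1.0],
}
closeness = {"alice": 0.5, "bob": 1.0, "carol": 0.0}
top_users = rank_answerers(question, interests, closeness)
```

In the toy example, bob is the asker's closest contact but has no interest in programming, so alice and carol are ranked ahead of him.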

    Predictive Analysis of Students' Learning Performance Using Data Mining Techniques: A Comparative Study of Feature Selection Methods

    The use of data mining techniques for the early prediction of academic success has gained significant importance in the current era. There is increasing interest in using these methodologies to forecast the academic performance of students, thereby enabling educators to intervene and provide suitable assistance when required. The purpose of this study was to determine the optimal methods for feature engineering and selection in the context of regression and classification tasks. This study compared the Boruta algorithm and Lasso regression for regression, and Recursive Feature Elimination (RFE) and Random Forest Importance (RFI) for classification. According to the findings, Gradient Boost for the regression part of this study had the lowest Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE), 12.93 and 18.28 respectively, with the Boruta selection method. In contrast, RFI was found to be the superior feature selection method for classification, yielding an accuracy rate of 78% in the classification part. This research emphasized the significance of employing appropriate feature engineering and selection methodologies to enhance the efficacy of machine learning algorithms. Using a diverse set of machine learning techniques, this study analyzed the OULA dataset, focusing on both feature engineering and selection. Our approach was to systematically compare the performance of different models, leading to insights about the most effective strategies for predicting student success.
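
For readers unfamiliar with the two regression metrics reported above, the following short Python sketch shows how MAE and RMSE are computed. The score values are made up for illustration and are unrelated to the study's data; the function names `mae` and `rmse` are likewise just illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-Mean-Square Error: the square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up student scores and predictions.
actual = [50, 60, 70, 80]
predicted = [40, 60, 90, 80]
example_mae = mae(actual, predicted)    # errors 10, 0, 20, 0 give 30 / 4 = 7.5
example_rmse = rmse(actual, predicted)  # sqrt((100 + 400) / 4) = sqrt(125)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same predictions, which is consistent with the reported pair of 12.93 (MAE) and 18.28 (RMSE).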