159 research outputs found

    A Study on Efficacy of Ensemble Methods for Classification Learning

    WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking

    We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and combines classical language modeling techniques, based on textual evidence, with the Wikipedia Knowledge Graph via entity linking. WISER indexes each academic author through a novel profiling technique that models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the author's publications, whereas the weighted edges express the semantic relatedness among these entities, computed via textual and graph-based relatedness functions. Every node is labeled with a relevance score, which models the pertinence of the corresponding entity to the author's expertise and is computed by a random walk over that graph, and with a latent vector representation, which is learned via entity embeddings and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the author's documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the author's expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established through a large-scale experimental test on a standard dataset for this task. We show that WISER achieves better performance than all the other competitors, thus demonstrating the effectiveness of modeling an author's profile via our "semantic" graph of entities. Finally, we comment on the use of WISER for indexing and profiling the whole research community within the University of Pisa, and on its application to technology transfer in our University.
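
    The random-walk relevance score described in this abstract can be illustrated with a PageRank-style iteration over a small weighted entity graph. This is only a sketch under stated assumptions: the abstract does not specify the exact random-walk formulation, so the damping factor, the iteration count, the function name and the toy profile below are all hypothetical.

```python
def random_walk_relevance(edges, damping=0.85, iterations=100):
    """Score nodes of a weighted undirected graph with a PageRank-style walk.

    `edges` maps (node_a, node_b) pairs to a relatedness weight.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Build a symmetric weighted adjacency map.
    weights = {}
    for (u, v), w in edges.items():
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w

    nodes = sorted(weights)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour passes on its score in proportion to edge weight.
            incoming = sum(
                scores[nb] * w / sum(weights[nb].values())
                for nb, w in weights[node].items()
            )
            updated[node] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Hypothetical author profile: entities and relatedness weights are made up.
profile_edges = {
    ("Information retrieval", "Search engine"): 0.9,
    ("Information retrieval", "Entity linking"): 0.7,
    ("Entity linking", "Wikipedia"): 0.8,
    ("Search engine", "Wikipedia"): 0.3,
}
relevance = random_walk_relevance(profile_edges)
top_entity = max(relevance, key=relevance.get)
```

    In this toy profile the entity with the strongest connections ends up with the highest score, which is the intuition behind using a random walk to weight entities in an expertise profile.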

    Text Classification with Imperfect Hierarchical Structure Knowledge

    Many real-world classification problems involve classes organized in a hierarchical, tree-like structure. However, in many cases the hierarchical structure is ignored and each class is treated in isolation; in other words, the class structure is flattened (Dumais and Chen, 2000). In this paper, we propose a new approach that incorporates hierarchical structure knowledge by cascading it as an additional feature for the child-level classifier. We posit that our cascading model will outperform the baseline "flat" model. Our empirical experiments provide strong evidence supporting our proposal. Interestingly, even imperfect hierarchical structure knowledge also improves classification performance.
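
    The cascading idea in this abstract, feeding the parent-level prediction to the child-level classifier as an extra feature, can be sketched as follows. The nearest-centroid classifier, the one-hot encoding of the parent class and the toy data are assumptions made for illustration; the abstract does not say which classifiers or encodings the authors used.

```python
from collections import defaultdict


def fit_centroids(rows, labels):
    """Train a nearest-centroid classifier: one mean vector per label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cascade_features(row, parent_model, parent_labels):
    """Append a one-hot encoding of the predicted parent class to the row."""
    predicted = predict(parent_model, row)
    return list(row) + [1.0 if predicted == p else 0.0 for p in parent_labels]


# Hypothetical toy data: two parent classes, four child classes.
rows = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.1], [1.2, 0.0],
        [5.0, 5.1], [5.2, 5.0], [6.0, 5.1], [6.2, 5.0]]
child = ["cat", "cat", "dog", "dog", "car", "car", "bus", "bus"]
parent_of = {"cat": "animal", "dog": "animal", "car": "vehicle", "bus": "vehicle"}
parent = [parent_of[c] for c in child]
parent_labels = sorted(set(parent))

# Stage 1: parent-level classifier. Stage 2: child classifier on augmented rows.
parent_model = fit_centroids(rows, parent)
augmented = [cascade_features(r, parent_model, parent_labels) for r in rows]
child_model = fit_centroids(augmented, child)

query = [1.1, 0.05]
prediction = predict(child_model, cascade_features(query, parent_model, parent_labels))
```

    Because the parent prediction is only an input feature, a wrong parent guess degrades the child classifier's input rather than ruling out the correct child class outright, which is one way to read the abstract's remark about imperfect hierarchy knowledge.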

    Enhanced Grey Wolf Optimization based Hyper-parameter optimized Convolution Neural Network for Kidney Image Classification

    Over the last few years, Convolutional Neural Networks (CNNs) have shown dominant performance in real-world applications due to their ability to find good solutions and to deal with image data. However, their performance is highly dependent on the network architecture and on the methods used to optimize their hyper-parameters, especially the number and size of filters. Designing a good CNN architecture requires human expertise and domain knowledge, so it is difficult to find a suitable number and size of filters for a given classification problem. The standard Grey Wolf Optimization (GWO) algorithm suffers from issues such as slow convergence, trapping in local minima, and an inability to maintain a balance between exploration and exploitation. To achieve a proper balance between these phases, this paper introduces two modifications to GWO. A technique for finding the optimum CNN architecture using Enhanced Grey Wolf Optimization (E-GWO) is proposed. The paper presents the optimization of CNN hyper-parameters (number and size of filters in the convolution layer) using E-GWO to improve the performance of the model. A kidney ultrasound image dataset collected from an ultrasound centre is used to evaluate the performance of the proposed algorithm. Experimental results show that the CNN optimized with E-GWO achieved 97.01% accuracy, outperforming CNNs optimized with traditional GA, PSO and GWO as well as a conventional CNN. Finally, the results are statistically validated using a t-test.
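
    For orientation, the standard GWO update that the abstract builds on can be sketched as below. This is the textbook algorithm, not the paper's E-GWO: the abstract does not describe the two modifications. The objective is a hypothetical stand-in for a CNN's validation error as a function of filter count and filter size, and the bounds, pack size and seed are arbitrary.

```python
import random


def grey_wolf_optimize(objective, bounds, wolves=12, iterations=60, seed=0):
    """Minimise `objective` with the standard Grey Wolf Optimizer."""
    rng = random.Random(seed)
    pack = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(wolves)]
    best, best_score = None, float("inf")

    for step in range(iterations):
        # The three fittest wolves (alpha, beta, delta) lead the pack.
        pack.sort(key=objective)
        alpha, beta, delta = (list(w) for w in pack[:3])
        if objective(alpha) < best_score:
            best, best_score = alpha, objective(alpha)

        # `a` shrinks linearly from 2 to 0: exploration first, exploitation later.
        a = 2.0 * (1.0 - step / iterations)
        new_pack = []
        for wolf in pack:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in (alpha, beta, delta):
                    a_coef = 2.0 * a * rng.random() - a
                    c_coef = 2.0 * rng.random()
                    distance = abs(c_coef * leader[d] - wolf[d])
                    pulls.append(leader[d] - a_coef * distance)
                # Average the three pulls and clamp to the search bounds.
                position.append(min(hi, max(lo, sum(pulls) / 3.0)))
            new_pack.append(position)
        pack = new_pack

    final = min(pack, key=objective)
    if objective(final) < best_score:
        best, best_score = final, objective(final)
    return best, best_score


def surrogate_error(params):
    """Hypothetical stand-in for validation error; minimum at (32 filters, size 3)."""
    filters, size = params
    return (filters - 32.0) ** 2 / 100.0 + (size - 3.0) ** 2


search_bounds = [(8.0, 128.0), (1.0, 7.0)]  # (number of filters, filter size)
best, best_score = grey_wolf_optimize(surrogate_error, search_bounds)
```

    In a real hyper-parameter search, each objective call would train and validate a CNN, and the continuous positions would be rounded to valid integer filter counts and sizes.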

    Transform Diabetes - Harnessing Transformer-Based Machine Learning and Layered Ensemble with Enhanced Training for Improved Glucose Prediction.

    Type 1 diabetes is a common chronic disease characterized by the body's inability to regulate the blood glucose level, leading to severe health consequences if not managed manually. Accurate blood glucose level predictions can enable better disease management and inform subsequent treatment decisions. However, predicting future blood glucose levels is a complex problem due to the inherent complexity and variability of the human body. This thesis investigates whether a Transformer model can outperform a state-of-the-art Convolutional Recurrent Neural Network model when forecasting blood glucose levels on the same dataset. The problem is structured, and the data preprocessed, as a multivariate multi-step time series. A unique Layered Ensemble technique that Enhances the Training of the final model is introduced. This technique manages missing data and counters potential issues from other techniques by employing a Long Short-Term Memory model and a Transformer model together. The experimental results show that this novel ensemble technique reduces the root mean squared error by approximately 14.28% compared to the state-of-the-art model when predicting the blood glucose level 30 minutes into the future. This improvement highlights the potential of this approach to assist diabetes patients with effective disease management.
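
    Framing the data as a multivariate multi-step time series, and measuring a relative reduction in root mean squared error, might look like the sketch below. The column layout (glucose, insulin, carbohydrates), the 5-minute sampling interval implied by a 6-step horizon, and every number are hypothetical; none of them come from the thesis.

```python
import math


def make_windows(series, history, horizon, target_index=0):
    """Slice a multivariate series into (input window, future targets) pairs.

    Each input holds `history` full rows; each target holds the next
    `horizon` values of the column at `target_index`.
    """
    samples = []
    for start in range(len(series) - history - horizon + 1):
        inputs = series[start:start + history]
        future = series[start + history:start + history + horizon]
        samples.append((inputs, [row[target_index] for row in future]))
    return samples


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_reduction(baseline_error, new_error):
    """Fraction by which `new_error` improves on `baseline_error`."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical readings: [glucose, insulin, carbohydrates] per time step.
series = [[100.0 + i, 0.5, 0.0] for i in range(10)]

# With 5-minute sampling (an assumption), 6 steps ahead covers 30 minutes.
samples = make_windows(series, history=3, horizon=6)
first_inputs, first_targets = samples[0]

# Made-up error values, used only to show the calculation.
improvement = relative_reduction(baseline_error=20.0, new_error=17.0)
```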

    Data Analytics and Techniques: A Review

    Big data of different types, such as texts and images, are rapidly generated from the internet and other applications. Dealing with this data using traditional methods is not practical, since it comes in various sizes and types and with varying processing speed requirements. Data analytics has therefore become an important tool: it analyzes and extracts only the meaningful information, which makes it essential for big data applications. This paper presents several innovative methods that use data analytics techniques to improve the analysis process and data management. Furthermore, it discusses how the revolution in data analytics based on artificial intelligence algorithms might provide improvements for many applications. In addition, critical challenges and research issues are presented, based on the limitations of published papers, to help researchers distinguish between various analytics techniques and develop highly consistent, logical, and information-rich analyses based on valuable features. Finally, the findings of this paper may be used to identify the best methods in each sector covered by these publications, to assist future researchers in carrying out more systematic and comprehensive analyses, and to identify areas for developing a unique or hybrid technique for data analysis.

    Predicting university academic performance through machine learning mechanisms and supervised methods

    Context: In the education sector, variables have been identified which considerably affect students' academic performance. In the last decade, research has been carried out in various fields, such as psychology, statistics, and data analytics, in order to predict academic performance. Method: Data analytics, especially through Machine Learning tools, allows academic performance to be predicted using supervised learning algorithms based on academic, demographic, and sociodemographic variables. In this work, the most influential variables over the course of students' academic life are selected through wrapper, embedded, filter, and ensemble methods, as are the most important characteristics semester by semester, using Machine Learning algorithms (Decision Trees, KNN, SVC, Naive Bayes, LDA) implemented in Python. Results: The results of the study show that KNN is the model that best predicts academic performance for each of the semesters, followed by Decision Trees, with precision values of around 80% and 78.5% in some semesters. Conclusions: Regarding the variables, it cannot be said that a student's per-semester academic average necessarily influences the prediction of academic performance for the next semester. The analysis of these results indicates that predicting academic performance using Machine Learning tools is a promising approach that can help improve students' academic life and allow institutions and teachers to take actions that contribute to the teaching-learning process.
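
    As a reminder of how the best-performing model in this abstract makes its predictions, a k-nearest-neighbours classifier can be sketched in a few lines. The two features (previous average grade and share of credits passed), the labels and the value of k are hypothetical; the study's actual variables and settings are not given in the abstract.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical student records: [previous average grade, share of credits passed].
train_rows = [[4.2, 0.9], [4.0, 0.8], [3.9, 1.0],
              [2.1, 0.4], [2.5, 0.5], [2.0, 0.3]]
train_labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

outcome = knn_predict(train_rows, train_labels, query=[4.1, 0.85])
```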

    SQL injection attack detection in network flow data

    SQL injections rank in the OWASP Top 3. The literature shows that analyzing network datagrams allows such attacks to be detected or prevented. Unfortunately, such detection usually implies inspecting every packet flowing through a computer network, so routers that handle significant traffic loads usually cannot apply the solutions proposed in the literature. This work demonstrates that detecting SQL injection attacks in flow data from lightweight protocols is possible. For this purpose, we gathered two datasets of flow data from several SQL injection attacks on the most popular database engines. After evaluating several machine learning-based algorithms, we obtained a detection rate of over 97% with a false alarm rate of less than 0.07% using a Logistic Regression-based model.
    Instituto Nacional de Ciberseguridad de España (INCIBE); Universidad de Leó
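
    The two figures quoted in this abstract, detection rate and false alarm rate, are conventionally derived from a confusion matrix. The sketch below assumes the usual definitions (true positive rate and false positive rate) and uses made-up labels; the abstract does not spell out how the authors computed them.

```python
def detection_metrics(labels, predictions):
    """Return (detection rate, false alarm rate) for binary labels.

    1 marks an attack flow and 0 a benign flow. Detection rate is
    TP / (TP + FN); false alarm rate is FP / (FP + TN).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Made-up ground truth and classifier output for eight flows.
labels      = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]

detection_rate, false_alarm_rate = detection_metrics(labels, predictions)
```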