274 research outputs found
Investigating prediction modelling of academic performance for students in rural schools in Kenya
Academic performance prediction modelling provides an opportunity for learners' probable outcomes to be known early, before they sit for final examinations. This would be particularly useful for education stakeholders to initiate intervention measures to help students who require high intervention to pass final examinations. However, limitations of infrastructure in rural areas of developing countries, such as lack of or unstable electricity and Internet, impede the use of PCs. This study proposed that an academic performance prediction model could include a mobile phone interface specifically designed based on users' needs. The proposed mobile academic performance prediction system (MAPPS) could tackle the problem of underperformance and spur development in the rural areas. A six-step Cross-Industry Standard Process for Data Mining (CRISP-DM) theoretical framework was used to support the design of MAPPS. Experiments were conducted using two datasets collected in Kenya. One dataset had 2426 records of student data having 22 features, collected from 54 rural primary schools. The second dataset had 1105 student records with 19 features, collected from 11 peri-urban primary schools. Evaluation was conducted to investigate: (i) which is the best classifier model among the six common classifiers selected for the type of data used in this study; (ii) what is the optimal subset of features from the total number of features for both rural and peri-urban datasets; and (iii) what is the predictive performance of the Mobile Academic Performance Prediction System in classifying the high intervention class. It was found that the system achieved an F-Measure rate of nearly 80% in determining the students who need high intervention two years before the final examination. It was also found that the system was useful and usable in rural environments; the accuracy of prediction was good enough to motivate stakeholders to initiate strategic intervention measures. This study provides experimental evidence that Educational Data Mining (EDM) techniques can be used in the developing world by exploiting the ubiquitous mobile technology for student academic performance prediction
A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness
The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and identification of propagation methods and payloads to enhance situation awareness in tactical scenarios using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware with false positive and false negative rates of 8.73e-4 and 8.03e-4 respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and 85% and 94% reductions in false positive and false negative rates respectively. Against a set of publicly unknown malware, MaTR detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products
Application of machine learning to agricultural soil data
Agriculture is a major sector in the Indian economy. One key advantage of classification and prediction of soil parameters is to save time of specialized technicians developing expensive chemical analysis. In this context, this PhD thesis has been developed in three stages:
1. Classification for soil data: we used chemical soil measurements to classify many relevant soil parameters: village-wise fertility indices; soil pH and type; soil nutrients, in order to recommend suitable amounts of fertilizers; and preferable crop.
2. Regression for generic data: we developed an experimental comparison of many regressors to a large collection of generic datasets selected from the University of California at Irving (UCI) machine learning repository.
3. Regression for soil data: We applied the regressors used in stage 2 to the soil datasets, developing a direct prediction of their numeric values. The accuracy of the prediction was evaluated for the ten soil problems, as an alternative to the prediction of the quantified values (classification) developed in stage 1
Advanced vibration analysis for the diagnosis and prognosis of rotating machinery components within condition-based maintenance programs
Machines used in the industrial field may deteriorate with usage and age. Thus it is important to maintain them so as to avoid failure
during actual operation which may be dangerous or even disastrous.The literature has focused its attention on the development of optimal
maintenance strategies, such as condition-based maintenance (CBM), in order to improve system reliability, to avoid system failures, and to
decrease maintenance costs. CBM aims to detect the early occurrence and seriousness of a fault, to estimate the time interval during which
the equipment can still operate before failure, and to identify the components which are deteriorating. CBM has been widely and effectively
applied to rotating machines, which usually operate by means of bearings. The reliable and continuous work of bearings is important as the
break of one of them can compromise the work of the system. Thus the monitoring, prognosis and diagnosis of bearings represent crucial and
important tasks to support real-time maintenance programs.
This research has carried out a complete analysis of advanced soft computing techniques ranging from the multi-class classification to one-class classification, and of combination strategies based on classifier fusion and selection. The purpose of this analysis was to design and
develop high accurate and high robust methodologies to perform the detection, diagnosis and prognosis of defects on rolling elements bearings.
We used vibration signals recorded by four accelerometers on a mechanical device including rolling element bearings: the signals were collected both with all faultless bearings and after substituting one faultless bearing with an artificially damaged one. Four defects and three severity levels were considered.
This research has brought to the design and development of new classifiers which have proved to be very accurate and thus to represent
a valuable alternative to the traditional classifiers. Besides, the high accuracy and the high robustness to noise, shown by the obtained results, prove the effectiveness of the proposed methodologies, which can be thus profitably used to perform automatic prognosis and diagnosis of
rotating machinery components within real-time condition-based maintenance programs
Design of Machine Learning Algorithms with Applications to Breast Cancer Detection
Machine learning is concerned with the design and development of algorithms and
techniques that allow computers to 'learn' from experience with respect to some class
of tasks and performance measure. One application of machine learning is to improve
the accuracy and efficiency of computer-aided diagnosis systems to assist physician,
radiologists, cardiologists, neuroscientists, and health-care technologists. This thesis
focuses on machine learning and the applications to breast cancer detection. Emphasis
is laid on preprocessing of features, pattern classification, and model selection.
Before the classification task, feature selection and feature transformation may be
performed to reduce the dimensionality of the features and to improve the classification
performance. Genetic algorithm (GA) can be employed for feature selection based
on different measures of data separability or the estimated risk of a chosen classifier.
A separate nonlinear transformation can be performed by applying kernel principal
component analysis and kernel partial least squares.
Different classifiers are proposed in this work: The SOM-RBF network combines
self-organizing maps (SOMs) and radial basis function (RBF) networks, with the RBF
centers set as the weight vectors of neurons from the competitive layer of a trained
SaM. The pairwise Rayleigh quotient (PRQ) classifier seeks one discriminating boundary
by maximizing an unconstrained optimization objective, named as the PRQ criterion,
formed with a set of pairwise const~aints instead of individual training samples.
The strict 2-surface proximal (S2SP) classifier seeks two proximal planes that are not
necessary parallel to fit the distribution of the samples in the original feature space or
a kernel-defined feature space, by ma-ximizing two strict optimization objectives with
a 'square of sum' optimization factor. Two variations of the support vector data description
(SVDD) with negative samples (NSVDD) are proposed by involving different
forms of slack vectors, which learn a closed spherically shaped boundary, named as the
supervised compact hypersphere (SCH), around a set of samples in the target class. \Ve
extend the NSVDDs to solve the multi-class classification problems based on distances
between the samples and the centers of the learned SCHs in a kernel-defined feature
space, using a combination of linear discriminant analysis and the nearest-neighbor rule.
The problem of model selection is studied to pick the best values of the hyperparameters
for a parametric classifier. To choose the optimal kernel or regularization
parameters of a classifier, we investigate different criteria, such as the validation error
estimate and the leave-out-out bound, as well as different optimization methods, such
as grid search, gradient descent, and GA. By viewing the tuning problem of the multiple
parameters of an 2-norm support vector machine (SVM) as an identification problem
of a nonlinear dynamic system, we design a tuning system by employing the extended
Kalman filter based on cross validation. Independent kernel optimization based on
different measures of data separability are a~so investigated for different kernel-based
classifiers.
Numerous computer experiments using the benchmark datasets verify the theoretical
results, make comparisons among the techniques in measures of classification
accuracy or area under the receiver operating characteristics curve. Computational
requirements, such as the computing time and the number of hyper-parameters, are
also discussed.
All of the presented methods are applied to breast cancer detection from fine-needle
aspiration and in mammograms, as well as screening of knee-joint vibroarthrographic
signals and automatic monitoring of roller bearings with vibration signals. Experimental
results demonstrate the excellence of these methods with improved classification
performance.
For breast cancer detection, instead of only providing a binary diagnostic decision
of 'malignant' or 'benign', we propose methods to assign a measure of confidence
of malignancy to an individual mass, by calculating probabilities of being benign and
malignant with a single classifier or a set of classifiers
A Survey on Concept Drift Adaptation
Concept drift primarily refers to an online supervised learning scenario when the relation between the in- put data and the target variable changes over time. Assuming a general knowledge of supervised learning in this paper we characterize adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to the concept drift adaptation presents the state of the art techniques and a collection of benchmarks for re- searchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state-of-the-art
Performance Evaluation of Smart Decision Support Systems on Healthcare
Medical activity requires responsibility not only from clinical knowledge and skill but
also on the management of an enormous amount of information related to patient care. It is
through proper treatment of information that experts can consistently build a healthy wellness
policy. The primary objective for the development of decision support systems (DSSs) is
to provide information to specialists when and where they are needed. These systems provide
information, models, and data manipulation tools to help experts make better decisions in a
variety of situations.
Most of the challenges that smart DSSs face come from the great difficulty of dealing
with large volumes of information, which is continuously generated by the most diverse types
of devices and equipment, requiring high computational resources. This situation makes this
type of system susceptible to not recovering information quickly for the decision making. As a
result of this adversity, the information quality and the provision of an infrastructure capable
of promoting the integration and articulation among different health information systems (HIS)
become promising research topics in the field of electronic health (e-health) and that, for this
same reason, are addressed in this research. The work described in this thesis is motivated
by the need to propose novel approaches to deal with problems inherent to the acquisition,
cleaning, integration, and aggregation of data obtained from different sources in e-health environments,
as well as their analysis.
To ensure the success of data integration and analysis in e-health environments, it
is essential that machine-learning (ML) algorithms ensure system reliability. However, in this
type of environment, it is not possible to guarantee a reliable scenario. This scenario makes
intelligent SAD susceptible to predictive failures, which severely compromise overall system
performance. On the other hand, systems can have their performance compromised due to the
overload of information they can support.
To solve some of these problems, this thesis presents several proposals and studies
on the impact of ML algorithms in the monitoring and management of hypertensive disorders
related to pregnancy of risk. The primary goals of the proposals presented in this thesis are
to improve the overall performance of health information systems. In particular, ML-based
methods are exploited to improve the prediction accuracy and optimize the use of monitoring
device resources. It was demonstrated that the use of this type of strategy and methodology
contributes to a significant increase in the performance of smart DSSs, not only concerning precision
but also in the computational cost reduction used in the classification process.
The observed results seek to contribute to the advance of state of the art in methods
and strategies based on AI that aim to surpass some challenges that emerge from the integration
and performance of the smart DSSs. With the use of algorithms based on AI, it is possible to
quickly and automatically analyze a larger volume of complex data and focus on more accurate
results, providing high-value predictions for a better decision making in real time and without
human intervention.A atividade médica requer responsabilidade não apenas com base no conhecimento
e na habilidade clínica, mas também na gestão de uma enorme quantidade de informações
relacionadas ao atendimento ao paciente. É através do tratamento adequado das informações
que os especialistas podem consistentemente construir uma política saudável de bem-estar. O
principal objetivo para o desenvolvimento de sistemas de apoio à decisão (SAD) é fornecer informações
aos especialistas onde e quando são necessárias. Esses sistemas fornecem informações,
modelos e ferramentas de manipulação de dados para ajudar os especialistas a tomar melhores
decisões em diversas situações.
A maioria dos desafios que os SAD inteligentes enfrentam advêm da grande dificuldade
de lidar com grandes volumes de dados, que é gerada constantemente pelos mais diversos
tipos de dispositivos e equipamentos, exigindo elevados recursos computacionais. Essa situação
torna este tipo de sistemas suscetível a não recuperar a informação rapidamente para a
tomada de decisão. Como resultado dessa adversidade, a qualidade da informação e a provisão
de uma infraestrutura capaz de promover a integração e a articulação entre diferentes sistemas
de informação em saúde (SIS) tornam-se promissores tópicos de pesquisa no campo da saúde
eletrônica (e-saúde) e que, por essa mesma razão, são abordadas nesta investigação. O trabalho
descrito nesta tese é motivado pela necessidade de propor novas abordagens para lidar
com os problemas inerentes à aquisição, limpeza, integração e agregação de dados obtidos de
diferentes fontes em ambientes de e-saúde, bem como sua análise.
Para garantir o sucesso da integração e análise de dados em ambientes e-saúde é
importante que os algoritmos baseados em aprendizagem de máquina (AM) garantam a confiabilidade
do sistema. No entanto, neste tipo de ambiente, não é possível garantir um cenário
totalmente confiável. Esse cenário torna os SAD inteligentes suscetíveis à presença de falhas
de predição que comprometem seriamente o desempenho geral do sistema. Por outro lado, os
sistemas podem ter seu desempenho comprometido devido à sobrecarga de informações que
podem suportar.
Para tentar resolver alguns destes problemas, esta tese apresenta várias propostas e
estudos sobre o impacto de algoritmos de AM na monitoria e gestão de transtornos hipertensivos
relacionados com a gravidez (gestação) de risco. O objetivo das propostas apresentadas nesta
tese é melhorar o desempenho global de sistemas de informação em saúde. Em particular, os
métodos baseados em AM são explorados para melhorar a precisão da predição e otimizar o
uso dos recursos dos dispositivos de monitorização. Ficou demonstrado que o uso deste tipo
de estratégia e metodologia contribui para um aumento significativo do desempenho dos SAD
inteligentes, não só em termos de precisão, mas também na diminuição do custo computacional
utilizado no processo de classificação.
Os resultados observados buscam contribuir para o avanço do estado da arte em métodos
e estratégias baseadas em inteligência artificial que visam ultrapassar alguns desafios que
advêm da integração e desempenho dos SAD inteligentes. Como o uso de algoritmos baseados
em inteligência artificial é possível analisar de forma rápida e automática um volume maior de
dados complexos e focar em resultados mais precisos, fornecendo previsões de alto valor para uma melhor tomada de decisão em tempo real e sem intervenção humana
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals and within the
period 1991-present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research
- …