Statistical methods for tissue array images - algorithmic scoring and co-training
Recent advances in tissue microarray technology have allowed
immunohistochemistry to become a powerful medium-to-high throughput analysis
tool, particularly for the validation of diagnostic and prognostic biomarkers.
However, as study size grows, the manual evaluation of these assays becomes a
prohibitive limitation; it vastly reduces throughput and greatly increases
variability and expense. We propose an algorithm - Tissue Array Co-Occurrence
Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on
textural regularity summarized by local inter-pixel relationships. The
algorithm can be easily trained for any staining pattern, has no sensitive
tuning parameters, and can report the salient pixels in an image that
contribute to its score. Pathologists' input via informative training patches
is an important aspect of the algorithm, allowing training for any specific
marker or cell type. With co-training, the error rate
of TACOMA can be reduced substantially for a very small training sample (e.g.,
with size 30). We give theoretical insights into the success of co-training via
thinning of the feature set in a high-dimensional setting when there is
"sufficient" redundancy among the features. TACOMA is flexible, transparent and
provides a scoring process that can be evaluated with clarity and confidence.
In a study based on an estrogen receptor (ER) marker, we show that TACOMA is
comparable to, or outperforms, pathologists' performance in terms of accuracy
and repeatability.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS543 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
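TACOMA scores images from co-occurrence statistics of nearby pixel values. As a rough illustration of that idea, and not the authors' implementation, the Python sketch below computes a gray-level co-occurrence matrix (GLCM) and a few standard texture summaries with scikit-image; the random patch, 16-level quantization and pixel offsets are illustrative assumptions.

```python
# Generic GLCM texture features in the spirit of TACOMA's local
# inter-pixel statistics; NOT the paper's actual algorithm.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
patch = rng.integers(0, 16, size=(64, 64), dtype=np.uint8)  # stand-in for a TMA image patch

# Count pixel pairs at distance 1 in four directions, quantized to 16 gray levels.
glcm = graycomatrix(patch, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=16, symmetric=True, normed=True)

# Summarize textural regularity; a classifier trained on pathologist-labeled
# patches could then score whole images from such feature vectors.
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)
```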
Predictive Maintenance of an External Gear Pump using Machine Learning Algorithms
Predictive Maintenance is critical for engineering industries such as manufacturing, aerospace and energy. Unexpected failures cause unpredictable downtime, which can be disruptive and incur high costs due to reduced productivity. This forces industries to ensure the reliability of their equipment. To increase the reliability of equipment, maintenance actions such as repairs, replacements, equipment updates and corrective actions are employed. These actions affect flexibility, quality of operation and manufacturing time. It is therefore essential to plan maintenance before failure occurs.

Traditional maintenance techniques rely on checks conducted routinely based on the running hours of the machine. The drawback of this approach is that maintenance is sometimes performed before it is required. Conducting maintenance based on the actual condition of the equipment is therefore the optimal solution. This requires collecting real-time data on the condition of the equipment using sensors (to detect events and send information to a computer processor). Predictive Maintenance uses such techniques and analytics to inform about the current and future state of the equipment. In the last decade, with the introduction of the Internet of Things (IoT), Machine Learning (ML), cloud computing and Big Data Analytics, the manufacturing industry has moved towards implementing Predictive Maintenance, resulting in increased uptime and quality control, optimisation of maintenance routes, improved worker safety and greater productivity.

The present thesis describes a novel computational strategy for Predictive Maintenance (fault diagnosis and fault prognosis) with ML and Deep Learning applications for an FG304 series external gear pump, also known as a domino pump. In the absence of a comprehensive set of experimental data, synthetic data generation techniques are implemented by perturbing the frequency content of time series generated using high-fidelity computational techniques. In addition, various feature extraction methods are considered to extract the most discriminatory information from the data. For fault diagnosis, three ML classification algorithms are employed, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Naive Bayes (NB). For prognosis, ML regression algorithms, such as MLP and SVM, are utilised. Although significant work has been reported by previous authors, it remains difficult to optimise the choice of hyper-parameters (important parameters whose values control the learning process) for each specific ML algorithm, for instance the type of SVM kernel function, or the selection of the MLP activation function and the optimum number of hidden layers (and neurons).

It is widely understood that the reliability of ML algorithms depends strongly on the existence of a sufficiently large quantity of high-quality training data. In the present thesis, due to the unavailability of experimental data, a novel high-fidelity in-silico dataset is generated via a Computational Fluid Dynamics (CFD) model and used for training the underlying ML metamodel. In addition, a large number of scenarios are recreated, ranging from healthy to faulty (e.g. clogging, radial gap variations, axial gap variations, viscosity variations, speed variations).
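As a hedged sketch of this frequency-content perturbation for synthetic data generation, the following Python snippet jitters the magnitude and phase spectrum of a time series; the perturbation scale and the toy stand-in for a high-fidelity CFD trace are assumptions, not the thesis's actual pipeline.

```python
# Synthetic-sample generation by perturbing the frequency content of a
# clean time series (illustrative sketch, not the thesis's code).
import numpy as np

def perturb_spectrum(signal, scale=0.05, rng=None):
    """Return a new sample with jittered spectral magnitudes and phases."""
    rng = rng or np.random.default_rng()
    spec = np.fft.rfft(signal)
    mag = np.abs(spec) * (1.0 + scale * rng.standard_normal(spec.shape))  # multiplicative noise
    phase = np.angle(spec) + scale * rng.standard_normal(spec.shape)      # phase jitter
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(signal))

# Stand-in for one high-fidelity pressure/flow trace from the CFD model.
t = np.linspace(0, 1, 2048, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 120 * t)
synthetic = [perturb_spectrum(clean, rng=np.random.default_rng(i)) for i in range(100)]
```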
Furthermore, the high-fidelity dataset is augmented using degradation functions to predict the remaining useful life (fault prognosis) of an external gear pump. The thesis explores and compares the performance of the MLP, SVM and NB algorithms for fault diagnosis, and of MLP and SVM for fault prognosis. To enable fast training and reliable testing of the MLP algorithm, predefined network architectures, such as 2^n neurons per hidden layer, are used to speed up the identification of a suitable number of neurons (shown to be useful when the sample data set is sufficiently large). Finally, a series of benchmark tests is presented, from which it is concluded that, for fault diagnosis, the combination of wavelet features and an MLP algorithm provides the best accuracy, while the MLP algorithm also provides the best prediction results for fault prognosis. In addition, benchmark examples are simulated to demonstrate mesh convergence for the CFD model, and quantification analysis and the influence of noise on training data are examined for the ML algorithms.
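A minimal sketch of the diagnosis benchmark described above, assuming scikit-learn implementations of the three classifiers and generic feature vectors in place of the wavelet features; the 2^n hidden-layer grid mirrors the predefined MLP architectures mentioned, while the synthetic data and hyperparameter grids are illustrative.

```python
# Compare MLP, SVM and Naive Bayes on a toy fault-diagnosis task
# (illustrative sketch; data and hyperparameter grids are assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)  # healthy vs. two fault types
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Predefined architectures with 2^n neurons per hidden layer narrow
# down the network size quickly, as suggested above.
mlp = GridSearchCV(MLPClassifier(max_iter=2000, random_state=0),
                   {"hidden_layer_sizes": [(2 ** n,) for n in range(3, 8)]}, cv=3)

for name, clf in [("MLP", mlp), ("SVM", SVC(kernel="rbf")), ("NB", GaussianNB())]:
    clf.fit(X_tr, y_tr)
    print(name, f"test accuracy = {clf.score(X_te, y_te):.3f}")
```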
Robust classification of advanced power quality disturbances in smart grids
The insertion of new devices, increased data flow, intermittent generation and massive computerization have considerably increased the complexity of current electrical systems. This increase has made changes necessary, such as the need for more intelligent electrical networks that can adapt to this different reality. Artificial Intelligence (AI) plays an important role in society, especially through techniques based on the learning process, and this role extends to power systems. In the context of Smart Grids (SG), where information and innovative monitoring solutions are a primary concern, AI-based techniques find several applications. This dissertation investigates the use of advanced signal processing and ML algorithms to create a robust classifier of advanced Power Quality (PQ) disturbances in SG. For this purpose, known models of PQ disturbances were generated with random elements to approximate real applications, and from these models thousands of signals exhibiting these disturbances were produced. Signal processing techniques using the Discrete Wavelet Transform (DWT) were applied to extract the signals' main characteristics, and ML algorithms were used to classify these data according to their respective features. The ML algorithms were trained, validated and tested, and the accuracy and confusion matrix of each were analyzed, relating the logic behind the results. The stages of data generation, feature extraction and optimization were performed in the MATLAB software. The Classification Learner toolbox was used to train, validate and test 27 different ML algorithms and to assess the performance of each. All stages of the work were planned in advance, enabling their correct development and execution. The results show that the Cubic Support Vector Machine (SVM) classifier achieved the maximum accuracy of all the algorithms, indicating the effectiveness of the proposed method for classification. The results are discussed with considerations explaining the performance of each technique, the relations among the techniques and their respective justifications.

The insertion of new devices into the network, increased data flow, intermittent generation and massive computerization have considerably increased the complexity of today's electrical systems. This increase has made changes necessary, such as the need for smarter electrical networks that can adapt to this different reality. The new generation of Artificial Intelligence techniques, represented by Big Data, Machine Learning, Deep Learning and Pattern Recognition, marks a new era in society and in global development based on information and knowledge. With the latest Smart Grids, the use of techniques employing this type of intelligence will be even more necessary. This dissertation investigates the use of advanced signal processing together with Machine Learning algorithms to develop a robust classifier of power quality disturbances in the context of Smart Grids. To this end, known models of several power quality problems were generated together with random noise so that the system would resemble real applications. From these models, thousands of signals were generated, and the Discrete Wavelet Transform was used to extract the main characteristics of these disturbances. This dissertation aims to use algorithms based on the concept of Machine Learning to classify the generated data according to their classes. All of these algorithms were trained, validated and finally tested. In addition, the accuracy and confusion matrix of each model were presented and analyzed. The stages of data generation, extraction of the main characteristics and data optimization were performed in the MATLAB software. A specific toolbox of this program was used to train, validate and test the 27 different algorithms and evaluate the performance of each. All stages of the work were planned in advance, enabling their correct development and execution. The results show that the Cubic Support Vector Machine classifier achieved the highest accuracy among all the algorithms, indicating the effectiveness of the proposed method for classification. Considerations about the results are interpreted, for example by explaining the performance of each technique, their relations and respective justifications.
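As a rough illustration of the DWT feature extraction described above, and not the dissertation's MATLAB pipeline, the Python sketch below uses PyWavelets to decompose a synthetic voltage-sag signal and form normalized band-energy features; the wavelet family, decomposition depth and sampling rate are assumptions.

```python
# DWT band-energy features for a synthetic PQ disturbance (illustrative).
import numpy as np
import pywt

fs = 3840                                   # 64 samples per cycle at 60 Hz (assumed)
t = np.arange(0, 0.2, 1 / fs)
v = np.sin(2 * np.pi * 60 * t)
v[(t > 0.05) & (t < 0.15)] *= 0.5           # model a voltage sag
v += 0.01 * np.random.default_rng(0).standard_normal(t.shape)  # random noise element

# Multi-level DWT; the energy in each band is a common compact feature
# vector fed to PQ disturbance classifiers.
coeffs = pywt.wavedec(v, "db4", level=5)    # [cA5, cD5, ..., cD1]
features = np.array([np.sum(c ** 2) for c in coeffs])
features /= features.sum()                  # normalized band energies
print(features.round(4))
```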
HYPOTHYROID DISEASE ANALYSIS BY USING MACHINE LEARNING
Thyroid illness frequently manifests as hypothyroidism, and people with hypothyroidism are predominantly female. Because the majority of people are unaware of the illness, it is quickly becoming more serious. It is crucial to catch it early so that medical professionals can treat it more effectively and prevent it from getting worse. Disease prediction is a challenging task, but machine learning aids it greatly, and dedicated feature selection strategies have further eased the process. To properly monitor and treat this illness, accurate detection is essential. In order to build models that can forecast the development of hypothyroidism, this project utilized machine learning approaches such as Logistic Regression, Decision Trees, and Naive Bayes, applied to thyroid function-related measures and characteristics from a UCI Machine Learning Repository dataset. The main goals were to properly assess each machine learning model's performance and fine-tune its hyperparameters. With an accuracy rate of 99.87%, the findings of this study demonstrate the models' ability to predict hypothyroidism remarkably well. This high degree of accuracy shows how useful these machine learning algorithms can be as early diagnostic and therapeutic tools for hypothyroid patients, and it demonstrates the potential of machine learning in healthcare and its impact on diagnosis.
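A minimal sketch of the pipeline described above, assuming scikit-learn implementations, a hypothetical local CSV copy of the UCI thyroid data with a "target" column, and illustrative hyperparameter grids rather than the project's tuned values.

```python
# Hypothyroidism prediction with Logistic Regression, a Decision Tree and
# Naive Bayes (illustrative sketch; file name and grids are assumptions).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("hypothyroid.csv")              # hypothetical preprocessed copy of the UCI data
X, y = df.drop(columns="target"), df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": GridSearchCV(
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1, 10]}, cv=5),
    "decision tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, 10, None]}, cv=5),
    "naive bayes": GaussianNB(),
}
for name, model in models.items():              # tune, then report held-out accuracy
    model.fit(X_tr, y_tr)
    print(name, f"accuracy = {model.score(X_te, y_te):.4f}")
```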
A survey of machine learning techniques applied to self organizing cellular networks
In this paper, a survey of the literature of the past fifteen years on Machine Learning (ML) algorithms applied to self-organizing cellular networks is presented. For future networks to overcome current limitations and address the issues of today's cellular systems, it is clear that more intelligence needs to be deployed so that a fully autonomous and flexible network can be enabled. This paper focuses on the learning perspective of Self Organizing Networks (SON) solutions and provides not only an overview of the most common ML techniques encountered in cellular networks, but also a classification of each paper in terms of its learning solution, together with examples. The authors also classify each paper in terms of its self-organizing use case and discuss how each proposed solution performed. In addition, the most commonly found ML algorithms are compared in terms of certain SON metrics, and general guidelines on when to choose each ML algorithm for each SON function are proposed. Lastly, this work provides future research directions and describes the new paradigms that the use of more robust and intelligent algorithms, together with data gathered by operators, can bring to the cellular networks domain, fully enabling the concept of SON in the near future.
Data Driven Sample Generator Model with Application to Classification
Despite rapidly growing interest, progress in the study of relations between physiological abnormalities and mental disorders is hampered by the complexity of the human brain and the high cost of data collection. The complexity can be captured by machine learning approaches, but these may still require significant amounts of data. In this thesis, we seek to mitigate the latter challenge by developing a data-driven sample generator model that produces synthetic, realistic training data. Our method greatly improves generalization in the classification of schizophrenia patients and healthy controls from their structural magnetic resonance images. A feed-forward neural network trained exclusively on continuously generated synthetic data produces the best area under the curve compared to classifiers trained on real data alone.
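The "train on continuously generated synthetic data" idea can be sketched as follows, with a class-conditional Gaussian sampler standing in for the learned generator and toy feature vectors standing in for structural MRI data; none of this is the author's actual model.

```python
# Feed-forward classifier trained on a stream of fresh synthetic batches
# (illustrative sketch; generator and data are stand-ins).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def synthetic_batch(n=128):
    """Stand-in generator: class-conditional Gaussians in feature space."""
    y = rng.integers(0, 2, n)                      # 0 = control, 1 = patient
    X = rng.standard_normal((n, 50)) + 0.8 * y[:, None]
    return X, y

X_real, y_real = synthetic_batch(500)              # toy stand-in for held-out real data

clf = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)
for step in range(200):                            # each step sees a brand-new batch
    X, y = synthetic_batch()
    clf.partial_fit(X, y, classes=[0, 1])
print(f"held-out accuracy: {clf.score(X_real, y_real):.3f}")
```

Because every batch is freshly sampled, the classifier never sees the same example twice, which plausibly underlies the improved generalization reported above.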