Learning from high-dimensional and class-imbalanced datasets using random forests
Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has so far been conducted on which approaches might be best suited to datasets that are both class-imbalanced and high-dimensional (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, which reduce the data dimensionality, with methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven to be effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach compared to using feature selection or imbalance learning methods alone.
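The hybrid idea described in this abstract (reduce dimensionality first, then compensate for class imbalance) can be sketched in a few lines. This is only an illustration under stated assumptions: the separation score, the inverse-frequency weighting, the helper names `top_k_features` and `balanced_class_weights`, and the toy data are all hypothetical and are not taken from the paper, which does not specify its methods at this level of detail. The resulting weights would then be passed to a cost-sensitive classifier such as a Random Forest.

```python
from collections import Counter
from statistics import mean, pvariance


def top_k_features(X, y, k):
    """Rank features by a simple two-class separation score and keep the k best.

    The score (squared difference of class means over summed class variances)
    is an illustrative stand-in, not the selection method used in the paper.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary problems only")
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        spread = pvariance(a) + pvariance(b)
        scores.append((mean(a) - mean(b)) ** 2 / (spread + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def balanced_class_weights(y):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(y)
    n = len(y)
    return {c: n / (len(counts) * cnt) for c, cnt in counts.items()}


# Toy imbalanced data (4 majority vs 2 minority instances); only feature 0
# separates the two classes.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 5.1, 0.9],
    [0.1, 4.9, 1.1],
    [0.2, 5.0, 1.0],
    [0.9, 5.0, 1.0],
    [1.0, 5.1, 1.1],
]
y = [0, 0, 0, 0, 1, 1]

selected = top_k_features(X, y, k=1)   # indices of the retained features
weights = balanced_class_weights(y)    # misclassification cost per class
```

In this toy case the minority class receives twice the weight of the majority class, so errors on minority instances cost more during training.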
Using Artificial Intelligence for COVID-19 Detection in Blood Exams: A Comparative Analysis
COVID-19 is an infectious disease that was declared a pandemic by the World Health Organization (WHO) in early March 2020. Since its early development, it has challenged health systems around the world. Although more than 12 billion vaccine doses have been administered, at the time of writing more than 623 million confirmed cases and more than 6 million deaths have been reported to the WHO. These numbers continue to grow, calling for further research efforts to reduce the impacts of the pandemic. In particular, artificial intelligence techniques have shown great potential in supporting the early diagnosis, detection, and monitoring of COVID-19 infections from disparate data sources. In this work, we aim to contribute to this field by analyzing a high-dimensional dataset containing blood sample data from over forty thousand individuals recognized as infected or not with COVID-19. Encompassing a wide range of methods, including traditional machine learning algorithms, dimensionality reduction techniques, and deep learning strategies, our analysis investigates the performance of different classification models, showing that accurate detection of blood infections can be obtained. In particular, an F-score of 84% was achieved by the artificial neural network model we designed for this task, with a rate of 87% correct predictions on the positive class. Furthermore, our study shows that the dimensionality of the original data, i.e. the number of features involved, can be significantly reduced to gain efficiency without compromising the final prediction performance. These results pave the way for further research in this field, confirming that artificial intelligence techniques may play an important role in supporting medical decision-making.
Exploiting Feature Selection in Human Activity Recognition: Methodological Insights and Empirical Results Using Mobile Sensor Data
Human Activity Recognition (HAR) using mobile sensor data has gained increasing attention over the last few years, with a fast-growing number of reported applications. The central role of machine learning in this field has been discussed in a vast body of research, with several strategies proposed for processing raw data, extracting suitable features, and inducing predictive models capable of recognizing multiple types of daily activities. Since many HAR systems are implemented in resource-constrained mobile devices, the efficiency of the induced models is a crucial aspect to consider. This paper highlights the importance of exploiting dimensionality reduction techniques that can simplify the model and increase efficiency by identifying and retaining only the most informative and predictive features for activity recognition. More specifically, a large experimental study is presented that encompasses different feature selection algorithms as well as multiple HAR benchmarks containing mobile sensor data. Such a comparative evaluation relies on a methodological framework that is meant to assess not only the extent to which each selection method is effective in identifying the most predictive features but also the overall stability of the selection process, i.e., its robustness to changes in the input data. Although often neglected, the stability of the selected feature sets is important for a wider exploitability of the induced models. Our experimental results give an interesting insight into which selection algorithms may be most suited in the HAR domain, complementing and significantly extending the studies currently available in this field.
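The notion of selection stability mentioned in this abstract (robustness of the selected feature set to changes in the input data) is often quantified as the average similarity between the subsets selected on different perturbed versions of the training data. The sketch below assumes the Jaccard index as the similarity measure; the abstract does not say which measure the paper uses, and the feature names and the helper `selection_stability` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected on three resampled versions of a HAR dataset.
runs = [
    {"acc_x_mean", "acc_y_std", "gyro_z_mean"},
    {"acc_x_mean", "acc_y_std", "gyro_x_std"},
    {"acc_x_mean", "acc_y_std", "gyro_z_mean"},
]
stability = selection_stability(runs)
```

A value of 1.0 means the same features are selected on every run; values near 0 mean the selection changes almost completely when the data are perturbed.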
An Evaluation of Feature Selection Robustness on Class Noisy Data
With the increasing growth of data dimensionality, feature selection has become a crucial step in a variety of machine learning and data mining applications. In fact, it allows identifying the most important attributes of the task at hand, improving the efficiency, interpretability, and final performance of the induced models. In recent literature, several studies have examined the strengths and weaknesses of the available feature selection methods from different points of view. Still, little work has been performed to investigate how sensitive they are to the presence of noisy instances in the input data. This is the specific gap that our work aims to address. Indeed, since noise is arguably inevitable in several application scenarios, it is important to understand the extent to which the different selection heuristics can be affected by noise, in particular class noise (which is more harmful in supervised learning tasks). Such an evaluation may be especially important in the context of class-imbalanced problems, where any perturbation in the set of training records can strongly affect the final selection outcome. In this regard, we provide here a two-fold contribution by presenting (i) a general methodology to evaluate feature selection robustness on class noisy data and (ii) an experimental study that involves different selection methods, both univariate and multivariate. The experiments have been conducted on eight high-dimensional datasets chosen to be representative of different real-world domains, with interesting insights into the intrinsic degree of robustness of the considered selection approaches.
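One plausible way to simulate class noise for such an evaluation is to flip the labels of a fixed fraction of training instances and then compare the features selected on the clean and on the noisy data. The sketch below covers only the noise-injection step, assuming uniform random label flipping; the abstract does not describe the paper's actual noise model, and the function name `inject_class_noise` and its parameters are hypothetical.

```python
import random


def inject_class_noise(labels, noise_rate, seed=0):
    """Return a copy of `labels` in which a fraction `noise_rate` is flipped.

    Each chosen instance is reassigned to a different class picked at random.
    Uniform flipping is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes to flip labels")
    n_flip = round(noise_rate * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Toy imbalanced label vector: 8 majority and 2 minority instances.
clean = [0] * 8 + [1] * 2
noisy = inject_class_noise(clean, noise_rate=0.2)
changed = sum(a != b for a, b in zip(clean, noisy))
```

With only two minority instances, flipping even two labels can noticeably alter the class distribution, which illustrates why the abstract singles out class-imbalanced problems as especially sensitive.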
An Anomaly Detection Approach to Determine Optimal Cutting Time in Cheese Formation
The production of cheese, a beloved culinary delight worldwide, faces challenges in maintaining consistent product quality and operational efficiency. One crucial stage in this process is determining the precise cutting time during curd formation, which significantly impacts the quality of the cheese. Misjudging this timing can lead to the production of inferior products, harming a company’s reputation and revenue. Conventional methods often fall short of accurately assessing variations in coagulation conditions due to the inherent potential for human error. To address this issue, we propose an anomaly-detection-based approach. In this approach, we treat the class representing curd formation as the anomaly to be identified. Our proposed solution involves utilizing a one-class, fully convolutional data description network, which we compared against several state-of-the-art methods to detect deviations from the standard coagulation patterns. Encouragingly, our results show F1 scores of up to 0.92, indicating the effectiveness of our approach.
The Portrayal of Complementary and Alternative Medicine in Mass Print Magazines Since 1980
Objectives: The objectives of this study were to examine and describe the portrayal of complementary and alternative medicine (CAM) in mass print media magazines.
Design: The sample included all 37 articles found in magazines with circulation rates of greater than 1 million published in the United States and Canada from 1980 to 2005. The analysis was quantitative and qualitative and included investigation of both manifest and latent magazine story messages.
Results: Manifest analysis noted that CAM was largely represented as a treatment for a patient with a medically diagnosed illness or specific symptoms. Discussions used biomedical terms such as patient rather than consumer and disease rather than wellness. Latent analysis revealed three themes: (1) CAMs were described as good but not good enough; (2) individualism and consumerism were venerated; and (3) questions of costs were raised in the context of confusion and ambivalence.
Paediatric radiology seen from Africa. Part I: providing diagnostic imaging to a young population
Paediatric radiology requires dedicated equipment, specific precautions related to ionising radiation, and specialist knowledge. Developing countries face difficulties in providing adequate imaging services for children. In many African countries, children represent an increasing proportion of the population, and additional challenges follow from extreme living conditions, poverty, lack of parental care, and exposure to tuberculosis, HIV, pneumonia, diarrhoea and violent trauma. Imaging plays a critical role in the treatment of these children, but is expensive and difficult to provide. The World Health Organisation initiatives, of which the World Health Imaging System for Radiography (WHIS-RAD) unit is one result, need to expand into other areas such as the provision of maintenance servicing. New initiatives by groups such as Rotary and the World Health Imaging Alliance to install WHIS-RAD units in developing countries and provide digital solutions need support. Paediatric radiologists are needed to offer their services for reporting, consultation and quality assurance for free by way of teleradiology. Societies for paediatric radiology are needed to focus on providing a volunteer teleradiology reporting group, information on child safety for basic imaging, guidelines for investigations specific to the disease spectrum, and solutions for optimising imaging in children.
Biomarker dynamics affecting neoadjuvant therapy response and outcome of HER2-positive breast cancer subtype
HER2+ breast cancer (BC) is an aggressive subtype that is genetically and biologically heterogeneous. We evaluate the predictive and prognostic role of HER2 protein/gene expression levels combined with clinico-pathologic features in 154 HER2+ BC patients who received trastuzumab-based neoadjuvant chemotherapy (NACT). The tumoral pathological complete response (pCR) rate was 40.9%. Patients with high tumoral pCR show a low mortality rate compared with subjects with a lower response. 93.7% of ypT0 were HER2 IHC3+ BC and 6.3% were HER2 IHC2+/SISH+; 86.7% of ypN0 were HER2 IHC3+, and the remaining were HER2 IHC2+/SISH+. A better pCR rate correlates with a high percentage of infiltrating immune cells and with right-sided tumors, which reduce distant metastasis and improve survival, but with no difference in incidence. HER2 IHC score and laterality emerge as strong predictors of tumoral pCR after NACT from machine learning analysis. HER2 IHC3+ and G3 are poor prognostic factors for HER2+ BC patients, and could be considered in the application of neoadjuvant therapy. Increasing TILs concentrations, lower lymph node ratio and lower residual tumor cellularity are associated with a better outcome. The immune microenvironment and scarce lymph node involvement have a crucial role in clinical outcomes. The combination of all predictors might offer new options for NACT effectiveness prediction and stratification of HER2+ BC during clinical decision-making.
Odorant binding proteins: a biotechnological tool for odour control
The application of an odorant binding protein for odour control and fragrance delayed release from a textile surface was first explored in this work. Pig OBP-1 gene was cloned and expressed in Escherichia coli, and the purified protein was biochemically characterized. The IC50 values (concentrations of competitor that caused a decay of fluorescence to half-maximal intensity) were determined for four distinct fragrances, namely, citronellol, benzyl benzoate, citronellyl valerate and ethyl valerate. The results showed a strong binding of citronellyl valerate, citronellol and benzyl benzoate to the recombinant protein, while ethyl valerate displayed weaker binding. Cationized cotton substrates were coated with porcine odorant binding protein and tested for their capacity to retain citronellol and to mask the smell of cigarette smoke. The immobilized protein delayed the release of citronellol when compared to the untreated cotton. According to a blind evaluation of 30 assessors, the smell of cigarette smoke, trapped onto the fabrics’ surface, was successfully attenuated by porcine odorant binding protein (more than 60% identified the weakest smell intensity after protein exposure compared to β-cyclodextrin-treated and untreated cotton fabrics). This work demonstrated that porcine odorant binding protein can be an efficient solution to prevent and/or remove unpleasant odours trapped on the large surface of textiles. Its intrinsic properties make odorant binding proteins excellent candidates for controlled release systems, which constitute a new application for this class of proteins. This work was co-funded by the European Social Fund through the management authority POPH and FCT. The authors Carla Silva and Teresa Matama would like to acknowledge their post-doctoral fellowships: SFRH/BPD/46515/2008 and SFRH/BPD/47555/2008, respectively.