DOME: recommendations for supervised machine learning validation in biology

Author: Broin P. O.
Capella-Gutierrez S.
Capriotti E.
Casadio R.
Cirillo D.
Del Angel V. D.
Del Conte A.
Dimopoulos A. C.
Dopazo J.
Fariselli P.
Fernandez J. M.
Fishman D.
Garcia-Gasulla D.
Harrow J.
Huber F.
Kreshuk A.
Lenaerts T.
Martelli P. L.
Navarro A.
Pinero J.
Piovesan D.
Pollastri G.
Psomopoulos F. E.
Reczko M.
Ronzano F.
Salgado D.
Satagopam V.
Savojardo C.
Spiwok V.
Tangaro M. A.
Tartari G.
Titma T.
Tosatto S. C. E.
Valencia A.
Walsh I.
Zambelli F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Institutional Research Information System University of Turin

Deep Generative Adversarial Networks: Applications in Musculoskeletal Imaging

Author: 양재문
이영한
Publication venue: 'Radiological Society of North America (RSNA)'
Publication date: 01/03/2021

In recent years, deep learning techniques have been applied in musculoskeletal radiology to increase the diagnostic potential of acquired images. Generative adversarial networks (GANs), which are deep neural networks that can generate or transform images, have the potential to enable faster imaging by generating highly realistic images across multiple contrasts and modalities from existing imaging protocols. This review introduces the key architectures of GANs as well as their technical background and challenges. Key research trends are highlighted, including: (a) reconstruction of high-resolution MRI; (b) image synthesis with different modalities and contrasts; (c) image enhancement that efficiently preserves high-frequency information suitable for human interpretation; (d) pixel-level segmentation with annotation sharing between domains; and (e) applications to different musculoskeletal anatomies. In addition, an overview is provided of key issues in cases where clinical applicability is difficult to capture with conventional performance metrics and expert evaluation. When clinically validated, GANs have the potential to improve musculoskeletal imaging.

Yonsei University Medical Library Open Access Repository
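As generic background for the GAN review above, the original GAN formulation trains a generator G, which maps a noise vector z drawn from a prior p_z to a synthetic image, against a discriminator D, which outputs the probability that its input is a real image drawn from the data distribution p_data. This is the standard textbook minimax objective, shown only as an illustration; the architectures covered in the review may use different losses.

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to maximize V (separating real from generated images), while G is trained to minimize it, which pushes D(G(z)) toward 1.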

Doctor of Philosophy

Author: Saha Avishek
Publication venue: University of Utah
Publication date: 01/12/2012

Machine learning is the science of building predictive models from data that automatically improve based on past experience. To learn these models, traditional learning algorithms require labeled data. They also require that the entire dataset fit in the memory of a single machine. Labeled data are available or can be acquired for small and moderately sized datasets, but curating large datasets can be prohibitively expensive. Similarly, massive datasets are usually too large to fit into the memory of a single machine. An alternative is to distribute the dataset over multiple machines. Distributed learning, however, poses new challenges, as most existing machine learning techniques are inherently sequential. Additionally, these distributed approaches have to be designed with the resource limitations of real-world settings in mind, chief among them intermachine communication. With the advent of big datasets, machine learning algorithms face new challenges. Their design is no longer limited to minimizing some loss function but must also consider other resources that are critical when learning at scale. In this thesis, we explore different models and measures for learning with limited resources that have a budget. What budgetary constraints are posed by modern datasets? Can we reuse or combine existing machine learning paradigms to address these challenges at scale? How do the cost metrics change when we shift to distributed models for learning? These are some of the questions investigated in this thesis. The answers to these questions hold the key to addressing some of the challenges faced when learning on massive datasets. In the first part of this thesis, we present three different budgeted scenarios that deal with scarcity of labeled data and limited computational resources. The goal is to leverage information transferred from related domains to learn under budgetary constraints.
Our proposed techniques comprise semisupervised transfer, online transfer, and active transfer. In the second part of this thesis, we study distributed learning with limited communication. We present initial sampling-based results and propose communication protocols for learning distributed linear classifiers.

The University of Utah: J. Willard Marriott Digital Library

Comprehensive Study of Automatic Speech Emotion Recognition Systems

Author: Jagtap Sonal
Kawade Rupali
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023

Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from speech signals. SER is challenging because of considerable variation across languages and in arousal and valence levels. Technical developments in artificial intelligence and signal processing have made it possible to interpret emotions from speech. SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes the databases and evaluation metrics used for speech emotion recognition.

International Journal on Recent and Innovation Trends in Computing and Communication

A Comparative Study of Sentiment Analysis Methods for Detecting Fake Reviews in E-Commerce

Author: Boongasame Laor
Puttarattanamanee Maneerat
Thammarak Karanrat
Publication venue: Ital Publication
Publication date: 01/06/2023

The popularity of e-commerce has increased, especially during the COVID-19 pandemic. Past consumer product reviews significantly influence consumers' purchasing decisions. Fake reviews, written by humans or computers engaging in dishonest behavior, are consequently generated to increase product sales. Fake reviews harm consumers and are dishonest. The goal of this research is to examine and evaluate the performance of various methods for identifying fake reviews. The well-known and widely used Amazon Review Data (2018) dataset was used for this research. The first 10 product categories on Amazon.com with favorable feedback are presented in the data section. Fundamental data preparation procedures, such as special-character trimming, bag of words, and TF-IDF, are then performed. The models are trained to create a dataset for detecting fake reviews. This research compares the performance of four models: GPT-2, NBSVM, BiLSTM, and RoBERTa. The hyperparameters of the models are also tuned to find the optimal values. The research concludes that the RoBERTa model performs best overall, with an accuracy of 97%, compared with 95% for NBSVM, 92% for BiLSTM, and 82% for GPT-2. The research also calculates the Area Under the Curve (AUC) for each model and finds that RoBERTa has an AUC of 0.9976, NBSVM 0.9888, BiLSTM 0.9753, and GPT-2 0.9226. RoBERTa thus has the highest AUC value, close to 1. Therefore, it can be concluded that this model provides the most accurate prediction for detecting fake reviews, which is the main focus of this research. DOI: 10.28991/HIJ-2023-04-02-08

HighTech and Innovation Journal
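The fake-review study above reports both accuracy and AUC, and the two can rank models differently. The sketch below, assuming the reported AUC is the area under the ROC curve, computes it with the rank-based (Mann-Whitney) definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels and scores are invented for illustration (1 = fake review is a hypothetical labelling), not data from the paper, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: fraction of (positive, negative) pairs in which the
    positive gets the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg): fine for a small illustration
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Accuracy after thresholding scores into hard 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [1, 1, 0, 0]           # hypothetical: 1 = fake, 0 = genuine
scores = [0.9, 0.4, 0.6, 0.1]   # hypothetical model scores
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.5
```

AUC depends only on how the scores are ordered, while accuracy also depends on the chosen threshold, which is why a model can have a high AUC but a noticeably lower accuracy.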

Multi-Instance Multilabel Learning with Weak-Label for Predicting Protein Function in Electricigens

Author: Hai-Feng Hu
Jian-Sheng Wu
Li-Hua Tang
Shan-Cheng Yan
Publication date: 10/04/2020

Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that protein function prediction is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In practice, however, only a subset of a protein's functions is known, and whether the protein has other functions is unknown. Protein function prediction tasks therefore suffer from the weak-label problem, so protein function prediction with incomplete annotation matches well with the MIML with weak-label learning framework. In this paper, we apply the state-of-the-art MIML with weak-label learning algorithm MIMLwel to predict protein functions in two typical real-world electricigens, which have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.

CiteSeerX

Language-Aware Soft Prompting: Text-to-Text Optimization for Few- and Zero-Shot Adaptation of V&L Models

Author: Bulat A
Tzimiropoulos G
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 26/10/2023

Soft prompt learning has emerged as a promising direction for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering large accuracy degradation when tested on unseen classes from the same domain. In addition, all prior methods operate exclusively under the assumption that both vision and language data are present. To this end, we make the following 5 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts being correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we also propose grouped LASP, where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) Moreover, we identify a visual-language misalignment introduced by prompt learning and LASP and, more importantly, propose a re-calibration mechanism to address it. (4) Importantly, we show that LASP is inherently amenable to including, during training, virtual classes, i.e., class names for which no visual samples are available, further increasing the robustness of the learned prompts. Expanding the setting to language-only adaptation for the first time, (5) we present a novel zero-shot variant of LASP where no visual samples at all are available for the downstream task. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Finally, (c) we show that our zero-shot variant improves upon CLIP without requiring any extra data. Code will be made available.

Queen Mary Research Online
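The text-to-text loss in contribution (1) of the LASP abstract can be read as ordinary cross-entropy in which the text features of the hand-crafted prompts act as class weights and the text feature of each learned prompt is the input to be classified. The sketch below assumes that reading, takes encoder outputs as plain lists of floats (the text encoder itself is omitted), and uses a hypothetical temperature of 0.01; it is an illustration of the idea, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def text_to_text_ce(learned, handcrafted, temperature=0.01):
    """Mean cross-entropy of classifying each learned-prompt feature
    (learned[c]) against all hand-crafted prompt features, where the correct
    class for learned[c] is handcrafted[c]. temperature is a hypothetical value."""
    assert len(learned) == len(handcrafted)
    total = 0.0
    for c, t in enumerate(learned):
        logits = [cosine(t, h) / temperature for h in handcrafted]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[c]  # negative log-probability of the target class
    return total / len(learned)


hand = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical hand-crafted prompt features
print(text_to_text_ce(hand, hand))                      # aligned prompts: near 0
print(text_to_text_ce([[0.0, 1.0], [1.0, 0.0]], hand))  # swapped prompts: about 100
```

Minimizing this quantity pulls each learned prompt toward the language representation of its own class name, which is the regularizing effect against base-class overfitting that the abstract attributes to the loss.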

Performance of Protein Disorder Prediction Programs on Amino Acid Substitutions

Author: Ali Heidi
Gurarslan Omer
Urolagin Siddhaling
Vihinen Mauno
Publication venue: 'Wiley'
Publication date: 01/01/2014

Many proteins contain intrinsically disordered regions, which may be crucial for function but may also be related to the pathogenicity of variants. Prediction programs have been developed to detect disordered regions from sequences and have been used to predict the consequences of variants, although their performance for this task has not been assessed. We tested the performance of protein disorder prediction programs in detecting changes to disorder caused by amino acid substitutions. We assessed the performance of 29 protein disorder predictors and predictor versions on 101 amino acid substitutions whose effects have been experimentally validated. For variants, disorder predictors detected true positives with a success rate of at most 6% and true negatives with a rate of 34%. The corresponding rates for the wild-type forms are 7% and 90%, respectively. The analysis revealed that disorder programs cannot reliably predict the effects of substitutions; consequently, the tested methods, and possibly similar programs, cannot be recommended for variant analysis without other information indicating the relevance of disorder. These results inspired us to develop a new method, PON-Diso (http://structure.bmc.lu.se/PON-Diso), for disorder-related amino acid substitutions. With a 50% success rate on an independent test set and 70.5% in cross-validation, it outperforms the evaluated methods.

Lund University Publications
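The disorder-prediction study above reports separate success rates for true positives and true negatives. Assuming these correspond to the standard true positive rate (sensitivity) and true negative rate (specificity), which the abstract does not define explicitly, they can be computed from binary labels ("the substitution changes disorder" or not) as sketched below. The example data is invented for illustration.

```python
def success_rates(y_true, y_pred):
    """Return (true positive rate, true negative rate) for boolean labels.
    A rate is NaN when its class has no examples."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tpr = tp / (tp + fn) if tp + fn else float("nan")  # sensitivity
    tnr = tn / (tn + fp) if tn + fp else float("nan")  # specificity
    return tpr, tnr


# Hypothetical experimentally validated effects vs. predictor calls.
y_true = [True, True, False, False, False]
y_pred = [False, True, False, False, True]
tpr, tnr = success_rates(y_true, y_pred)
print(tpr)  # 0.5
print(tnr)  # about 0.667
```

Reporting the two rates separately, as the abstract does, exposes predictors that score well on one class (here, negatives for wild-type forms) while almost never detecting the other.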