68 research outputs found
Multi-modal Hate Speech Detection using Machine Learning
With the continuous growth of internet users and media content, it is very
difficult to track down hateful speech in audio and video. Converting video or
audio into text alone does not detect hate speech accurately, as humans
sometimes use hateful words in a humorous or pleasant sense, or convey intent
through voice tone and on-screen actions. State-of-the-art hate speech
detection models have mostly been developed on a single modality. In this
research, a combined multimodal approach is proposed to detect hate speech in
video content by extracting feature images from the video and feature values
from the audio and text, and applying machine learning and natural language
processing.
Comment: 5 pages, 2 figures, conference
Secure Electronic Payment: Proposed method for the growth of E-commerce in Bangladesh
Innovations in technology are changing social, cultural, and economic relationships in a vast variety of ways. Information technology has become a necessary tool for today's organizations, and the banking industry is no exception. Electronic commerce is a rapidly growing modern business process all over the world, but in Bangladesh it is growing at a comparatively slower rate. Though Bangladesh was a comparatively late adopter of e-banking, almost all banks now provide internet banking to their customers. What has been missing, however, is a comprehensive study of e-payment security covering both the organizational and the customers' points of view. Considering the importance of secure e-payment to the growth of e-commerce in Bangladesh, this study was taken up. The survey revealed that the main concern of both customers and service providers was security. E-banking service providers have to ensure that online banking is safe and secure for every user in all kinds of transactions. The authors also propose a secure e-payment model, in general and for debit and credit cards, to build customer confidence in e-payment and increase the volume of e-commerce in the context of Bangladesh.
 
SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification
Chest X-rays are widely used to diagnose thoracic diseases, but the lack of
detailed information about these abnormalities makes it challenging to develop
accurate automated diagnosis systems, which is crucial for early detection and
effective treatment. To address this challenge, we employed deep learning
techniques to identify patterns in chest X-rays that correspond to different
diseases. We conducted experiments on the "ChestX-ray14" dataset using various
pre-trained CNNs, transformers, hybrid (CNN+Transformer) models, and classical
models. The best individual model was CoAtNet, which achieved an area under
the receiver operating characteristic curve (AUROC) of 84.2%. By combining the
predictions of all trained models using a weighted average ensemble where the
weight of each model was determined using differential evolution, we further
improved the AUROC to 85.4%, outperforming other state-of-the-art methods in
this field. Our findings demonstrate the potential of deep learning techniques,
particularly ensemble deep learning, for improving the accuracy of automatic
diagnosis of thoracic diseases from chest X-rays.
Comment: Accepted in International Conference on Computer and Information Technology (ICCIT) 202
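The core of the ensembling step above is a weighted average of per-model predictions, with the weights found by differential evolution to maximize AUROC. The sketch below is a toy reconstruction of that idea under stated assumptions: the model predictions and labels are synthetic, the DE loop is a minimal textbook variant, and none of the hyperparameters come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ensemble_auroc(w, preds, labels):
    w = np.abs(w)
    return auroc(preds @ (w / w.sum()), labels)

def evolve_weights(preds, labels, pop=20, gens=40, F=0.8, CR=0.9):
    """Toy differential evolution over non-negative ensemble weights."""
    n_models = preds.shape[1]
    X = rng.random((pop, n_models)) + 1e-6
    fit = np.array([ensemble_auroc(x, preds, labels) for x in X])
    for _ in range(gens):
        for i in range(pop):
            idx = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            a, b, c = X[idx]
            # Mutation + binomial crossover, then greedy selection.
            trial = np.where(rng.random(n_models) < CR, a + F * (b - c), X[i])
            if np.abs(trial).sum() == 0:
                continue
            f = ensemble_auroc(trial, preds, labels)
            if f > fit[i]:
                X[i], fit[i] = trial, f
    best = np.abs(X[np.argmax(fit)])
    return best / best.sum(), float(fit.max())

# Synthetic scores from 3 models on 200 samples: one decent model,
# one noisier model, and one pure-noise model.
labels = rng.integers(0, 2, 200)
signal = labels + rng.normal(0, 1.0, 200)
preds = np.column_stack([
    signal + rng.normal(0, 0.5, 200),
    signal + rng.normal(0, 1.5, 200),
    rng.normal(0, 1.0, 200),
])
weights, score = evolve_weights(preds, labels)
```

Because selection only accepts improving trials, the ensemble AUROC is monotone over generations; in practice DE tends to push the weight of the pure-noise model toward zero.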
Affective social anthropomorphic intelligent system
Human conversational style is characterized by sense of humor, personality,
and tone of voice. These characteristics have become essential for
conversational intelligent virtual assistants. However, most state-of-the-art
intelligent virtual assistants (IVAs) fail to interpret the affective
semantics of human voices. This research proposes an
anthropomorphic intelligent system that can hold a proper human-like
conversation with emotion and personality. A voice style transfer method is
also proposed to map the attributes of a specific emotion. Initially, the
frequency domain data (Mel-Spectrogram) is created by converting the temporal
audio wave data, which comprises discrete patterns for audio features such as
notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used
to predict seven different affective states from voice. The voice is also fed
in parallel to DeepSpeech, an RNN model that generates the text transcription
from the spectrogram. The transcribed text is then passed to a multi-domain
conversation agent that uses blended skill talk, a transformer-based
retrieve-and-generate strategy, and beam-search decoding to produce an
appropriate textual response. The system learns
an invertible mapping of data to a latent space that can be manipulated and
generates a Mel-spectrogram frame from previous Mel-spectrogram frames for
voice synthesis and style transfer. Finally, the waveform is generated from
the spectrogram using WaveGlow. The outcomes of the studies we conducted on
individual models were promising. Furthermore, users who interacted with the
system provided positive feedback, demonstrating the system's effectiveness.
Comment: Multimedia Tools and Applications (2023)
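One concrete piece of the response-generation pipeline above is beam-search decoding. The sketch below shows the mechanics on a deliberately tiny, made-up character-level "model": the vocabulary, probability table, and beam width are illustrative assumptions, standing in for the transformer generator the paper uses.

```python
import math

def next_token_probs(prefix):
    # Toy next-token distribution table; a stand-in for the real
    # generator. Tokens and probabilities here are invented.
    table = {
        "": {"h": 0.6, "y": 0.4},
        "h": {"i": 0.7, "e": 0.3},
        "y": {"o": 1.0},
        "hi": {"<eos>": 1.0},
        "he": {"y": 1.0},
        "yo": {"<eos>": 1.0},
        "hey": {"<eos>": 1.0},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=5):
    # Each hypothesis is (log_prob, text, finished).
    beams = [(0.0, "", False)]
    for _ in range(max_len):
        candidates = []
        for lp, text, done in beams:
            if done:
                candidates.append((lp, text, True))
                continue
            for tok, p in next_token_probs(text).items():
                if tok == "<eos>":
                    candidates.append((lp + math.log(p), text, True))
                else:
                    candidates.append((lp + math.log(p), text + tok, False))
        # Keep only the top-k hypotheses by accumulated log probability.
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
        if all(done for _, _, done in beams):
            break
    return [(text, math.exp(lp)) for lp, text, _ in beams]

hypotheses = beam_search()  # [("hi", 0.42), ("yo", 0.4)]
```

Note how "yo" survives even though its first token was less likely than "h": the beam keeps several partial hypotheses alive instead of committing greedily at each step.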
CNN-XGBoost fusion-based affective state recognition using EEG spectrogram image analysis
Recognizing the emotional state of a human using brain signals is an active research domain with several open challenges. In this research, we propose a signal-spectrogram-image-based CNN-XGBoost fusion method for recognizing three dimensions of emotion, namely arousal (calm or excited), valence (positive or negative feeling), and dominance (without control or empowered). We used a benchmark dataset called DREAMER, where EEG signals were collected from multiple stimuli along with self-evaluation ratings. In our proposed method, we first calculate the Short-Time Fourier Transform (STFT) of the EEG signals and convert them into RGB images to obtain the spectrograms. We then train a two-dimensional Convolutional Neural Network (CNN) on the spectrogram images and retrieve features from a dense layer of the trained network. We apply an Extreme Gradient Boosting (XGBoost) classifier to the extracted CNN features to classify the signals into the arousal, valence, and dominance dimensions of human emotion. We compare our results with feature-fusion-based state-of-the-art approaches to emotion recognition. To do this, we applied various feature extraction techniques to the signals, including the Fast Fourier Transform, the Discrete Cosine Transform, Poincaré features, Power Spectral Density, Hjorth parameters, and some statistical features. Additionally, we used Chi-square and Recursive Feature Elimination techniques to select discriminative features. We formed feature vectors by feature-level fusion and applied Support Vector Machine (SVM) and XGBoost classifiers to the fused features to classify different emotion levels. The performance study shows that the proposed spectrogram-image-based CNN-XGBoost fusion method outperforms the feature-fusion-based SVM and XGBoost methods.
The proposed method obtained accuracies of 99.712% for arousal, 99.770% for valence, and 99.770% for dominance in human emotion detection.
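The first step of the pipeline, turning a 1-D signal into an STFT spectrogram image, can be sketched in plain NumPy. This is a minimal illustration, not the paper's exact preprocessing: frame length, hop, and the synthetic test signal (a pure sine standing in for an EEG channel) are all assumptions.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier
    transform, returned as (freq_bins, time_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def to_image(spec):
    """Map the log-magnitude spectrogram to 8-bit grayscale; stacking
    three copies of this would give an RGB-style input for a 2-D CNN."""
    db = 20 * np.log10(spec + 1e-10)
    norm = (db - db.min()) / (db.max() - db.min())
    return (norm * 255).astype(np.uint8)

# A 1-second, 128 Hz sine sampled at 1024 Hz stands in for one channel.
sr = 1024
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 128 * t)
spec = stft_spectrogram(sig)  # shape (129, 7)
img = to_image(spec)
```

The 128 Hz tone lands exactly on FFT bin 32 (128 * 256 / 1024), so every time frame of the spectrogram peaks at the same frequency row, which is the kind of spatial regularity the downstream CNN exploits.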
Vision transformer and explainable transfer learning models for auto detection of kidney cyst, stone and tumor from CT-radiography
Renal failure, a public health concern, and the scarcity of nephrologists around the globe have necessitated the development of an AI-based system to auto-diagnose kidney diseases. This research deals with three major renal disease categories, kidney stones, cysts, and tumors, and gathered and annotated a total of 12,446 whole-abdomen CT and urogram images in order to construct an AI-based kidney disease diagnostic system and contribute to the AI community's research scope, e.g., modeling a digital twin of renal function. The collected images were subjected to exploratory data analysis, which revealed that the images from all of the classes had the same type of mean color distribution. Furthermore, six machine learning models were built: three based on state-of-the-art Vision Transformer variants, EANet, CCT, and the Swin Transformer, and three based on the well-known deep learning models ResNet, VGG16, and Inception v3, which were adjusted in their last layers. While the VGG16 and CCT models performed admirably, the Swin Transformer outperformed all of them in terms of accuracy, achieving 99.30 percent. The F1-score, precision, and recall comparison reveals that the Swin Transformer outperforms all other models and is the quickest to train. The study also opened the black box of the VGG16, ResNet50, and Inception models, demonstrating that VGG16 is superior to ResNet50 and Inception v3 at attending to the relevant anatomical abnormalities. We believe that the superior accuracy of our Swin Transformer-based model and the VGG16-based model can both be useful in diagnosing kidney tumors, cysts, and stones.
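When a pretrained model is "adjusted in the last layers" with the backbone frozen, training conceptually reduces to fitting a new linear-plus-softmax head on fixed features. The NumPy sketch below illustrates that reduction on synthetic data; the feature dimensions, class count, and learning rate are assumptions, and the random features stand in for frozen Swin/VGG16 backbone outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical frozen-backbone features for 300 CT images across 4
# classes (e.g., normal / cyst / stone / tumor). Class-dependent
# Gaussian clusters stand in for real backbone embeddings.
n, d, k = 300, 16, 4
y = rng.integers(0, k, n)
centers = rng.normal(0, 2, (k, d))
X = centers[y] + rng.normal(0, 1, (n, d))

# Train only the new head (linear layer + softmax) by gradient descent
# on cross-entropy; the "backbone" that produced X is never updated.
W = np.zeros((d, k))
b = np.zeros(k)
onehot = np.eye(k)[y]
lr = 0.5
for _ in range(500):
    p = softmax(X @ W + b)
    grad_logits = (p - onehot) / n
    W -= lr * (X.T @ grad_logits)
    b -= lr * grad_logits.sum(axis=0)

acc = float((np.argmax(X @ W + b, axis=1) == y).mean())
```

This is why last-layer adjustment is cheap: only d*k + k parameters are trained, regardless of how large the frozen trunk is.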
Connected Hidden Neurons (CHNNet): An Artificial Neural Network for Rapid Convergence
Although artificial neural networks are inspired by the functionalities of
biological neural networks, conventional artificial neural networks, unlike
their biological counterparts, are often structured hierarchically, which can
impede the flow of information between neurons, as neurons in the same layer
have no connections between them. Hence, we propose a more robust model of
artificial neural networks in which the hidden neurons residing in the same
hidden layer are interconnected, leading to rapid convergence. Through an
experimental study of our proposed model in deep networks, we demonstrate that
the model yields a noticeable increase in convergence rate compared to the
conventional feed-forward neural network.
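The abstract does not give CHNNet's exact formulation, so the sketch below shows one possible reading of the idea, stated as an assumption: hidden pre-activations are additionally mixed through an intra-layer weight matrix before the nonlinearity, in contrast to a standard layer where neurons are independent of their layer-mates.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward_conventional(x, W_in, b):
    """Standard hidden layer: no connections among layer-mates."""
    return relu(x @ W_in + b)

def forward_connected(x, W_in, b, W_hh):
    """One possible reading of CHNNet's idea (an assumption, not the
    paper's exact equations): pre-activations are additionally mixed
    through intra-layer weights W_hh before the nonlinearity."""
    pre = x @ W_in + b
    return relu(pre + pre @ W_hh)

d_in, d_h = 5, 8
x = rng.normal(size=(3, d_in))                  # batch of 3 inputs
W_in = rng.normal(scale=0.5, size=(d_in, d_h))
b = np.zeros(d_h)
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
np.fill_diagonal(W_hh, 0.0)                     # no self-connections

h_plain = forward_conventional(x, W_in, b)
h_conn = forward_connected(x, W_in, b, W_hh)
```

The extra W_hh term lets each hidden neuron see its layer-mates' pre-activations within a single forward pass, which is the information path a purely hierarchical layer lacks.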
Combining Machine Learning Classifiers for Stock Trading with Effective Feature Extraction
The unpredictability and volatility of the stock market render it challenging
to make a substantial profit using any generalized scheme. This paper intends
to discuss our machine learning model, which can make a significant amount of
profit in the US stock market by performing live trading in the Quantopian
platform while using resources free of cost. Our top approach was to use
ensemble learning with four classifiers: Gaussian Naive Bayes, Decision Tree,
Logistic Regression with L1 regularization and Stochastic Gradient Descent, to
decide whether to go long or short on a particular stock. Our best model
performed daily trade between July 2011 and January 2019, generating 54.35%
profit. Finally, our work showed that mixtures of weighted classifiers perform
better than any individual predictor at making trading decisions in the stock
market.
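The combination step in the abstract, a weighted mixture of four classifiers deciding long versus short, can be illustrated with a toy weighted vote. The individual votes and weights below are hypothetical; in the paper the four signals would come from Gaussian Naive Bayes, a Decision Tree, L1-regularized Logistic Regression, and an SGD classifier.

```python
import numpy as np

# Hypothetical per-classifier signals for one stock on one day:
# +1 = go long, -1 = go short.
votes = np.array([+1, -1, +1, +1])

# Illustrative weights, e.g. proportional to each classifier's
# validation accuracy (made-up numbers, not from the paper).
weights = np.array([0.55, 0.60, 0.58, 0.52])

score = float(votes @ weights)  # 0.55 - 0.60 + 0.58 + 0.52 = 1.05
decision = "long" if score > 0 else "short"
```

Weighting lets a dissenting but historically accurate classifier pull the decision harder than a plain majority vote would, which is one reason weighted mixtures can beat any single predictor.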