28 research outputs found
LDEB -- Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues
Emotion recognition in conversations (ERC) is vital to the advancements of
conversational AI and its applications. Therefore, the development of an
automated ERC model using the concepts of machine learning (ML) would be
beneficial. However, the conversational dialogues present a unique problem
where each dialogue depicts nested emotions that entangle the association
between the emotional feature descriptors and the emotion type (or label). This
entanglement, which can be compounded by data paucity, is an
obstacle for an ML model. To overcome this problem, we proposed a novel approach
called Label Digitization with Emotion Binarization (LDEB) that disentangles
the twists by utilizing text normalization and 7-bit digital encoding
techniques and constructs a meaningful feature space on which an ML model can be
trained. We also utilized the publicly available dataset called the
FETA-DailyDialog dataset for feature learning and developed a hierarchical ERC
model using random forest (RF) and artificial neural network (ANN) classifiers.
Simulations showed that the ANN-based ERC model was able to predict emotion
with the best accuracy and precision scores of about 74% and 76%, respectively.
Simulations also showed that the ANN-model could reach a training accuracy
score of about 98% with 60 epochs. On the other hand, the RF-based ERC model
was able to predict emotions with the best accuracy and precision scores of
about 78% and 75%, respectively.
Comment: 10 pages, 3 figures, 4 tables
A Software Engineering Schema for Data Intensive Applications
The features developed by a software engineer (system specification) for a software system may significantly differ from the features required by a user (user requirements) for their envisioned system. These discrepancies generally result from the complexity of the system, the vagueness of the user requirements, or the lack of knowledge and experience of the software engineer. The principles of software engineering and the recommendations of the ACM's Software Engineering Education Knowledge (SEEK) document can provide solutions to minimize these discrepancies and, in turn, improve the quality of a software system and increase user satisfaction. In this paper, a software development framework, called SETh, is presented. The SETh framework consists of a set of visual models that support software engineering education and practices in a systematic manner. It also enables backward and forward tracking/tracing capabilities, two important concepts that can facilitate greenfield and evolutionary software engineering projects. The SETh framework tightly connects every step of the development of a software system; hence, learners and experienced software engineers can study, understand, and build efficient software systems for emerging data science applications.
Optimization: A Journal of Mathematical Programming and Operations Research
In this article we study support vector machine (SVM) classifiers in the face of uncertain knowledge sets and show how data uncertainty in knowledge sets can be treated in SVM classification by employing robust optimization. We present knowledge-based SVM classifiers with uncertain knowledge sets using convex quadratic optimization duality. We show that the knowledge-based SVM, where prior knowledge is in the form of uncertain linear constraints, results in an uncertain convex optimization problem with a set containment constraint. Using a new extension of Farkas' lemma, we reformulate the robust counterpart of the uncertain convex optimization problem in the case of interval uncertainty as a convex quadratic optimization problem. We then reformulate the resulting convex optimization problems as a simple quadratic optimization problem with non-negativity constraints using Lagrange duality. We obtain the solution of the converted problem by a fixed-point iterative algorithm and establish the convergence of the algorithm. We finally present some preliminary results of our computational experiments with the method.
Logistic Map-Based Fragile Watermarking for Pixel Level Tamper Detection and Resistance
An efficient fragile image watermarking technique for pixel level tamper detection and resistance is proposed. It uses the five most
significant bits of the pixels to generate watermark bits and embeds them in the three least significant bits. The proposed technique
uses a logistic map and takes advantage of its sensitivity to small changes in the initial condition. At the same time,
it incorporates the confusion/diffusion and hashing techniques used in many cryptographic systems to resist tampering at pixel
level as well as at block level. This paper also presents two new approaches called nonaggressive and aggressive tamper detection
algorithms. Simulations show that the proposed technique can provide more than 99.39% tamper detection capability with less
than 2.31% false-positive detection and less than 0.61% false-negative detection responses.
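
The embedding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the bit-folding rule, the function names, and the parameters `x0` and `r` are assumptions made for illustration, and the confusion/diffusion and hashing stages mentioned above are omitted. It only shows how three watermark bits can be derived from a pixel's five most significant bits plus a logistic-map key stream, and then checked on the receiving side.

```python
def logistic_sequence(x0, r, n):
    """Iterate x_{k+1} = r * x_k * (1 - x_k) and return n chaotic values."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def watermark_bits(pixel, key_value):
    """Derive 3 bits from the 5 MSBs of an 8-bit pixel and a key value.

    The XOR-and-fold rule below is a hypothetical choice for illustration.
    """
    msb5 = pixel >> 3                     # five most significant bits (0..31)
    key5 = int(key_value * 32) & 0x1F     # quantize the chaotic value to 5 bits
    mixed = msb5 ^ key5
    return (mixed ^ (mixed >> 3)) & 0x07  # fold 5 bits down to 3 bits


def embed(pixels, x0=0.3141, r=3.99):
    """Overwrite the 3 LSBs of every pixel with its watermark bits."""
    keys = logistic_sequence(x0, r, len(pixels))
    return [(p & 0xF8) | watermark_bits(p, k) for p, k in zip(pixels, keys)]


def detect_tampered(pixels, x0=0.3141, r=3.99):
    """Return indices whose stored LSBs disagree with the recomputed bits."""
    keys = logistic_sequence(x0, r, len(pixels))
    return [
        i for i, (p, k) in enumerate(zip(pixels, keys))
        if (p & 0x07) != watermark_bits(p, k)
    ]
```

Because the key stream depends on the initial condition `x0`, a verifier without the same `x0` and `r` would recompute different bits, which is the sensitivity property the abstract refers to.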
No-reference visually significant blocking artifact metric for natural scene images
Quantifying visually annoying blocking artifacts is essential for image and video quality assessment. This paper presents a no-reference technique that uses the multiple neural channels of the human visual system (HVS) to quantify visual impairment by altering the outputs of these sensory channels independently using the statistical “standard score” formula in the Fourier domain. It also uses the bit patterns of the least significant bits (LSB) to extract blocking artifacts. Simulation results show that the blocking artifacts extracted using this approach follow the subjective visual interpretation of blocking artifacts. This paper also presents a visually significant blocking artifact metric (VSBAM) along with some experimental results.
Characterization of Differentially Private Logistic Regression
The purpose of this paper is to present an approach that can help data owners select suitable values for the privacy parameter of a differentially private logistic regression (DPLR) so as to balance privacy strength and classification accuracy. The proposed approach implements a supervised learning technique and a feature extraction technique to address this challenging problem and generate solutions. The supervised learning technique selects subspaces from a training data set and generates DPLR classifiers for a range of values of the privacy parameter. The feature extraction technique transforms an original subspace to a differentially private subspace by querying the original subspace multiple times using the DPLR model and the privacy parameter values that were selected by the supervised learning module. The proposed approach then employs a signal processing measure called the signal-interference-ratio to quantify the privacy level of the differentially private subspaces; hence, it allows data owners to learn the privacy level that the DPLR models can provide for a given subspace and a given classification accuracy.
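
The abstract does not define how the signal-interference-ratio is computed. As a rough sketch, assuming the original subspace is treated as the signal and its deviation from the differentially private subspace as the interference, a generic power ratio in decibels could look like this. The function name and this definition are assumptions, not the paper's formula.

```python
import math


def signal_interference_ratio_db(original, private):
    """Generic power ratio in dB between a signal and its perturbation.

    Assumption: `original` is the signal and (private - original) is the
    interference. The paper may define the ratio differently.
    """
    signal_power = sum(v * v for v in original)
    interference_power = sum((p - v) ** 2 for v, p in zip(original, private))
    if interference_power == 0:
        return math.inf  # no perturbation at all
    return 10.0 * math.log10(signal_power / interference_power)
```

Under this reading, a lower ratio means the private values deviate more from the originals, which would correspond to stronger perturbation.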
Modeling of class imbalance using an empirical approach with spambase dataset and random forest classification
Classification of imbalanced data is an important research problem, as most of the data encountered in real-world systems is imbalanced. Recently, a representation learning technique called Synthetic Minority Over-sampling Technique (SMOTE) has been proposed to handle the imbalanced data problem. The Random Forest (RF) algorithm with SMOTE has previously been used to improve classification performance on the minority class relative to the majority class. Although RF with SMOTE demonstrates improved classification performance, the relationship between the classification performance and the imbalance ratio between the majority and minority classes is not well defined. Therefore, mathematical models that describe this relationship are useful, especially in the big data environment, which suffers from imbalanced data. In this paper, we propose a mathematical model using an empirical approach applied to the well-known Spambase dataset and the Random Forest classification approach, including its adoption with the SMOTE representation learning technique. We present a linear model that describes the relationship between the true positive classification rate and the imbalance ratio between the majority and minority classes. This model can help IT researchers develop better spam filter algorithms.
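
The two ingredients named in this abstract can be sketched in a few lines. This is an illustration rather than the authors' code. `smote` shows the core SMOTE step (interpolating between a minority sample and one of its nearest minority neighbors), and `fit_line` is an ordinary least-squares fit of the kind that could relate true positive rate to imbalance ratio. The function names, the default `k`, and the seed are assumptions, and no values from the paper are reproduced.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbor = rng.choice(others[:k])
        u = rng.random()  # position along the segment from x to the neighbor
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In an experiment of the kind described, one would train a classifier at several imbalance ratios, record the true positive rate at each, and pass the two lists to `fit_line`.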