A gentle transition from Java programming to Web Services using XML-RPC
Exposing students to leading-edge vocational areas of relevance, such as Web Services, can be difficult. We show a lightweight approach that embeds a key component of Web Services within a Level 3 BSc module in Distributed Computing. We present a ready-to-use collection of lecture slides and student activities based on XML-RPC. In addition, we show that this material addresses the central topics in the context of web services as identified by Draganova (2003).
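As a rough illustration of what an XML-RPC exchange looks like on the wire, the sketch below encodes and decodes one call using Python's standard `xmlrpc.client` module. It is not taken from the teaching material described above, which targets Java; the method name `add` and its arguments are hypothetical.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "add" with two integer arguments.
# The result is the XML document a client would POST to the server.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Decode the request the way a server would: a tuple of parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

# Encode the reply a server might send back (a single return value),
# then decode it on the client side. A response carries no method name.
response_xml = xmlrpc.client.dumps((sum(params),), methodresponse=True)
result, response_method = xmlrpc.client.loads(response_xml)

print(request_xml)
print(result[0])
```

No network connection is needed here; the example only shows how a procedure call is serialised to XML and back.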
Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for
large companies. Because churn directly affects revenue, especially in the
telecom field, companies are seeking ways to predict which customers are likely
to churn. Identifying the factors that increase churn is therefore important for
taking the actions needed to reduce it. The main contribution of our work is a
churn prediction model that helps telecom operators identify the customers most
likely to churn. The model uses machine learning techniques on a big data
platform and introduces a new approach to feature engineering and selection.
Model performance is measured with the standard Area Under Curve (AUC) metric,
and the AUC value obtained is 93.3%. Another main contribution is the use of the
customer social network in the prediction model by extracting Social Network
Analysis (SNA) features. The use of SNA improved the model's AUC from 84% to
93.3%. The model was prepared and tested in a Spark environment on a large
dataset created by transforming big raw data provided by the SyriaTel telecom
company. The dataset contained all customers' information over 9 months and was
used to train, test, and evaluate the system at SyriaTel. Four algorithms were
evaluated: Decision Tree, Random Forest, Gradient Boosted Machine Tree (GBM),
and Extreme Gradient Boosting (XGBOOST). The best results were obtained with
XGBOOST, which was used for classification in this churn prediction model.
Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
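For readers unfamiliar with the AUC measure mentioned above, the following minimal sketch computes it directly from its pairwise definition: the fraction of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as half. The function name, labels, and scores are made-up toy values for illustration, not the authors' code or SyriaTel data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data: 1 = churned, 0 = stayed; scores are hypothetical model outputs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(auc(labels, scores))
```

The quadratic loop is fine for a handful of examples; on a dataset of the size described, a rank-based or library implementation would be the practical choice.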
Exploiting Sentence Embedding for Medical Question Answering
Despite the great success of word embedding, sentence embedding remains a
not-well-solved problem. In this paper, we present a supervised learning
framework to exploit sentence embedding for the medical question answering
task. The learning framework consists of two main parts: 1) a sentence
embedding producing module, and 2) a scoring module. The former is developed
with contextual self-attention and multi-scale techniques to encode a sentence
into an embedding tensor. This module is called Contextual self-Attention
Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two
scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association
Scoring (SAS). SMS measures similarity while SAS captures association between
sentence pairs: a medical question concatenated with a candidate choice, and a
piece of corresponding supportive evidence. The proposed framework is evaluated
on two Medical Question Answering (MedicalQA) datasets collected from
real-world applications: medical exams and clinical diagnosis based on
electronic medical records (EMR). The comparison results show that our proposed
framework achieves significant improvements over competitive baseline
approaches. Additionally, a series of controlled experiments illustrates that
the multi-scale strategy and the contextual self-attention layer play important
roles in producing effective sentence embeddings, and that the two scoring
strategies are highly complementary to each other for question answering
problems.
Comment: 8 pages
Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data
In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable for extracting opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.
Cyberbullying Detection System with Multiple Server Configurations
Due to the proliferation of online networking, friendships, and relationships, social communication has reached a whole new level. As a result, there is increasing evidence that social applications are frequently used for bullying. State-of-the-art studies in cyberbullying detection have mainly focused on the content of conversations while largely ignoring the users involved in cyberbullying. To counter this problem, we have designed a distributed cyberbullying detection system that detects bullying messages and drops them before they are delivered to the intended receiver. A prototype has been created using the principles of NLP, Machine Learning and Distributed Systems. Preliminary studies conducted with it indicate that our approach shows strong promise.