
    A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text

    One essential task in information extraction from medical corpora is drug name recognition. Compared with text from other domains, medical text mining poses more challenges: the text is more unstructured, new terms are added rapidly, the same drug appears under a wide range of name variations, labeled datasets and external knowledge sources are scarce, and a single drug name can span multiple tokens. Although many approaches have been proposed for the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation to overcome some of these challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarity obtained from word embedding training. The first technique is evaluated with a standard neural network model, an MLP. The second technique involves two deep network classifiers, a DBN and an SAE. The third technique represents the sentence as a sequence and is evaluated with a recurrent neural network model, an LSTM. In extracting drug name entities, the third technique gives the best F-score performance relative to the state of the art, with an average F-score of 0.8645.
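    A minimal sketch of the sequence-labeling setup the third technique describes: token ids pass through an embedding layer (standing in for the trained word embeddings) into an LSTM that predicts a tag for every token. The vocabulary size, dimensions, tag scheme, and the use of Keras are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: drug-name tagging as per-token sequence labeling with an LSTM.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000   # assumed vocabulary size
EMBED_DIM = 100     # assumed embedding dimension
MAX_LEN = 50        # assumed maximum sentence length
NUM_TAGS = 3        # assumed tag scheme, e.g. B-DRUG, I-DRUG, O

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # stand-in for pretrained word embeddings
    layers.LSTM(128, return_sequences=True),  # one hidden state (and tag) per token
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch: random token ids and per-token tag ids as placeholders.
X = np.random.randint(0, VOCAB_SIZE, size=(32, MAX_LEN))
y = np.random.randint(0, NUM_TAGS, size=(32, MAX_LEN))
model.fit(X, y, epochs=1, verbose=0)
```

    Training on a real corpus would replace the random placeholders with sentences encoded as token ids and BIO-style drug tags.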

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Innovations and Social Media Analytics in a Digital Society

    Recent advances in digitization are transforming healthcare, education, tourism, information technology, and other sectors. Social media analytics provide tools for measuring innovation and the relationship between companies and citizens. This book brings together state-of-the-art social media analytics and advanced innovation policies for the digitization of society. The many applications used to create and analyze social media generate large amounts of data, called big data, including measures of how technologies are used to develop new services and to improve citizens' quality of life. Digitization has applications in fields from remote monitoring to smart sensors and other devices, and their integration generates data that need to be analyzed and visualized in an easy and clear way; several of the contributions in this book propose methods for doing so. The volume offers researchers valuable insights into designing innovative digital analytics systems and improving remote information delivery.

    Anaphora resolution for Arabic machine translation :a case study of nafs

    PhD thesis. In the age of the internet, email, and social media, there is an increasing need for processing online information, for example to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest, reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text that is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.

    Cartoons as interdiscourse : a quali-quantitative analysis of social representations based on collective imagination in cartoons produced after the Charlie Hebdo attack

    The attacks against Charlie Hebdo in Paris at the beginning of 2015 prompted many cartoonists, most of them professionals but some laymen as well, to create cartoons in reaction to the tragedy. The main goal of this article is to show how traumatic events like this one can converge on a rather limited set of metaphors, ranging from easily recognizable topoi to rather vague interdiscourses that circulate in contemporary societies. To do so, we analyzed 450 cartoons produced in reaction to the Charlie Hebdo attacks, taking a quali-quantitative approach that draws on both discourse analysis and semiotics. We identified eight main themes and analyzed the five that are anchored in collective imagination (the pen against the sword, the journalist as a modern hero, etc.). We then studied the cartoons at the figurative, narrative, and thematic levels using Greimas' model of the semiotic square. The paper shows how these cartoons build on a memory-based network of events from the recent past (particularly 9/11) and, more generally, on a collective imagination that can be linked to Western values.

    An integrated clustering analysis framework for heterogeneous data

    Big data is a growing area of research with some important challenges that motivate our work. We focus on one such challenge, the variety aspect. First, we introduce our problem by defining heterogeneous data as data about objects that are described by different data types, e.g., structured data, text, time series, and images. Throughout our work we use five datasets for experimentation: a real dataset of prostate cancer data and four synthetic datasets that we created and made publicly available. Each dataset covers a different combination of the data types used to describe objects. Our clustering strategy is based on fusion approaches, and we compare intermediate and late fusion schemes. We propose an intermediate fusion approach, Similarity Matrix Fusion (SMF), where integration takes place at the level of calculating similarities. SMF produces a single fused distance matrix and two uncertainty expression matrices. We then propose a clustering algorithm, Hk-medoids, a modified version of the standard k-medoids algorithm that utilises the uncertainty calculations to improve clustering performance. We evaluate our results by comparing them to clusterings produced using individual elements and show that the fusion approach produces equal or significantly better results. We also show that there are advantages in utilising the uncertainty information, as Hk-medoids does. In addition, from a theoretical point of view, our proposed Hk-medoids algorithm has lower computational complexity than the popular PAM implementation of the k-medoids algorithm. We then employed late fusion, which aggregates the results of clustering by individual elements by combining cluster labels using an object co-occurrence matrix technique; the final clustering is then derived by a hierarchical clustering algorithm. We show that intermediate fusion for clustering heterogeneous data is a feasible and efficient approach using our proposed Hk-medoids algorithm.
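    A minimal sketch of the intermediate-fusion idea described above, assuming equal weights: one normalised distance matrix is computed per data type, and the matrices are averaged into a single fused matrix before clustering. The hierarchical clustering at the end is a stand-in, not the paper's Hk-medoids algorithm, and it ignores the uncertainty matrices that SMF also produces.

```python
# Sketch only: fuse per-data-type distance matrices, then cluster the result.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n = 20
structured = rng.normal(size=(n, 4))    # toy structured features per object
timeseries = rng.normal(size=(n, 30))   # toy time-series features per object

# One normalised distance matrix per data type.
d1 = squareform(pdist(structured, metric="euclidean"))
d2 = squareform(pdist(timeseries, metric="euclidean"))
d1 /= d1.max()
d2 /= d2.max()

# Fusion at the similarity/distance level: here a simple unweighted average.
fused = (d1 + d2) / 2.0

# Stand-in clustering on the fused matrix (the paper uses Hk-medoids instead).
labels = fcluster(linkage(squareform(fused), method="average"),
                  t=3, criterion="maxclust")
print(labels)
```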

    Proceedings of the ECMLPKDD 2015 Doctoral Consortium

    The ECMLPKDD 2015 Doctoral Consortium was organized for the second time as part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), held in Porto during September 7-11, 2015. The objective of the doctoral consortium is to provide an environment in which students can exchange ideas and experiences with peers in an interactive atmosphere and get constructive feedback from senior researchers in machine learning, data mining, and related areas. These proceedings collect and document all the contributions to the ECMLPKDD 2015 Doctoral Consortium.

    Evaluation of cloud computing modelling tools: simulators and predictive models

    Experimenting with novel algorithms and configurations for the automatic management of Cloud Computing infrastructures is expensive and time-consuming on real systems. Cloud computing delivers its benefits by applying virtualisation techniques in data centres, so that customers use virtual rather than physical servers. However, it is still complex and costly for researchers to test and run repeated experiments on a real data centre. To avoid this difficulty, researchers use a range of alternative tools: simulators, emulators, mathematical models, statistical models, and benchmarking. Even so, it is still difficult to choose the best tool with which to evaluate proposed research. This research investigates the accuracy of existing, well-known simulators in the field of cloud computing. Simulation tools are generally developed for particular experiments, so there is little assurance that using them with different workloads will be reliable. Moreover, because the simulators lack sufficient accuracy, a predictive model based on a dataset from a realistic data centre is delivered as an alternative. This work therefore addresses the problem of investigating the accuracy of different modelling tools by developing and validating a procedure based on the performance of a target micro data centre. The key insights and contributions are: evaluating three alternative models of a real Cloud Computing infrastructure, showing the level of accuracy of the selected simulation tools; developing and validating a predictive model based on a Raspberry Pi small-scale data centre; and showing that predictive models using Linear Regression and Artificial Neural Networks, trained on data drawn from a Raspberry Pi Cloud infrastructure, provide better accuracy.
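    A minimal sketch of fitting the two kinds of predictive model the abstract names, linear regression and an artificial neural network, on a synthetic stand-in for measurements from a small data centre. The features, target, and scikit-learn estimators are illustrative assumptions, not the thesis's actual models or data.

```python
# Sketch only: two predictive models for a performance target, compared on R^2.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(size=(n, 3))  # hypothetical features, e.g. request rate, VM count, payload size
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, n)  # hypothetical latency target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lr = LinearRegression().fit(X_tr, y_tr)
nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                  random_state=0).fit(X_tr, y_tr)

print("linear regression R^2:", lr.score(X_te, y_te))
print("neural network    R^2:", nn.score(X_te, y_te))
```

    In the thesis setting, the training set would instead be performance measurements collected from the Raspberry Pi cloud infrastructure.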