4,871 research outputs found
Parsing Argumentation Structures in Persuasive Essays
In this article, we present a novel approach for parsing argumentation
structures. We identify argument components using sequence labeling at the
token level and apply a new joint model for detecting argumentation structures.
The proposed model globally optimizes argument component types and
argumentative relations using integer linear programming. We show that our
model considerably improves the performance of base classifiers and
significantly outperforms challenging heuristic baselines. Moreover, we
introduce a novel corpus of persuasive essays annotated with argumentation
structures. We show that our annotation scheme and annotation guidelines
successfully guide human annotators to substantial agreement. This corpus and
the annotation guidelines are freely available for ensuring reproducibility and
to encourage future research in computational argumentation.
Comment: Under review in Computational Linguistics. First submission: 26
October 2015. Revised submission: 15 July 201
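The token-level sequence labeling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple B/I/O tagging scheme (B = first token of an argument component, I = inside one, O = outside); the tag set, the function name `bio_to_spans`, and the example sentence are illustrative assumptions, not details taken from the article.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Turn token-level B/I/O tags into (start, end) spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open component began
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # close the previous component first
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I tag with no preceding B
                start = i
        else:  # "O": outside any component
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:  # component running to the end of the sequence
        spans.append((start, len(tags)))
    return spans


# Hypothetical sentence and tags, chosen only for illustration.
tokens = "We should ban smoking because it harms others .".split()
tags = ["O", "B", "I", "I", "O", "B", "I", "I", "O"]
components = [" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)]
print(components)
```

A joint model such as the one described would then operate on the recovered components, deciding their types and the relations between them.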
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Recent advances in technology have produced large volumes of medical data. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used for this task include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by identifying similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data
Fame for sale: efficient detection of fake Twitter followers
Fake followers are those Twitter accounts specifically created to
inflate the number of followers of a target account. Fake followers are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere - hence impacting on
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter accounts
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. Such baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier, general enough to thwart
overfitting, lightweight thanks to the usage of the less costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation on the novel issue of fake Twitter followers
Comparative between optimization feature selection by using classifiers algorithms on spam email
Spam mail has become a rising phenomenon in a world that has recently witnessed high growth in the volume of emails. This indicates the need to develop an effective spam filter. At present, classification algorithms for text mining are used to classify emails. This paper describes and evaluates the effectiveness of three popular classifiers combined with optimization-based feature selection methods, such as genetic algorithm, harmony search, particle swarm optimization, and simulated annealing. The research compares the K-Nearest Neighbor (KNN), Naïve Bayesian (NB), and Support Vector Machine (SVM) classifiers on spam classification without feature selection, and also proposes optimization-based feature selection to improve reliability by reducing the number of unimportant features
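Optimization-based feature selection of the kind listed in this abstract needs a score for each candidate feature subset. The sketch below shows one plausible scoring function: the accuracy of a k-nearest-neighbour classifier restricted to the features switched on by a 0/1 mask. The toy word counts, the function names, and the choice of k = 3 are all assumptions made for illustration; the abstract does not specify how subsets were scored.

```python
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                x: Sequence[float], mask: Sequence[int], k: int = 3) -> str:
    """Majority vote among the k nearest training points, using only masked-in features."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance over the selected features only.
        return sum((ai - bi) ** 2 for ai, bi, m in zip(a, b, mask) if m)

    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def subset_accuracy(train_x, train_y, test_x, test_y,
                    mask: Sequence[int], k: int = 3) -> float:
    """Score a feature subset: fraction of test items classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, mask, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical counts of ["free", "winner", "meeting", <uninformative token>].
train_x: List[List[int]] = [[3, 2, 0, 9], [2, 3, 0, 1], [3, 3, 0, 5],
                            [0, 0, 3, 8], [0, 1, 2, 2], [1, 0, 3, 6]]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_x = [[2, 2, 0, 8], [0, 0, 2, 1]]
test_y = ["spam", "ham"]

informative = subset_accuracy(train_x, train_y, test_x, test_y, [1, 1, 1, 0])
noise_only = subset_accuracy(train_x, train_y, test_x, test_y, [0, 0, 0, 1])
print(informative, noise_only)
```

A genetic algorithm, harmony search, particle swarm optimization, or simulated annealing would each search the space of masks for one that maximizes a score like this, usually with an added penalty on the number of selected features.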
An academic review: applications of data mining techniques in finance industry
With the development of Internet techniques, data volumes are doubling every two years, faster than predicted by Moore’s Law. Big Data Analytics is becoming particularly important for enterprise business. Modern computational technologies will provide effective tools to help understand the huge volumes of accumulated data and leverage this information to gain insights into the finance industry. Because the finance industry has no physical products to manufacture, data has become the most valuable asset of financial organisations seeking actionable business insights. This is where data mining techniques come to their rescue by allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing and prediction of price movements, to name a few. This work surveys the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared to loan prediction, money laundering and time series prediction. Due to the dynamics, uncertainty and variety of data, nonlinear mapping techniques have been studied more deeply than linear techniques. The review also finds that hybrid methods are reported to be the most accurate in prediction, closely followed by neural network techniques. This survey could offer guidance on applications of data mining techniques for the finance industry and a summary of methodologies for researchers in this area. In particular, it could provide a good overview of data mining techniques for beginners who want to work in the field of computational finance
Information retrieval using machine learning for database curation
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. In 2016, the International Agency for Research on Cancer of the World Health Organization launched the first database of biomarkers of exposure, called Exposome-Explorer. To build the database, more than 8,500 citations were manually analysed, but only 480 were considered relevant and used to extract information for the database. Manually curating a database is a time-consuming task that requires experts able to collect and analyse data scattered across millions of articles. This thesis proposes the use of Information Retrieval techniques with a supervised learning approach to automatically classify articles as relevant or irrelevant, in order to support the creation and updating of Exposome-Explorer. This approach efficiently narrows the literature to a set of relevant publications on biomarkers of exposure, reducing the time and effort needed to identify relevant documents. In addition, the original queries used by the curators to search the literature on biomarkers of exposure were improved to include some relevant articles that were previously not being found. The data manually collected from Exposome-Explorer were used to train and test the machine learning models (classifiers). Several parameters and six different algorithms were evaluated to determine which best predicted the relevance of an article based on its title, abstract, or metadata.
The best classifier was built with the SVM algorithm and trained on the article abstracts, obtaining a recall of 85.8%. This classifier reduces the number of citations on dietary biomarkers to be manually analysed by the curators by almost 88%, while incorrectly classifying only 14.2% of the relevant articles. This methodology can also be applied to other biomarker data or adapted to support the manual curation of other chemical or disease databases
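The two percentages reported here are two views of the same quantity: a recall of 85.8% means that 14.2% of relevant articles are missed. The short check below makes that arithmetic explicit. The counts are hypothetical, chosen only to reproduce the reported rate, and the reading of "incorrectly classified relevant articles" as false negatives is an assumption.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of truly relevant items that the classifier retrieves."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 1,000 relevant articles, 858 of them retrieved.
tp, fn = 858, 142
r = recall(tp, fn)
miss_rate = 1 - r  # share of relevant articles the classifier misses
print(f"recall = {r:.1%}, missed = {miss_rate:.1%}")
```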