
    From Lexical to Semantic Features in Paraphrase Identification

    The task of paraphrase identification has been applied to diverse scenarios in Natural Language Processing, such as machine translation, summarization, or plagiarism detection. In this paper, we present a comparative study of the performance of lexical, syntactic, and semantic features on the task of paraphrase identification in the Microsoft Research Paraphrase Corpus. In our experiments, semantic features do not yield a gain in results, and syntactic features lead to the best results, but only when combined with lexical features.
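    A minimal sketch of the kind of lexical feature the study compares is token overlap between the two sentences. The Jaccard measure and the 0.5 threshold below are illustrative assumptions, not the paper's actual feature set or tuned values.

    ```python
    def jaccard_overlap(s1: str, s2: str) -> float:
        """Jaccard similarity between the token sets of two sentences."""
        t1, t2 = set(s1.lower().split()), set(s2.lower().split())
        if not t1 and not t2:
            return 1.0
        return len(t1 & t2) / len(t1 | t2)

    def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> bool:
        """Label a pair as a paraphrase when lexical overlap exceeds a threshold."""
        return jaccard_overlap(s1, s2) >= threshold
    ```

    In practice such a lexical score would be one feature among many fed to a classifier, alongside syntactic and semantic features.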

    A hybrid approach for arabic semantic relation extraction

    Information retrieval applications are essential tools to manage the huge amount of information on the Web. Ontologies are of great importance in these applications: data belonging to a domain of interest are represented and related semantically in the ontology, which helps to navigate, manage, and reuse these data. Despite the growing need for ontologies, only a few works have addressed the Arabic language. Indeed, Arabic texts are highly ambiguous, especially when diacritics are absent. Besides, existing works do not cover all the types of semantic relations that are useful to structure Arabic ontologies. A lot of work has been done on cooccurrence-based techniques, which lead to over-generation. In this paper, we propose a new approach for Arabic semantic relation extraction. We use vocalized texts to reduce ambiguity and propose a new distributional approach for similarity calculus, which we compare to cooccurrence. We discuss our contribution through experimental results and propose some perspectives for future research.
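    The distributional idea the abstract contrasts with cooccurrence can be sketched as follows: represent each word by counts of the words appearing in a window around it, and compare two words by the cosine of their count vectors. The window size and the toy corpus format are assumptions for illustration, not the paper's actual similarity calculus.

    ```python
    import math
    from collections import Counter

    def context_vectors(sentences, window=2):
        """Map each word to a Counter of words seen within `window` tokens of it."""
        vectors = {}
        for sent in sentences:
            tokens = sent.split()
            for i, w in enumerate(tokens):
                ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors.setdefault(w, Counter()).update(ctx)
        return vectors

    def cosine(c1: Counter, c2: Counter) -> float:
        """Cosine similarity between two sparse count vectors."""
        dot = sum(c1[k] * c2[k] for k in c1)
        n1 = math.sqrt(sum(v * v for v in c1.values()))
        n2 = math.sqrt(sum(v * v for v in c2.values()))
        return dot / (n1 * n2) if n1 and n2 else 0.0
    ```

    Words that occur in similar contexts (e.g. two animal names in parallel sentences) end up with similar vectors even if they never cooccur directly, which is what distinguishes this approach from raw cooccurrence counting.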

    Sentiment Lexicon Adaptation with Context and Semantics for the Social Web

    Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the public's feelings towards policies, brands, businesses, etc. General-purpose sentiment lexicons have been used to compute sentiment from social streams, since they are simple and effective. They calculate the overall sentiment of texts by using a general collection of words with predetermined sentiment orientation and strength. However, a word's sentiment often varies with the context in which it appears, and new words might be encountered that are not covered by the lexicon, particularly in social media environments where content emerges and changes rapidly and constantly. In this paper, we propose a lexicon adaptation approach that uses contextual as well as semantic information extracted from DBPedia to update the words' weighted sentiment orientations and to add new words to the lexicon. We evaluate our approach on three different Twitter datasets and show that enriching the lexicon with contextual and semantic information improves sentiment computation by 3.4% in average accuracy and by 2.8% in average F1 measure.
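    A toy version of lexicon adaptation: nudge each word's prior sentiment weight toward the average sentiment of the texts it occurs in, and let unseen words enter the lexicon with a neutral prior. The update rule and the `rate` parameter are illustrative assumptions; the paper's actual method additionally uses semantic information from DBPedia, which this sketch omits.

    ```python
    def adapt_lexicon(lexicon, labelled_tweets, rate=0.5):
        """lexicon: word -> weight in [-1, 1]; labelled_tweets: (text, score) pairs.

        Returns a new lexicon where each observed word's weight is interpolated
        between its prior and the mean sentiment of the texts containing it.
        """
        observed = {}
        for text, score in labelled_tweets:
            for word in text.lower().split():
                observed.setdefault(word, []).append(score)
        adapted = dict(lexicon)
        for word, scores in observed.items():
            mean = sum(scores) / len(scores)
            prior = adapted.get(word, 0.0)  # new words start from a neutral prior
            adapted[word] = (1 - rate) * prior + rate * mean
        return adapted
    ```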

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different news agencies about the same events. The high lexical diversity of these documents hinders a system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation of the sentence-word co-occurrence matrix to intrinsically group semantically-similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.
    Comment: Accepted for publication in the journal of Natural Language Engineering, 201
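    The low-rank approximation step mentioned above can be sketched with a truncated SVD: projecting the sentence-word co-occurrence matrix onto its top-k singular directions maps semantically-similar lexical items into a shared latent space. The rank `k` is an illustrative choice, and the ILP selection stage from the paper is omitted here.

    ```python
    import numpy as np

    def low_rank_approx(A: np.ndarray, k: int) -> np.ndarray:
        """Best rank-k approximation of A in the least-squares sense (Eckart-Young)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] * s[:k] @ Vt[:k, :]
    ```

    In a summarizer, the smoothed matrix replaces the raw counts, so two sentences sharing no words but similar latent profiles are no longer scored as unrelated.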

    Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization

    This paper describes the advantages of using Evolutionary Algorithms (EA) for feature selection on a network intrusion dataset. Most current Network Intrusion Detection Systems (NIDS) are unable to detect intrusions in real time because of the high-dimensional data produced during daily operation. Extracting knowledge from huge data such as intrusion data requires a new approach. The more complex the datasets, the higher the computation time and the harder they are to interpret and analyze. This paper investigates the performance of feature selection algorithms on network intrusion data. We used Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO significantly reduced the number of features. Our experiments show that GA successfully reduces the number of attributes from 41 to 15, while PSO reduces the number of attributes from 41 to 9. Using k-Nearest Neighbour (k-NN) as a classifier, the GA-reduced dataset, which consists of 37% of the original attributes, improves accuracy from 99.28% to 99.70%, and its execution time is also 4.8 times faster than that of the original dataset. Using the same classifier, the PSO-reduced dataset, which consists of 22% of the original attributes, has the fastest execution time (7.2 times faster than that of the original dataset). However, its accuracy is slightly reduced, by 0.02%, from 99.28% to 99.26%. Overall, both GA and PSO are good feature selection techniques, because they have shown very good performance in reducing the number of features significantly while maintaining, and sometimes improving, the classification accuracy, as well as reducing the computation time.
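    The GA side of the approach can be sketched as follows: individuals are binary masks over the feature indices, and fitness rewards subsets that score well while staying small. The toy fitness function, population size, and operators below are illustrative assumptions; the paper evaluates candidate subsets with a k-NN classifier on the intrusion data instead.

    ```python
    import random

    def ga_select(n_features, fitness, pop_size=20, generations=40, seed=0):
        """Evolve binary feature masks toward higher fitness.

        Uses truncation selection, one-point crossover, and point mutation;
        survivors are carried over unchanged (elitism).
        """
        rng = random.Random(seed)
        pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: pop_size // 2]           # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = rng.sample(survivors, 2)
                cut = rng.randrange(1, n_features)     # one-point crossover
                child = a[:cut] + b[cut:]
                i = rng.randrange(n_features)          # point mutation
                child[i] = 1 - child[i]
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)
    ```

    Swapping the toy fitness for "k-NN accuracy minus a penalty on subset size" gives the wrapper-style selection the paper describes; PSO differs mainly in how candidate masks are updated between iterations.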

    Impact of a Non-Traditional Research Approach

    Construction Management research has not been successful in changing the practices of the construction industry. The method of receiving grants and the peer-review paper system that academics rely on to achieve promotion do not align with academic researchers becoming experts who can bring change to industry practices. Poor construction industry performance has been documented for the past 25 years in the international construction management field. However, after 25 years and billions of dollars of research investment, the solution remains elusive. Research has shown that very few researchers have a hypothesis, run cycles of research tests in the industry, and succeed in changing industry practices. The most impactful research identified in this thesis has led to the conclusions that pre-planning is critical, that hiring contractors who have expertise results in better performance, and that risk is mitigated when the supply chain partners work together and expertise is utilized at the beginning of projects. The problems with construction non-performance have persisted. Legal contract issues have become more important. Traditional research approaches have not identified the severity and the source of construction non-performance. The problem seems to be as complex as ever. The construction industry practices and the academic research community remain in silos. This research proposes that the problem may lie in the traditional construction management research structure and methodology. The research has identified a unique non-traditional research program that has documented over 1700 industry tests, which have resulted in a decrease in client management by up to 79%, contractors adding value by up to 38%, increased customer satisfaction by up to 140%, reduced change order rates as low as -0.6%, and a decreased cost of services by up to 31%.
    The purpose of this thesis is to document the performance of the non-traditional research program around the above identified results. The documentation of such an effort will shed more light on what is required for a sustainable, industry-impacting, and academic-expert-based research program.
    Dissertation/Thesis: Masters Thesis Construction 201

    Data analytics 2016: proceedings of the fifth international conference on data analytics
