Search CORE

51,616 research outputs found

From Frequency to Meaning: Vector Space Models of Semantics

Author: Pantel Patrick
Turney Peter D.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2010
Field of study

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

Crossref

Algorithm Diversity for Resilient Systems

Author: A Avizienis
DS Hirschberg
FB Cohen
G Ricart
L Lamport
L Lamport
R Paige
YA Liu
YA Liu
YA Liu
YA Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/04/2019
Field of study

Diversity can significantly increase the resilience of systems, by reducing the prevalence of shared vulnerabilities and making vulnerabilities harder to exploit. Work on software diversity for security typically creates variants of a program using low-level code transformations. This paper is the first to study algorithm diversity for resilience. We first describe how a method based on high-level invariants and systematic incrementalization can be used to create algorithm variants. Executing multiple variants in parallel and comparing their outputs provides greater resilience than executing one variant. To prevent different parallel schedules from causing variants' behaviors to diverge, we present a synchronized execution algorithm for DistAlgo, an extension of Python for high-level, precise, executable specifications of distributed algorithms. We propose static and dynamic metrics for measuring diversity. An experimental evaluation of algorithm diversity combined with implementation-level diversity for several sequential algorithms and distributed algorithms shows the benefits of algorithm diversity

arXiv.org e-Print Archive

Crossref

An Extended Stable Marriage Problem Algorithm for Clone Detection

Author: AlHakami Hosam
Chen Feng
Janicke Helge
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 01/01/2014
Field of study

Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending a well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table

arXiv.org e-Print Archive

CiteSeerX

Exploring the characteristics of issue-related behaviors in GitHub using visualization techniques

Author: Chen Zhijie
Fan Xiaoping
He Dayu
Liao Zhifang
Liu Shengzong
Zhang Yan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Crossref

ResearchOnline@GCU

Normalized Web Distance and Word Similarity

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication venue
Publication date: 01/01/2009
Field of study

There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalizedis a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available for all by using any search engine that can return aggregate page-count estimates for a large range of search-queries. In the paper introducing the NWD it was called `normalized Google distance (NGD),' but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. web distance (NWD) method to determine similarity between words and phrases. ItComment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Empirical research on the evaluation model and method of sustainability of the open source ecosystem

Author: Deng Libing
Fan Xiaoping
Liao Zhifang
Liu Hui
Qi Xiaofei
Zhang Yan
Zhou Yun
Publication venue: 'MDPI AG'
Publication date: 01/12/2018
Field of study

The development of open source brings new thinking and production modes to software engineering and computer science, and establishes a software development method and ecological environment in which groups participate. Regardless of investors, developers, participants, and managers, they are most concerned about whether the Open Source Ecosystem can be sustainable to ensure that the ecosystem they choose will serve users for a long time. Moreover, the most important quality of the software ecosystem is sustainability, and it is also a research area in Symmetry. Therefore, it is significant to assess the sustainability of the Open Source Ecosystem. However, the current measurement of the sustainability of the Open Source Ecosystem lacks universal measurement indicators, as well as a method and a model. Therefore, this paper constructs an Evaluation Indicators System, which consists of three levels: The target level, the guideline level and the evaluation level, and takes openness, stability, activity, and extensibility as measurement indicators. On this basis, a weight calculation method, based on information contribution values and a Sustainability Assessment Model, is proposed. The models and methods are used to analyze the factors affecting the sustainability of Stack Overflow (SO) ecosystem. Through the analysis, we find that every indicator in the SO ecosystem is partaking in different development trends. The development trend of a single indicator does not represent the sustainable development trend of the whole ecosystem. It is necessary to consider all of the indicators to judge that ecosystem’s sustainability. The research on the sustainability of the Open Source Ecosystem is helpful for judging software health, measuring development efficiency and adjusting organizational structure. It also provides a reference for researchers who study the sustainability of software engineering

Directory of Open Access Journals

ResearchOnline@GCU

Missouri State University: BearWorks

Animating the development of Social Networks over time using a dynamic extension of multidimensional scaling

Author: De Nooy Wouter
Leydesdorff Loet
Schank Thomas
Scharnhorst Andrea
Publication venue
Publication date: 01/01/2008
Field of study

The animation of network visualizations poses technical and theoretical challenges. Rather stable patterns are required before the mental map enables a user to make inferences over time. In order to enhance stability, we developed an extension of stress-minimization with developments over time. This dynamic layouter is no longer based on linear interpolation between independent static visualizations, but change over time is used as a parameter in the optimization. Because of our focus on structural change versus stability the attention is shifted from the relational graph to the latent eigenvectors of matrices. The approach is illustrated with animations for the journal citation environments of Social Networks, the (co-)author networks in the carrying community of this journal, and the topical development using relations among its title words. Our results are also compared with animations based on PajekToSVGAnim and SoNIA

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Scipedia

International Migration, Integration and Social Cohesion online publications

UvA-DARE