Scientific Computing Meets Big Data Technology: An Astronomy Use Case

Authors: Barbary, Kyle; Franklin, Michael J.; Nothaft, Frank Austin; Patterson, David A.; Perlmutter, Saul; Sparks, Evan; Zahn, Oliver; Zhang, Zhao
Publication date: 22/12/2015

Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternative approach that uses Apache Spark, a modern big data platform, to parallelize many-task applications. We present Kira, a flexible and distributed astronomy image processing toolkit built on Apache Spark. We then use the Kira toolkit to implement a Source Extractor application for astronomy images, called Kira SE. With Kira SE as the use case, we study the programming flexibility, dataflow richness, scheduling capacity and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an equivalent C program when analyzing a 1 TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, Kira SE achieves performance competitive with the C implementation running on the NERSC Edison supercomputer. Our experience with Kira indicates that emerging Big Data platforms such as Apache Spark are a performant alternative for many-task scientific applications.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Simplifying Deep-Learning-Based Model for Code Search

Authors: Hassan, Ahmed E.; Li, Shanping; Liu, Chao; Liu, Zhiwei; Lo, David; Xia, Xin
Publication date: 28/05/2020

To accelerate software development, developers frequently search for and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR) based models for code search, which match keywords in the query against code text. But these models fail to bridge the semantic gap between query and code. To address this challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It jointly embeds method code and natural language descriptions into a shared vector space, where methods related to a natural language query are retrieved according to their vector similarities. However, DeepCS' working process is complicated and time-consuming. To overcome this issue, we propose a simplified model, CodeMatcher, that leverages the IR technique but retains many features of DeepCS. In general, CodeMatcher combines query keywords in their original order, performs a fuzzy search on the name and body strings of methods, and returns the best-matched methods with the longest sequence of used keywords. We verified its effectiveness on a large-scale codebase with about 41k repositories. Experimental results show that the simplified model CodeMatcher outperforms DeepCS by 97% in terms of MRR (a widely used accuracy measure for code search), and that it is over 66 times faster than DeepCS. Moreover, compared with the state-of-the-art IR-based model CodeHow, CodeMatcher improves the MRR by 73%. We also observed that fusing the advantages of IR-based and deep-learning-based models is promising because they complement each other by nature, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
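The abstract reports its accuracy gains in terms of MRR (mean reciprocal rank). As a rough illustration of how that measure is usually computed, here is a minimal Python sketch. It assumes the standard definition (the average over queries of 1/rank of the first relevant result, counting 0 when no relevant result is returned); the function names and the toy data are hypothetical and are not taken from the paper or its evaluation setup.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is present."""
    for position, item in enumerate(ranked, start=1):  # ranks are 1-based
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    results: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked list, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in results]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Hypothetical toy queries: each pairs a ranked result list with its relevant set.
toy_queries = [
    (["a", "b", "c"], {"b"}),  # first relevant hit at rank 2 -> 0.5
    (["x", "y"], {"x"}),       # first relevant hit at rank 1 -> 1.0
    (["p", "q"], {"z"}),       # no relevant hit              -> 0.0
]
toy_mrr = mean_reciprocal_rank(toy_queries)
```

Under this definition, a higher MRR means that the first correct method tends to appear nearer the top of the result list.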

arXiv.org e-Print Archive

Institutional Knowledge at Singapore Management University

Developing Australian Academics' Capacity: Supporting the Adoption of Open Educational Practices in Curriculum Design

Authors: Bossu, Carina; Brown, Natalie; Fountain, Wendy; Smyth, Robyn
Publication venue: Office for Learning and Teaching, Department of Education and Training, Australian Government
Publication date: 01/01/2016

This seed project initiative addressed an identified gap in Australian higher education between awareness of open educational practices (OEP) and implementation of OEP, particularly the production, adaptation and use of open educational resources (OER) to support the design of innovative, engaging and agile curriculum. In response, the authors aimed to design, develop, pilot and evaluate a free, open and online professional development course focused on supporting curriculum design in higher education. The specific aim of the course, Curriculum design for open education (CD4OE), is to develop the capacity of academics in Australia to adopt and incorporate OER and OEP into curriculum development, for more effective and efficient learning and teaching across the sector.

Open Research Online (The Open University)

Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks

Authors: Gavves, Efstratios; Mensink, Thomas; Snoek, Cees G. M.; Tommasi, Tatiana; Tuytelaars, Tinne
Publication date: 01/01/2015

How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data? In this work we make a first contribution toward answering this question in the context of image classification. We frame this quest as an active learning problem and use zero-shot classifiers to guide the learning process by linking the new task to the existing classifiers. By revisiting the dual formulation of adaptive SVM, we reveal two basic conditions for greedily choosing only the most relevant samples to be annotated. On this basis we propose an effective active learning algorithm which learns the best possible target classification model with minimum human labeling effort. Extensive experiments on two challenging datasets show the value of our approach compared to state-of-the-art active learning methodologies, as well as its potential to reuse past datasets with minimal effort for future tasks.

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio della ricerca- Università di Roma La Sapienza

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Semantic annotation, publication, and discovery of Java software components: an integrated approach

Authors: Dranidis, Dimitris; Kourtesis, Dimitrios; Zygkostiotis, Zinon
Publication venue: CEUR-WS.org
Publication date: 01/04/2009

Component-based software development has matured into standard practice in software engineering. Among the advantages of reusing software modules are lower costs, faster development, more manageable code, increased productivity, and improved software quality. As the number of available software components has grown, so has the need for effective component search and retrieval. Traditional search approaches, such as keyword matching, have proved ineffective when applied to software components. Applying a semantically enhanced approach to component classification, publication, and discovery can greatly increase the efficiency of searching and retrieving software components. This approach has already been applied in the context of Web technologies, and Web services in particular, within Semantic Web Services research. This paper examines the similarities between software components and Web services and adapts an existing Semantic Web Service publication and discovery solution into a software component annotation and discovery tool, which is implemented as an Eclipse plug-in.

White Rose Research Online