5 research outputs found
CSISE: cloud-based semantic image search engine
Title from PDF of title page, viewed on March 27, 2014
Thesis advisor: Yugyung Lee
Vita
Includes bibliographical references (pages 53-56)
Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2013

Due to the rapid exponential growth of data, two challenges we face today are how to handle big data and how to analyze large data sets. An IBM study showed that 90% of the data in the world today was created in the last two years alone. We have especially seen the exponential growth of images on the Web, e.g., more than 6 billion images in Flickr, 1.5 billion in the Google image engine, and more than 1 billion in Instagram [1]. Since big data is not only a matter of size but also of heterogeneous data types and sources, image searching over big data may not be scalable in practical settings. We envision Cloud computing as a new way to transform the big data challenge into a great opportunity. In this thesis, we perform efficient and accurate classification of a large collection of images using Cloud computing, which in turn supports semantic image searching. A novel approach with enhanced accuracy is proposed that uses semantic technology to classify images by analyzing both metadata and image data. A two-level classification model was designed: (i) semantic classification is performed on image metadata using TF-IDF, and (ii) image classification is performed using a hybrid image processing model that combines Euclidean distance and SURF FLANN measurements. A Cloud-based Semantic Image Search Engine (CSISE) was also developed to search for an image using the proposed semantic model over a dynamic image repository that connects online image search engines including Google Image Search, Flickr, and Picasa. A series of experiments was performed in a large-scale Hadoop environment on IBM's cloud over half a million logo images of 76 types. The experimental results show that the performance of the CSISE engine (based on the proposed method) is comparable to popular online image search engines and more accurate (average precision of 71%) than existing approaches.

Abstract -- Contents -- Illustrations -- Tables -- Acknowledgements -- Introduction -- Related work -- Cloud-based semantic image search engine model -- Cloud-based semantic image search engine (CSISE) implementation -- Experimental results and evaluation -- Conclusion and future work -- References
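To make the two-level idea concrete, here is a small, hypothetical Python sketch of both levels: TF-IDF similarity over image metadata and a SURF + FLANN keypoint match score between two images. It is not the CSISE implementation; it assumes scikit-learn and an OpenCV build that ships the non-free xfeatures2d (SURF) module, and it omits the Euclidean-distance component over global image features that the hybrid model also uses.

    import cv2
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def metadata_scores(query_text, candidate_texts):
        """Level 1: rank candidate images by TF-IDF cosine similarity of their metadata."""
        tfidf = TfidfVectorizer(stop_words="english").fit_transform([query_text] + candidate_texts)
        return cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()

    def surf_flann_score(img_a, img_b, ratio=0.7):
        """Level 2: fraction of SURF keypoints in img_a that have a good FLANN match in img_b."""
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
        _, desc_a = surf.detectAndCompute(img_a, None)
        _, desc_b = surf.detectAndCompute(img_b, None)
        if desc_a is None or desc_b is None:
            return 0.0
        flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))  # KD-tree index
        matches = flann.knnMatch(desc_a, desc_b, k=2)
        # Lowe's ratio test to keep only distinctive matches
        good = [p[0] for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        return len(good) / max(len(desc_a), 1)

One plausible way to combine the two levels is to narrow the repository with metadata_scores first and run the more expensive surf_flann_score only on the shortlisted images; the abstract does not spell out the exact pipeline, so this ordering is an assumption.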
GraphEvo: Evaluating Software Evolution Using Machine Learning Based Call Graph Analytics And Network Portrait Divergence
Title from PDF of title page, viewed September 9, 2022
Dissertation advisor: Yugyung Lee
Vita
Includes bibliographical references (pages 151-168)
Dissertation (Ph.D.)--Department of Computer Science and Electrical Engineering. University of Missouri--Kansas City, 2022

Understanding software evolution is essential for software development tasks, including debugging, maintenance, and testing. Unfortunately, as software changes it grows larger and more complicated, which makes it harder to understand. Software Defect Prediction (SDP) over the codebase is one of the most common ways artificial intelligence (AI) is used to improve the quality of agile products, yet graph-based software metrics are seldom used in defect prediction.
In this dissertation, we propose a graph-based software framework called GraphEvo, built on deep learning models for graphs. We apply recent advances in network comparison to software networks via the information-theoretic metric Network Portrait Divergence (NPD), which captures structural changes in call-graph-based software networks. The NPD-based method determines which software changes are significant, how many execution paths are affected, and how the tests evolve with respect to the code; all of these factors affect how reliable the software is. To validate the NPD-based approach, version control history and Pull Requests (PRs) are used.
GraphEvo's most significant contributions are: (i) finding and showing how software has changed over time using call graphs; (ii) using machine learning and deep learning techniques to understand the software and predict how many defects are in each code entity (such as a class); (iii) using the NPD-based tooling to create a public bug dataset and applying machine learning to see how well it can predict software defects; and (iv) helping the PR review process by showing how code changes and the tests that accompany them relate.
We evaluated GraphEvo (i) across 66 software releases from five popular Java open-source systems to show that it works; (ii) on 9 Java projects with deep learning to build an SDP model; (iii) on 19 Java projects of different sizes and types from GitHub, augmented with bug information from other sources; and (iv) on 627 PRs from 14 Java projects to see how vital tests are in PRs. These comprehensive experiments show that GraphEvo works well for debugging, maintaining, and testing software. We also received favorable responses from user studies in which we asked software developers and testers what they thought of GraphEvo.

Introduction -- Characterizing and understanding software evolution using call graphs -- Defect prediction using deep learning with NPD for software evolution -- NPD-based tooling, extendible defect dataset and its assessment -- Reviewing pull requests with path-based NPD and test
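As a rough illustration of the kind of call-graph comparison NPD enables, the following Python sketch builds a network portrait for each graph and compares the two with a Jensen-Shannon divergence. This is not GraphEvo's code: networkx, numpy, and scipy are assumed to be available, and the column weighting is a simplified reading of the published Network Portrait Divergence definition (Bagrow and Bovy, 2019), so details may differ from the dissertation's tooling.

    import networkx as nx
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def portrait(g):
        """B[l, k] = number of nodes that have exactly k nodes at shortest-path distance l."""
        n = g.number_of_nodes()
        lengths = dict(nx.all_pairs_shortest_path_length(g))
        diameter = max(d for row in lengths.values() for d in row.values())
        b = np.zeros((diameter + 1, n + 1))
        for src, row in lengths.items():
            counts = np.bincount(list(row.values()), minlength=diameter + 1)
            for l, k in enumerate(counts):
                b[l, k] += 1
        return b

    def portrait_divergence(g1, g2):
        """Jensen-Shannon divergence between the k-weighted, normalized portraits of two graphs."""
        b1, b2 = portrait(g1), portrait(g2)
        rows, cols = max(b1.shape[0], b2.shape[0]), max(b1.shape[1], b2.shape[1])
        p, q = np.zeros((rows, cols)), np.zeros((rows, cols))
        p[:b1.shape[0], :b1.shape[1]] = b1
        q[:b2.shape[0], :b2.shape[1]] = b2
        k = np.arange(cols)                       # weight column k by the number of nodes it counts
        p, q = (p * k).ravel(), (q * k).ravel()
        p, q = p / p.sum(), q / q.sum()
        return jensenshannon(p, q, base=2) ** 2   # scipy returns the JS distance, so square it

Comparing the call graphs of two consecutive releases with such a divergence yields a single value in [0, 1]: a value near 0 suggests little structural change, while a larger value flags releases whose call structure changed substantially.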
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs.
Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits.
Methods: We use a crowdsourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus.
Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case.
Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
Comment: Status: Accepted at Empirical Software Engineering
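The three-of-four agreement rule from the Methods section is simple enough to state as code. The following hypothetical Python helper shows how a consensus label would be derived for a single changed line; the label names are illustrative, not the study's actual label set.

    from collections import Counter

    def consensus(labels, min_agreement=3):
        """Return the agreed label for one line, or None if no label reaches the threshold."""
        assert len(labels) == 4, "each line is labeled by four participants"
        label, count = Counter(labels).most_common(1)[0]
        return label if count >= min_agreement else None

    print(consensus(["bugfix", "bugfix", "bugfix", "refactoring"]))    # -> bugfix
    print(consensus(["bugfix", "test", "refactoring", "whitespace"]))  # -> None (active disagreement)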
A fine-grained data set and analysis of tangling in bug fixing commits
Abstract
Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs.
Objectives: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits.
Methods: We use a crowdsourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus.
Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label, leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case.
Conclusions: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.