A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software
engineering activity that embraces a broad range of applications, including but
not limited to code recommendation, duplicate code, plagiarism, malware, and
smell detection. This paper proposes a systematic literature review and
meta-analysis on code similarity measurement and evaluation techniques to shed
light on the existing approaches and their characteristics in different
applications. We initially found over 10,000 articles by querying four digital
libraries and ended up with 136 primary studies in the field. The studies were
classified according to their methodology, programming languages, datasets,
tools, and applications. A deep investigation reveals 80 software tools,
working with eight different techniques on five application domains. Nearly 49%
of the tools work on Java programs and 37% support C and C++, while there is no
support for many programming languages. A noteworthy point was the existence of
12 datasets related to source code similarity measurement and duplicate codes,
of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and support for multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.
Comment: 49 pages, 10 figures, 6 tables
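As a rough illustration of the simplest family of techniques covered by surveys like this one, a token-based measure compares the token sets of two code fragments. The sketch below is not any specific tool from the review; it just computes Jaccard similarity over extracted tokens:

```python
import re

def tokenize(code: str) -> set[str]:
    """Collect the set of identifier and number tokens in a code fragment."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two fragments' token sets, in [0, 1]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# A renamed function is still highly similar at the token level.
frag1 = "int add(int a, int b) { return a + b; }"
frag2 = "int sum(int a, int b) { return a + b; }"
print(round(jaccard_similarity(frag1, frag2), 2))
```

Real clone detectors refine this idea with normalization, AST or metric comparison, and learned embeddings, which is where the eight technique families identified by the review come in.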
Openness in Education as a Praxis: From Individual Testimonials to Collective Voices
Why is Openness in Education important, and why is it critically needed at this moment? As manifested in our guiding question, the significance of Openness in Education and its immediate necessity form the heart of this collaborative editorial piece. This rather straightforward, yet nuanced query has sparked this collective endeavour by using individual testimonies, which may also be taken as living narratives, to reveal the value of Openness in Education as a praxis. Such testimonies serve as rich, personal narratives, critical introspections, and experience-based accounts that function as sources of data. The data gleaned from these narratives points to the understanding of Openness in Education as a complex, multilayered concept intricately woven into an array of values. These range from aspects such as sharing, access, flexibility, affordability, enlightenment, barrier-removal, empowerment, care, individual agency, trust, innovation, sustainability, collaboration, co-creation, social justice, equity, transparency, inclusivity, decolonization, democratisation, participation, liberty, and respect for diversity. This editorial, as a product of collective endeavour, invites its readers to independently engage with individual narratives, fostering the creation of unique interpretations. This call stems from the distinctive character of each narrative as they voice individual researchers’ perspectives from around the globe, articulating their insights within their unique situational contexts
Advertiser Learning in Direct Advertising Markets
Direct buy advertisers procure advertising inventory at fixed rates from
publishers and ad networks. Such advertisers face the complex task of choosing
ads amongst myriad new publisher sites. We offer evidence that advertisers do
not excel at making these choices. Instead, they try many sites before settling
on a favored set, consistent with advertiser learning. We subsequently model
advertiser demand for publisher inventory wherein advertisers learn about
advertising efficacy across publishers' sites. Results suggest that advertisers
spend considerable resources advertising on sites they eventually abandon -- in
part because their prior beliefs about advertising efficacy on those sites are
too optimistic. The median advertiser's expected CTR at a new site is 0.23%,
five times higher than the true median CTR of 0.045%.
We consider how pooling advertiser information remediates this problem.
Specifically, we show that ads with similar visual elements garner similar
CTRs, enabling advertisers to better predict ad performance at new sites.
Counterfactual analyses indicate that gains from pooling advertiser information
are substantial: over six months, we estimate a median advertiser welfare gain of $2,756 (a 15.5% increase) and a median publisher revenue gain of $9,618 (a 63.9% increase).
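The learning story in this abstract can be illustrated with a standard Beta-Binomial model; this is our illustrative choice, not necessarily the paper's specification. The prior mean (0.23%) and true CTR (0.045%) come from the abstract, while the prior strength and impression counts are assumed:

```python
# Beta-Binomial sketch of advertiser learning about a site's CTR.
# Prior mean (0.23%) and true CTR (0.045%) are taken from the abstract;
# the prior strength and impression counts are illustrative assumptions.

def posterior_ctr(alpha: float, beta: float, clicks: int, impressions: int) -> float:
    """Posterior mean CTR after observing `clicks` out of `impressions`."""
    return (alpha + clicks) / (alpha + beta + impressions)

alpha, beta = 2.3, 997.7   # prior mean 2.3 / 1000 = 0.23%
true_ctr = 0.00045         # 0.045%

for n in (0, 10_000, 100_000):
    clicks = round(n * true_ctr)
    print(n, f"{posterior_ctr(alpha, beta, clicks, n):.4%}")
```

As impressions accumulate, the optimistic prior mean shrinks toward the true rate, which is the mechanism behind advertisers abandoning sites they initially overvalued.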
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on the combination of synthetic aperture radar and deep learning technology, aiming to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar (SAR) is an important active microwave imaging sensor whose all-day and all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to tackle these significant challenges and present innovative, cutting-edge research results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.
Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering for Web Page Ranking
Web content mining retrieves information from the web in more structured forms. Page rank plays an essential part in the web content mining process: whenever a user searches for information on the web, the relevant results are shown at the top of the list through page ranking. Many existing page ranking algorithms fail to rank web pages accurately within minimum time. To address these issues, the Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering (LSSPFS-SXGBC) approach is introduced for page ranking based on user queries. The LSSPFS-SXGBC approach takes a number of user queries as input and performs three processes for efficient web page ranking, namely preprocessing, feature selection, and clustering. Lancaster Stemming Preprocessed Analysis removes noisy data from the input query: it eradicates stem words, stop words, and incomplete data to minimize time and space consumption. The Sammon Projective Feature Selection process then selects the relevant features (i.e., keywords) based on user needs for efficient page ranking; Sammon projection maps the high-dimensional space to a lower-dimensional space while preserving the inter-point distance structure. After feature selection, the Stochastic eXtreme Gradient Boost Page Rank Clustering process clusters web pages with similar keywords based on their rank. The Gradient Boost Page Rank Cluster is an ensemble of several weak clusterers (i.e., X-means clusterers). X-means clustering partitions the web pages into 'x' clusters, assigning each observation to the cluster with the nearest mean. For every weak clusterer, the selected features are used as training samples.
Subsequently, all weak clusterers are combined to form a strong clusterer that attains the web page ranking results. In this way, efficient page ranking is carried out with higher accuracy and minimum time consumption. The LSSPFS-SXGBC approach is validated empirically on factors such as ranking accuracy, false positive rate, ranking time, and space complexity with respect to the number of user queries.
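A minimal sketch of the preprocessing stage described above: stopword removal plus naive suffix stripping. This stands in for the idea only; the paper's actual pipeline uses full Lancaster stemming (e.g., NLTK's LancasterStemmer), and the stopword and suffix lists here are illustrative assumptions:

```python
# Illustrative query preprocessing: stopword removal plus naive suffix
# stripping. A real implementation would use a proper Lancaster stemmer;
# the stopword and suffix lists below are tiny illustrative stand-ins.
STOPWORDS = {"the", "a", "an", "of", "for", "on", "in", "is", "are", "to"}
SUFFIXES = ("ing", "ness", "ment", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a minimum stem length."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(query: str) -> list[str]:
    """Lowercase, drop stopwords, and stem the remaining tokens."""
    tokens = query.lower().split()
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The rankings of pages for web mining"))
```

The surviving stems would then feed the feature selection and clustering stages of the pipeline.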
Reshaping Higher Education for a Post-COVID-19 World: Lessons Learned and Moving Forward
No abstract available
A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness
People increasingly use videos on the Web as a source for learning. To
support this way of learning, researchers and developers are continuously
developing tools, proposing guidelines, analyzing data, and conducting
experiments. However, it is still not clear what characteristics a video should
have to be an effective learning medium. In this paper, we present a
comprehensive review of 257 articles on video-based learning for the period
from 2016 to 2021. One of the aims of the review is to identify the video
characteristics that have been explored by previous work. Based on our
analysis, we suggest a taxonomy which organizes the video characteristics and
contextual aspects into eight categories: (1) audio features, (2) visual
features, (3) textual features, (4) instructor behavior, (5) learner
activities, (6) interactive features (quizzes, etc.), (7) production style, and
(8) instructional design. Also, we identify four representative research
directions: (1) proposals of tools to support video-based learning, (2) studies
with controlled experiments, (3) data analysis studies, and (4) proposals of
design guidelines for learning videos. We find that the most explored
characteristics are textual features followed by visual features, learner
activities, and interactive features. Text of transcripts, video frames, and
images (figures and illustrations) are most frequently used by tools that
support learning through videos. The learner activity is heavily explored
through log files in data analysis studies, and interactive features have been
frequently scrutinized in controlled experiments. We complement our review by
contrasting research findings that investigate the impact of video characteristics on learning effectiveness, report on the tasks and technologies used to develop tools that support learning, and summarize trends in design guidelines for producing learning videos.
Ciguatoxins
Ciguatoxins (CTXs), which are responsible for Ciguatera fish poisoning (CFP), are liposoluble toxins produced by microalgae of the genera Gambierdiscus and Fukuyoa. This book presents 18 scientific papers that offer new information and scientific evidence on: (i) CTX occurrence in aquatic environments, with an emphasis on edible aquatic organisms; (ii) analysis methods for the determination of CTXs; (iii) advances in research on CTX-producing organisms; (iv) environmental factors involved in the presence of CTXs; and (v) the assessment of public health risks related to the presence of CTXs, as well as risk management and mitigation strategies
A Practical and Empirical Comparison of Three Topic Modeling Methods Using a COVID-19 Corpus: LSA, LDA, and Top2Vec
This study was prepared as a practical guide for researchers interested in using topic modeling methodologies, and is especially designed for those who have difficulty determining which methodology to use. Many topic modeling methods have been developed since the 1980s, namely latent semantic indexing or analysis (LSI/LSA), probabilistic LSI/LSA (pLSI/pLSA), naïve Bayes, the Author-Recipient-Topic (ART) model, Latent Dirichlet Allocation (LDA), Topics over Time (TOT), Dynamic Topic Models (DTM), Word2Vec, Top2Vec, and variations and combinations of these techniques. Researchers from disciplines other than computer science may find it challenging to select a topic modeling methodology. We compared a recently developed topic modeling algorithm, Top2Vec, with two of the most conventional and frequently used methodologies: LSA and LDA. As a study sample, we used a corpus of 65,292 COVID-19-focused abstracts. Among the 11 topics we identified with each methodology, we found high levels of correlation between the LDA and Top2Vec results, followed by LSA and LDA, and by Top2Vec and LSA. We also report the computational resources we used to perform the analyses and provide practical guidelines and recommendations for researchers.
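As a rough illustration of one of the compared methods, LSA reduces to a truncated SVD of a term-document matrix. The toy corpus below is an assumption for demonstration, not the study's COVID-19 data or pipeline:

```python
import numpy as np

# Toy LSA: truncated SVD of a tiny term-document count matrix.
# Rows are terms, columns are documents; entries are raw term counts.
docs = [
    "virus vaccine vaccine trial",
    "vaccine trial results",
    "stock market crash",
    "market crash recovery",
]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Keep the top-2 singular directions as "topics".
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T   # each row: a document in topic space

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents on the same theme land close together in the reduced space,
# while documents on different themes stay nearly orthogonal.
print(round(cos(doc_coords[0], doc_coords[1]), 2))
print(round(cos(doc_coords[0], doc_coords[2]), 2))
```

LDA and Top2Vec replace this linear-algebraic reduction with a probabilistic generative model and a joint document/word embedding, respectively, which is the core difference the study's comparison exposes.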