
    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate-code detection, plagiarism detection, malware detection, and smell detection. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on existing approaches and their characteristics across applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques across five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many other programming languages have no support at all. A noteworthy finding was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance. Comment: 49 pages, 10 figures, 6 tables.
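    As a minimal illustration of the token-based family of techniques covered in such reviews (not a reconstruction of any surveyed tool; the tokenizer and the choice of the Jaccard measure are assumptions for this sketch):

```python
# Token-based code similarity: compare two snippets by the overlap of
# their token sets. This is a toy sketch, not any specific surveyed tool.
import re

def tokenize(source: str) -> set[str]:
    """Split source code into a set of identifier and operator tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|[^\s\w]", source))

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over token sets: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

snippet1 = "def add(a, b): return a + b"
snippet2 = "def add(x, y): return x + y"
print(jaccard_similarity(snippet1, snippet2))  # high despite renamed parameters
```

    Real clone detectors typically add normalization (e.g. replacing identifiers and literals with placeholder tokens) and index structures so they scale beyond pairwise comparison.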

    Openness in Education as a Praxis: From Individual Testimonials to Collective Voices

    Why is Openness in Education important, and why is it critically needed at this moment? As manifested in our guiding question, the significance of Openness in Education and its immediate necessity form the heart of this collaborative editorial piece. This rather straightforward, yet nuanced query has sparked this collective endeavour, which uses individual testimonies, which may also be taken as living narratives, to reveal the value of Openness in Education as a praxis. Such testimonies serve as rich personal narratives, critical introspections, and experience-based accounts that function as sources of data. The data gleaned from these narratives point to an understanding of Openness in Education as a complex, multilayered concept intricately woven into an array of values. These values include sharing, access, flexibility, affordability, enlightenment, barrier-removal, empowerment, care, individual agency, trust, innovation, sustainability, collaboration, co-creation, social justice, equity, transparency, inclusivity, decolonization, democratisation, participation, liberty, and respect for diversity. This editorial, as a product of collective endeavour, invites its readers to engage independently with the individual narratives, fostering the creation of unique interpretations. This call stems from the distinctive character of each narrative, as each voices an individual researcher's perspective from around the globe, articulating insights within a unique situational context.

    Advertiser Learning in Direct Advertising Markets

    Direct buy advertisers procure advertising inventory at fixed rates from publishers and ad networks. Such advertisers face the complex task of choosing ads amongst myriad new publisher sites. We offer evidence that advertisers do not excel at making these choices. Instead, they try many sites before settling on a favored set, consistent with advertiser learning. We subsequently model advertiser demand for publisher inventory wherein advertisers learn about advertising efficacy across publishers' sites. Results suggest that advertisers spend considerable resources advertising on sites they eventually abandon -- in part because their prior beliefs about advertising efficacy on those sites are too optimistic. The median advertiser's expected CTR at a new site is 0.23%, five times higher than the true median CTR of 0.045%. We consider how pooling advertiser information remediates this problem. Specifically, we show that ads with similar visual elements garner similar CTRs, enabling advertisers to better predict ad performance at new sites. Counterfactual analyses indicate that gains from pooling advertiser information are substantial: over six months, we estimate a median advertiser welfare gain of $2,756 (a 15.5% increase) and a median publisher revenue gain of $9,618 (a 63.9% increase).
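    The learning mechanism described above can be sketched with a simple Beta-Bernoulli model: the advertiser holds a Beta prior over a site's CTR and updates it with observed clicks. The prior parameters and impression counts below are illustrative assumptions, not the paper's structural estimates:

```python
# Toy Bayesian CTR learning: a Beta prior updated by Bernoulli click data.
# All numbers here are illustrative, not estimates from the paper.

def update_ctr_belief(alpha: float, beta: float, clicks: int, impressions: int):
    """Beta-Bernoulli conjugate update; returns posterior (alpha, beta)."""
    return alpha + clicks, beta + (impressions - clicks)

def expected_ctr(alpha: float, beta: float) -> float:
    """Posterior mean of the Beta distribution."""
    return alpha / (alpha + beta)

# An optimistic prior whose mean roughly matches the 0.23% expected CTR.
alpha, beta = 2.3, 997.7
# Observe 10,000 impressions at a true CTR near the 0.045% median.
alpha, beta = update_ctr_belief(alpha, beta, clicks=5, impressions=10_000)
print(f"posterior mean CTR: {expected_ctr(alpha, beta):.4%}")
```

    After enough impressions the optimistic prior is washed out and the posterior mean approaches the site's true CTR, which is the abandonment dynamic the abstract describes.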

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on applications that combine synthetic aperture radar with deep learning technology. It aims to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to tackle these significant challenges and present innovative, cutting-edge results on applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews, and technical reports.

    Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering for Web Page Ranking

    Web content mining retrieves information from the web in more structured forms. Page ranking plays an essential part in the web content mining process: whenever a user searches for information on the web, the relevant results are shown at the top of the list through page ranking. Many existing page ranking algorithms fail to rank web pages accurately within minimal time. To address these issues, the Lancaster Stem Sammon Projective Feature Selection based Stochastic eXtreme Gradient Boost Clustering (LSSPFS-SXGBC) approach is introduced for page ranking based on user queries. The LSSPFS-SXGBC approach takes a number of user queries as input and performs web page ranking in three stages: preprocessing, feature selection, and clustering. Lancaster stemming preprocessing removes noisy data from the input query: it stems words and removes stop words and incomplete data to minimize time and space consumption. The Sammon projective feature selection process then selects the relevant features (i.e., keywords) based on user needs; the Sammon projection maps the high-dimensional space to a lower-dimensional space while preserving the inter-point distance structure. After feature selection, stochastic eXtreme gradient boost page rank clustering groups web pages with similar keywords according to their rank. The gradient boost page rank clusterer is an ensemble of several weak clusterers (i.e., X-means clusters). X-means clustering partitions the web pages into 'x' clusters, where each observation is assigned to the cluster with the nearest mean. For every weak clusterer, the selected features are used as training samples. Subsequently, all weak clusterers are combined into a strong clusterer that produces the web page ranking results. In this way, efficient page ranking is achieved with higher accuracy and minimal time consumption. The LSSPFS-SXGBC approach is validated experimentally on factors such as ranking accuracy, false positive rate, ranking time, and space complexity with respect to the number of user queries.
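    The LSSPFS-SXGBC pipeline itself is specialized, but the underlying idea of rank-based ordering of pages can be illustrated with the classic PageRank power iteration; the toy link graph and damping factor below are assumptions for this sketch, not values from the paper:

```python
# Classic PageRank by power iteration over a toy link graph.
# The graph and damping factor are illustrative assumptions.

def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 100) -> dict[str, float]:
    """Each step, rank mass flows along outgoing links; dangling pages
    spread their mass evenly over all pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages
            share = rank[p] / len(targets)
            for q in targets:
                new[q] += damping * share
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # C ranks highest
```

    Real ranking systems combine such link-based scores with content features, which is where the feature selection and clustering stages described above come in.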

    Reshaping Higher Education for a Post-COVID-19 World: Lessons Learned and Moving Forward


    A Closer Look into Recent Video-based Learning Research: A Comprehensive Review of Video Characteristics, Tools, Technologies, and Learning Effectiveness

    People increasingly use videos on the Web as a source for learning. To support this way of learning, researchers and developers are continuously developing tools, proposing guidelines, analyzing data, and conducting experiments. However, it is still not clear what characteristics a video should have to be an effective learning medium. In this paper, we present a comprehensive review of 257 articles on video-based learning for the period from 2016 to 2021. One of the aims of the review is to identify the video characteristics that have been explored by previous work. Based on our analysis, we suggest a taxonomy that organizes the video characteristics and contextual aspects into eight categories: (1) audio features, (2) visual features, (3) textual features, (4) instructor behavior, (5) learner activities, (6) interactive features (quizzes, etc.), (7) production style, and (8) instructional design. We also identify four representative research directions: (1) proposals of tools to support video-based learning, (2) studies with controlled experiments, (3) data analysis studies, and (4) proposals of design guidelines for learning videos. We find that the most explored characteristics are textual features, followed by visual features, learner activities, and interactive features. Transcript text, video frames, and images (figures and illustrations) are most frequently used by tools that support learning through videos. Learner activity is heavily explored through log files in data analysis studies, and interactive features have been frequently scrutinized in controlled experiments. We complement our review by contrasting research findings that investigate the impact of video characteristics on learning effectiveness, reporting on tasks and technologies used to develop tools that support learning, and summarizing trends in design guidelines for producing learning videos.

    Ciguatoxins

    Ciguatoxins (CTXs), which are responsible for ciguatera fish poisoning (CFP), are liposoluble toxins produced by microalgae of the genera Gambierdiscus and Fukuyoa. This book presents 18 scientific papers that offer new information and scientific evidence on: (i) CTX occurrence in aquatic environments, with an emphasis on edible aquatic organisms; (ii) analysis methods for the determination of CTXs; (iii) advances in research on CTX-producing organisms; (iv) environmental factors involved in the presence of CTXs; and (v) the assessment of public health risks related to the presence of CTXs, as well as risk management and mitigation strategies.

    A Practical and Empirical Comparison of Three Topic Modeling Methods Using a COVID-19 Corpus: LSA, LDA, and Top2Vec

    This study was prepared as a practical guide for researchers interested in using topic modeling methodologies, and is specially designed for those who have difficulty determining which methodology to use. Many topic modeling methods have been developed since the 1980s, namely latent semantic indexing or analysis (LSI/LSA), probabilistic LSI/LSA (pLSI/pLSA), naïve Bayes, the Author-Recipient-Topic model (ART), Latent Dirichlet Allocation (LDA), Topics over Time (TOT), Dynamic Topic Models (DTM), Word2Vec, Top2Vec, and variations and combinations of these techniques. Researchers from disciplines other than computer science may find it challenging to select a topic modeling methodology. We compared a recently developed topic modeling algorithm, Top2Vec, with two of the most conventional and frequently used methodologies: LSA and LDA. As a study sample, we used a corpus of 65,292 COVID-19-focused abstracts. Among the 11 topics we identified with each methodology, we found the highest levels of correlation between the LDA and Top2Vec results, followed by LSA and LDA, and then Top2Vec and LSA. We also report the computational resources used to perform the analyses and provide practical guidelines and recommendations for researchers.
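    As a stdlib-only sketch of the core of LSA (practical work would use gensim or scikit-learn for a full truncated SVD; the toy corpus and the single latent dimension here are assumptions for illustration):

```python
# LSA in miniature: embed documents along the leading singular direction of
# the term-document matrix, found by power iteration on A^T A.
# Toy corpus and single latent dimension are illustrative assumptions.

docs = [
    "virus infection vaccine vaccine",
    "vaccine trial virus",
    "economy market inflation",
    "market inflation economy economy",
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document count matrix A (terms x documents).
A = [[d.split().count(w) for d in docs] for w in vocab]

def matvec(M, v):
    """Dense matrix-vector product over lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

At = list(map(list, zip(*A)))  # transpose of A
v = [1.0] * len(docs)
for _ in range(200):
    w = matvec(At, matvec(A, v))          # one step of A^T A
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]             # leading right-singular vector

# Documents in one thematic group load on the dominant latent "topic".
print([round(x, 3) for x in v])
```

    LDA and Top2Vec replace this linear-algebraic decomposition with, respectively, a probabilistic generative model and density-based clustering of joint word-document embeddings, which is why their topic assignments can correlate yet differ.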