A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software
engineering activity that embraces a broad range of applications, including but
not limited to code recommendation, duplicate code, plagiarism, malware, and
smell detection. This paper presents a systematic literature review and
meta-analysis on code similarity measurement and evaluation techniques to shed
light on the existing approaches and their characteristics in different
applications. We initially found over 10000 articles by querying four digital
libraries and ended up with 136 primary studies in the field. The studies were
classified according to their methodology, programming languages, datasets,
tools, and applications. A deep investigation reveals 80 software tools,
working with eight different techniques on five application domains. Nearly 49%
of the tools work on Java programs and 37% support C and C++, while there is no
support for many programming languages. A noteworthy point was the existence of
12 datasets related to source code similarity measurement and duplicate codes,
of which only eight datasets were publicly accessible. The lack of reliable
datasets, empirical evaluations, hybrid methods, and a focus on multi-paradigm
languages are the main challenges in the field. Emerging applications of code
similarity measurement concentrate on the development phase in addition to the
maintenance. Comment: 49 pages, 10 figures, 6 tables
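As a rough illustration of what a code similarity measure can look like, the sketch below computes a token-based Jaccard similarity between two code fragments. It is not taken from the reviewed paper: the choice of a token-set measure, the tokenizer pattern, and the names `tokenize` and `jaccard_similarity` are all assumptions made for this example.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-whitespace symbol. Real tools use language-aware lexers.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split a code fragment into a flat list of lexical tokens."""
    return TOKEN_RE.findall(code)


def jaccard_similarity(code_a: str, code_b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the token sets of two fragments."""
    tokens_a = set(tokenize(code_a))
    tokens_b = set(tokenize(code_b))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty fragments are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    print(jaccard_similarity("int a = 1;", "int b = 1;"))
```

A set-based measure like this ignores token order and frequency, so it can only serve as a coarse first filter; it is shown here purely to make the notion of a similarity score concrete.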
Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles
This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing autonomous driver assistance systems (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider single-frame reference. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimations. The thesis first examines multimodal data representations and hyper-parameter selection using public datasets such as KITTI and nuScenes with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with convolutional-LSTM to extract spatial-temporal information from sequence data to improve 3D estimations and to aid the architecture in focusing on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess its efficacy on performance and computational complexity.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, demonstrated on key autonomous driving data sets.
Leveraging a machine learning based predictive framework to study brain-phenotype relationships
An immense collective effort has been put towards the development of methods for quantifying brain activity and structure. In parallel, a similar effort has focused on collecting experimental data, resulting in ever-growing data banks of complex human in vivo neuroimaging data. Machine learning, a broad set of powerful and effective tools for identifying multivariate relationships in high-dimensional problem spaces, has proven to be a promising approach toward better understanding the relationships between the brain and different phenotypes of interest. However, applied machine learning within a predictive framework for the study of neuroimaging data introduces several domain-specific problems and considerations, leaving the overarching question of how to best structure and run experiments ambiguous. In this work, I cover two explicit pieces of this larger question: the relationship between data representation and predictive performance, and a case study on issues related to data collected from disparate sites and cohorts. I then present the Brain Predictability toolbox, a software package to explicitly codify and make more broadly accessible to researchers the recommended steps in performing a predictive experiment, everything from framing a question to reporting results. This unique perspective ultimately offers recommendations, explicit analytical strategies, and example applications for using machine learning to study the brain.
Climate Risk Management in Agricultural Extension (CRMAE) Reference Guide
The Climate Risk Management in Agricultural Extension (CRMAE) Reference Guide is an accompaniment to the abridged CRMAE Handbook. Both the Reference Guide and Handbook are training and reference materials intended to be used during implementation of the Climate Risk Management in Agricultural Extension course in Ethiopia. The Reference Guide was designed for Ethiopia’s subject matter specialists (SMS) and extension staff, including development agents (DAs). It may also be used by other actors, such as non-governmental organizations (NGOs) or community-based organizations (CBOs), who work closely with farmers and those who support them. It aims to provide foundational knowledge on climate and agricultural decision making and practical tools to analyze climate-related risks, use appropriate weather and climate information to support agricultural decisions, communicate complex climate information effectively with farmers, and integrate climate services into agricultural extension activities.
Keywords: Ethiopia; agriculture; climate change; climate variability; food security; education; extension approaches; capacity development; climate-smart agriculture; climatology; monitoring systems; forecasting; participatory approaches; Goal 2 Zero Hunger
Modelling, Monitoring, Control and Optimization for Complex Industrial Processes
This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and benefit the readers from both academic communities and industrial sectors.
Exploring the Training Factors that Influence the Role of Teaching Assistants to Teach to Students With SEND in a Mainstream Classroom in England
With the implementation of inclusive education having become increasingly valued over the years, the training of Teaching Assistants (TAs) is now more important than ever, given that they work alongside pupils with special educational needs and disabilities (hereinafter SEND) in mainstream education classrooms. The current study explored the training factors that influence the role of TAs when it comes to teaching SEND students in mainstream classrooms in England during their one-year training period. This work aimed to increase understanding of how the training of TAs is seen to influence the development of their personal knowledge and professional skills. The study has significance for our comprehension of the connection between the TAs’ training and the quality of education in the classroom. In addition, this work investigated whether there existed a correlation between the teaching experience of TAs and their background information, such as their gender, age, grade level taught, years of teaching experience, and qualification level.
A critical realist theoretical approach was adopted for this two-phased study, which involved the mixing of adaptive and grounded theories respectively. The multi-method project featured 13 case studies, each of which involved a trainee TA, his/her college tutor, and the classroom teacher who was supervising the trainee TA. The analysis was based on semi-structured interviews, various questionnaires, and non-participant observation methods for each of these case studies during the TA’s one-year training period. The primary analysis of the research was completed by comparing the various kinds of data collected from the participants in the first and second data collection stages of each case. Further analysis involved cross-case analysis using a grounded theory approach, which made it possible to draw conclusions and put forth several core propositions. Compared with previous research, the findings of the current study reveal many implications for the training and deployment conditions of TAs, while they also challenge the prevailing approaches in many aspects, in addition to offering more diversified, enriched, and comprehensive explanations of the critical pedagogical issues.
Central-provincial Politics and Industrial Policy-making in the Electric Power Sector in China
In addition to the studies that provide meaningful insights into the complexity of technical and economic issues, a growing number of studies have focused on the political process of market transition in network industries such as the electric power sector. This dissertation studies the central–provincial interactions in industrial policy-making and implementation, and attempts to evaluate the roles of Chinese provinces in the market reform process of the electric power sector. Market reforms of this sector are used as an illustrative case because the new round of market reforms had achieved some significant breakthroughs in areas such as pricing reform and wholesale market trading. Other policy measures, such as the liberalization of the distribution market and cross-regional market-building, are still at a nascent stage and have only scored moderate progress. It is important to investigate why some policy areas make greater progress in market reforms than others. It is also interesting to examine the impacts of Chinese central-provincial politics on producing the different market reform outcomes. Guangdong and Xinjiang are the two provinces analyzed in this dissertation. The progress of market reforms in these two provinces showed similarities although the provinces are very different in terms of local conditions such as the stages of their economic development and energy structures. The actual reform can be understood as the outcomes of certain modes of interactions between the central and provincial actors in the context of their particular capabilities and preferences in different policy areas. This dissertation argues that market reform is more successful in policy areas where the central and provincial authorities are able to engage mainly in integrative negotiations than in areas where they engage mainly in distributive negotiations.
Automatic Question Generation to Support Reading Comprehension of Learners - Content Selection, Neural Question Generation, and Educational Evaluation
Simply reading texts passively without actively engaging with their content is suboptimal for text comprehension since learners may miss crucial concepts or misunderstand essential ideas.
In contrast, engaging learners actively by asking questions fosters text comprehension.
However, educational resources frequently lack questions.
Textbooks often contain only a few at the end of a chapter, and informal learning resources such as Wikipedia lack them entirely.
Thus, in this thesis, we study to what extent questions about educational science texts can be automatically generated, tackling two research questions.
The first question concerns selecting learning-relevant passages to guide the generation process.
The second question investigates the generated questions' potential effects and applicability in reading comprehension scenarios.
Our first contribution improves the understanding of neural question generation's quality in education.
We find that the generators' high linguistic quality transfers to educational texts but that they require guidance by educational content selection.
In consequence, we study multiple educational context and answer selection mechanisms.
In our second contribution, we propose novel context selection approaches which target question-worthy sentences in texts.
In contrast to previous works, our context selectors are guided by educational theory.
The proposed methods perform competitively with related work while operating with educationally motivated decision criteria that are easier for educational experts to understand.
The third contribution addresses answer selection methods to guide neural question generation with expected answers.
Our experiments highlight the need for educational corpora for the task. Models trained on noneducational corpora do not transfer well to the educational domain.
Given this discrepancy, we propose a novel corpus construction approach.
It automatically derives educational answer selection corpora from textbooks.
We verify the approach's usefulness by showing that neural models trained on the constructed corpora learn to detect learning-relevant concepts.
In our last contribution, we use the insights from the previous experiments to design, implement, and evaluate an automatic question generator for educational use.
We evaluate the proposed generator intrinsically with an expert annotation study and extrinsically with an empirical reading comprehension study.
The two evaluation scenarios provide a nuanced view of the generated questions' strengths and weaknesses.
Expert annotations attribute an educational value to roughly 60 % of the questions but also reveal various ways in which the questions still fall short of the quality experts desire.
Furthermore, the reader-based evaluation indicates that the proposed educational question generator increases learning outcomes compared to a no-question control group.
In summary, the results of the thesis improve the understanding of the content selection tasks in educational question generation and provide evidence that it can improve reading comprehension.
As such, the proposed approaches are promising tools for authors and learners to promote active reading and thus foster text comprehension.
Acoustic modelling, data augmentation and feature extraction for in-pipe machine learning applications
Gathering measurements from infrastructure, private premises, and harsh environments can be difficult and expensive. From this perspective, the development of
new machine learning algorithms is strongly affected by the availability of training
and test data. We focus on audio archives for in-pipe events. Although several
examples of pipe-related applications can be found in the literature, datasets of
audio/vibration recordings are much scarcer, and the only references found relate
to leakage detection and characterisation. Therefore, this work proposes a methodology to relieve the burden of data collection for acoustic events in deployed pipes.
The aim is to maximise the yield of small sets of real recordings and demonstrate
how to extract effective features for machine learning. The methodology developed
requires the preliminary creation of a soundbank of audio samples gathered with
simple weak annotations. For practical reasons, the case study is given by a range
of appliances, fittings, and fixtures connected to pipes in domestic environments.
The source recordings are low-reverberated audio signals enhanced through a
bespoke spectral filter and containing the desired audio fingerprints. The soundbank is then processed to create an arbitrary number of synthetic augmented
observations. The data augmentation improves the quality and the quantity of
the metadata and automatically creates strong and accurate annotations that
are both machine and human-readable. Besides, the implemented processing
chain allows precise control of properties such as signal-to-noise ratio, duration
of the events, and the number of overlapping events. The inter-class variability
is expanded by recombining source audio blocks and adding simulated artificial
reverberation obtained through an acoustic model developed for the purpose.
Finally, the dataset is synthesised to guarantee separability and balance. A few
signal representations are optimised to maximise the classification performance,
and the results are reported as a benchmark for future developments. The contribution to the existing knowledge concerns several aspects of the processing chain
implemented. A novel quasi-analytic acoustic model is introduced to simulate
in-pipe reverberations, adopting a three-layer architecture particularly convenient
for batch processing. The first layer includes two algorithms: one for the numerical
calculation of the axial wavenumbers and one for the separation of the modes. The
latter, in particular, provides a workaround for a problem not explicitly treated in the
literature and related to the modal non-orthogonality given by the solid-liquid interface in the analysed domain. A set of results for different waveguides is reported
to compare the dispersive behaviour against different mechanical configurations.
Two more novel solutions are also included in the second layer of the model and
concern the integration of the acoustic sources. Specifically, the amplitudes of the
non-orthogonal modal potentials are obtained using either a distance minimisation
objective function or by solving an analytical decoupling problem. In both cases,
results show that sufficiently smooth sources can be approximated with a limited
number of modes keeping the error below 1%. The last layer proposes a bespoke
approach for the integration of the acoustic model into the synthesiser as a reverberation simulator. Additional elements of novelty relate to the other blocks of the
audio synthesiser. The statistical spectral filter, for instance, is a batch-processing
solution for the attenuation of the background noise of the source recordings. The
signal-to-noise ratio analysis for both moderate and high noise levels indicates
a clear improvement of several decibels against the closest filter example in the
literature. The recombination of the audio blocks and the system of fully tracked
annotations are also novel extensions of similar approaches recently adopted in
other contexts. Moreover, a bespoke synthesis strategy is proposed to guarantee
separable and balanced datasets. The last contribution concerns the extraction
of convenient sets of audio features. Elements of novelty are introduced for the
optimisation of the filter banks of the mel-frequency cepstral coefficients and the
scattering wavelet transform. In particular, compared to the respective standard
definitions, the average F-score performance of the optimised features is roughly
6% higher in the first case and 2.5% higher for the latter. Finally, the soundbank,
the synthetic dataset, and the fundamental blocks of the software library developed
are publicly available for further research.
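The abstract mentions precise control of the signal-to-noise ratio when synthesising augmented observations. The sketch below shows one common way such control can be obtained: scaling a noise segment so that the event-to-noise power ratio matches a target value in decibels. It is an assumption-laden illustration, not the thesis's processing chain; the function names `mean_power` and `mix_at_snr` are hypothetical, and signals are plain Python lists of samples.

```python
import math


def mean_power(signal: list[float]) -> float:
    """Average power of a sampled signal (mean of squared samples)."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(event: list[float], noise: list[float], snr_db: float):
    """Scale `noise` so that event power / noise power equals `snr_db`,
    then add it to `event`. Returns (mixture, scaled_noise)."""
    if len(event) != len(noise):
        raise ValueError("event and noise must have the same length")
    p_event = mean_power(event)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment has zero power")
    # Target noise power is p_event / 10^(snr_db / 10); the amplitude gain
    # is the square root of the ratio between target and actual noise power.
    gain = math.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [e + n for e, n in zip(event, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    event = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    mixture, scaled = mix_at_snr(event, noise, snr_db=10.0)
    achieved = 10 * math.log10(mean_power(event) / mean_power(scaled))
    print(round(achieved, 6))
```

Returning the scaled noise alongside the mixture makes it easy to verify the achieved ratio, and it mirrors the idea of keeping fully tracked annotations for each synthetic observation.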