465 research outputs found

    Deep Learning Software Repositories

    Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research: they are too reliant on labeled data, they are too reliant on human intuition, and they are not capable of learning expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous datasets in situ, apply the learned features to a particular task, and possibly transfer knowledge from task to task. Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields. Given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and the practical applications it reaches. This dissertation examines and enables deep learning algorithms in different SE contexts. We demonstrate that deep learners significantly outperform state-of-the-practice software language models at code suggestion on a Java corpus. Further, these deep learners for code suggestion automatically learn how to represent lexical elements. We use these representations to transmute source code into structures for detecting similar code fragments at different levels of granularity, without declaring features for how the source code is to be represented. Then we use our learning-based framework for encoding fragments to intelligently select and adapt statements in a codebase for automated program repair. In our work on code suggestion, code clone detection, and automated program repair, everything needed to represent lexical elements and code fragments is mined from the source code repository. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery.
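    The clone-detection idea above, encoding code fragments as learned vectors and comparing the vectors instead of hand-declared features, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy tokenizer, the randomly initialized embedding table, and mean pooling stand in for representations that the dissertation's models learn from the repository itself.

```python
# Minimal sketch of embedding-based code-fragment similarity, in the spirit of
# the representation-learning approach described above. The toy tokenizer, the
# randomly initialized embedding table, and mean pooling are illustrative
# assumptions, not the dissertation's actual model (which learns its
# representations from a source-code corpus).
import re
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 64
vocab: dict[str, int] = {}            # lexical element -> row in embedding table
embeddings = np.empty((0, EMBED_DIM))

def tokenize(code: str) -> list[str]:
    """Split source text into crude lexical elements (identifiers, literals, operators)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def embed_token(token: str) -> np.ndarray:
    """Look up (or lazily create) a vector for a lexical element."""
    global embeddings
    if token not in vocab:
        vocab[token] = len(vocab)
        embeddings = np.vstack([embeddings, rng.normal(size=EMBED_DIM)])
    return embeddings[vocab[token]]

def encode_fragment(code: str) -> np.ndarray:
    """Encode a code fragment as the mean of its token vectors."""
    vectors = np.array([embed_token(t) for t in tokenize(code)])
    return vectors.mean(axis=0)

def similarity(a: str, b: str) -> float:
    """Cosine similarity between two fragment encodings."""
    va, vb = encode_fragment(a), encode_fragment(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

if __name__ == "__main__":
    frag1 = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
    frag2 = "int sum = 0; for (int j = 0; j < n; j++) sum += b[j];"
    # Fragments sharing most lexical elements produce nearby mean vectors.
    print(f"similarity = {similarity(frag1, frag2):.3f}")
```

    In practice the embeddings would come from a trained code language model rather than random initialization, and a threshold over the similarity score (or a learned classifier) would decide whether two fragments count as clones at a given granularity.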

    Dynamic Identification for Control of Large Space Structures

    This is a compilation of reports by a single author on a single subject. It consists of the following five journal articles: (1) A Parametric Study of the Ibrahim Time Domain Modal Identification Algorithm; (2) Large Modal Survey Testing Using the Ibrahim Time Domain Identification Technique; (3) Computation of Normal Modes from Identified Complex Modes; (4) Dynamic Modeling of Structures from Measured Complex Modes; and (5) Time Domain Quasi-Linear Identification of Nonlinear Dynamic Systems.

    Efficient Detectors for MIMO-OFDM Systems under Spatial Correlation Antenna Arrays

    This work analyzes the performance of implementable detectors for the multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) technique under specific and realistic operating conditions, including antenna correlation and array configuration. A time-domain channel model is used to evaluate system performance under realistic communication channel and system scenarios, including different channel correlations, modulation orders, and antenna array configurations. A range of MIMO-OFDM detectors is analyzed with the aim of achieving high performance combined with high system capacity and manageable computational complexity. Numerical Monte Carlo simulations (MCS) demonstrate the channel selectivity effect, while the impact of the number of antennas, the adoption of linear versus heuristic-based detection schemes, and the spatial correlation effect under linear and planar antenna arrays are analyzed in the MIMO-OFDM context.
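    As a concrete illustration of the linear detection schemes mentioned above, the sketch below applies zero-forcing and MMSE equalization per OFDM subcarrier. The 2x2 antenna setup, QPSK modulation, uncorrelated Rayleigh channel, and SNR value are illustrative assumptions; the paper's study additionally covers spatially correlated channels, larger linear and planar arrays, and heuristic detectors.

```python
# Minimal sketch of per-subcarrier linear MIMO-OFDM detection (zero-forcing and
# MMSE). The 2x2 setup, QPSK symbols, i.i.d. Rayleigh channel, and nominal SNR
# are illustrative assumptions, not the paper's simulation parameters.
import numpy as np

rng = np.random.default_rng(1)
N_TX, N_RX, N_SC = 2, 2, 64          # transmit/receive antennas, OFDM subcarriers
SNR_DB = 20
noise_var = 10 ** (-SNR_DB / 10)     # noise variance from a nominal per-symbol SNR

# Unit-energy QPSK symbols for each transmit antenna on each subcarrier.
qpsk = (rng.choice([-1, 1], (N_SC, N_TX)) + 1j * rng.choice([-1, 1], (N_SC, N_TX))) / np.sqrt(2)

# Frequency-flat per-subcarrier channel matrices H[k] (i.i.d. Rayleigh here).
H = (rng.normal(size=(N_SC, N_RX, N_TX)) + 1j * rng.normal(size=(N_SC, N_RX, N_TX))) / np.sqrt(2)

noise = np.sqrt(noise_var / 2) * (rng.normal(size=(N_SC, N_RX)) + 1j * rng.normal(size=(N_SC, N_RX)))
y = np.einsum("krt,kt->kr", H, qpsk) + noise   # received signal per subcarrier

def detect(Hk: np.ndarray, yk: np.ndarray, mmse: bool) -> np.ndarray:
    """Linear detection on one subcarrier: x_hat = (H^H H + a I)^-1 H^H y."""
    reg = noise_var * np.eye(N_TX) if mmse else 0.0 * np.eye(N_TX)
    return np.linalg.solve(Hk.conj().T @ Hk + reg, Hk.conj().T @ yk)

def hard_qpsk(x: np.ndarray) -> np.ndarray:
    """Map soft estimates back to the nearest QPSK constellation point."""
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

for name, mmse in [("ZF", False), ("MMSE", True)]:
    est = np.array([hard_qpsk(detect(H[k], y[k], mmse)) for k in range(N_SC)])
    ser = np.mean(est != qpsk)
    print(f"{name}: symbol error rate = {ser:.3f}")
```

    Spatial correlation of the kind studied in the paper could be introduced by pre- and post-multiplying each H[k] by square roots of receive and transmit correlation matrices (the Kronecker model), which is one common way such effects are simulated.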

    Evaluation and Improvement of Machine Learning Algorithms in Drug Discovery

    Drug discovery plays a critical role in today’s society for treating and preventing illness, including potentially deadly viral diseases. In early drug discovery development, the main challenge is to find candidate molecules to be used as drugs to treat a disease. This also means assessing key properties that are wanted in the interaction between molecules and proteins. It is a very difficult problem because the molecular space is so big and complex. Drug discovery development is estimated to take around 12–15 years on average, and the costs of developing a single drug amount to $2.8 billion in the US.
    Modern drug discovery and drug development often start with finding candidate drug molecules (‘compounds’) that can bind to a target, usually a protein in our body. Since there are billions of possible molecules to test, this becomes an endless search for compounds that show promising bioactivity. The search method is called high-throughput screening (HTS), or virtual HTS (VHTS) in a virtual environment. The traditional approach to HTS has been to test every compound one by one. More recent approaches have used robotics and features extracted from the molecule, combining them with machine learning algorithms, in an effort to make the process more automated. Research has shown that this still leads to human errors and bias. So, how can we use machine learning algorithms to make this approach more cost-efficient and more robust to human errors?
    This project tried to address these issues and led to two scientific papers as a result. The first paper explores how common evaluation metrics used for classification can actually be unsuited to the task, leading to severe consequences when put into a real application. The argument is based on basic principles of Decision Theory, which is recognized in the field of machine learning but has not been put into much use. It makes a distinction between predicting the most probable class and predicting the most valuable class in terms of the “cost” or “gains” for the classes. In an algorithm for classifying a particular disease in a patient, the wrong classification could lead to a life or death situation. The principles also apply to drug discovery, where the cost of further developing and optimizing a "useless" drug could be huge. The goal of the classifier should therefore not be to guess the correct class but to choose the optimal class, and the metric must depend on the type of classification problem. Thus, we show that common metrics such as precision, balanced accuracy, F1-score, Area Under the Curve, Matthews Correlation Coefficient, and the Fowlkes-Mallows index are affected by this problem, and we propose an evaluation method grounded in the foundations of Decision Theory to address it. The metric presented, called utility, takes into account gains and losses for each correct or incorrect classification in the confusion matrix. For this to work effectively, the output of the machine learning algorithm needs to be a set of sensible probabilities for each class.
    This brings us to the second paper. Machine learning algorithms usually output a set of real numbers for the classes they try to predict, which, possibly after some transformation (for example the ‘softmax’ function), are meant to represent probabilities for the classes. However, the problem is that these numbers cannot be reliably interpreted as actual probabilities, in the sense of degrees of belief.
    In the paper, we propose the implementation of a probability transducer to transform the output of the algorithm into sensible probabilities. These are then used in conjunction with the utilities to choose the class with the maximal expected utility. The results show that the transducer gives better scores, in terms of the utilities, in all cases compared to the standard method used in machine learning. Master’s thesis in Software Development (Programutvikling), in collaboration with HVL.
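    To make the decision-theoretic idea concrete, the sketch below chooses the class with the maximal expected utility rather than the most probable class: a utility matrix assigns a gain or loss to each pair of true and chosen class, and calibrated class probabilities weight those outcomes. The utility values, class names, and example probabilities are illustrative assumptions, and the sketch does not reproduce the papers' probability-transducer construction; it only assumes calibrated probabilities are available.

```python
# Minimal sketch of utility-based classification and evaluation, illustrating
# the decision-theoretic idea above. The utility matrix, class names, and
# example probabilities are illustrative assumptions; the papers' probability
# transducer and actual utility values are not reproduced here.
import numpy as np

# utility[true_class, chosen_class]: gains on the diagonal, losses off it.
# Pursuing a "useless" compound (false positive) is penalized far more heavily
# than discarding an active one in this toy setting.
CLASSES = ["inactive", "active"]
utility = np.array([
    [ 1.0, -10.0],   # true inactive: correct reject vs. costly false positive
    [-2.0,   5.0],   # true active:   missed hit     vs. valuable true positive
])

def choose_class(probs: np.ndarray) -> int:
    """Pick the class with maximal expected utility, not the most probable one."""
    expected = probs @ utility          # expected[j] = sum_i p(i) * utility[i, j]
    return int(np.argmax(expected))

def mean_utility(probs: np.ndarray, true_labels: np.ndarray) -> float:
    """Average realized utility of the decisions over a labeled evaluation set."""
    choices = [choose_class(p) for p in probs]
    return float(np.mean([utility[t, c] for t, c in zip(true_labels, choices)]))

if __name__ == "__main__":
    # Calibrated probabilities p(inactive), p(active) for a few compounds; in
    # the papers these would come from a probability transducer applied to the
    # classifier's raw scores.
    probs = np.array([[0.40, 0.60],
                      [0.10, 0.90],
                      [0.90, 0.10]])
    true_labels = np.array([1, 1, 0])
    for p in probs:
        print(p, "->", CLASSES[choose_class(p)])
    print("mean utility:", mean_utility(probs, true_labels))
```

    With these toy numbers, the first compound is rejected even though "active" is the more probable class, because the penalty for pursuing an inactive compound outweighs the expected gain; this is exactly the gap between the most probable and the most valuable class described in the abstract.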

    Personalizing Interactions with Information Systems

    Personalization constitutes the mechanisms and technologies necessary to customize information access to the end-user. It can be defined as the automatic adjustment of information content, structure, and presentation tailored to the individual. In this chapter, we study personalization from the viewpoint of personalizing interaction. The survey covers mechanisms for information-finding on the web, advanced information retrieval systems, dialog-based applications, and mobile access paradigms. Specific emphasis is placed on studying how users interact with an information system and how the system can encourage and foster interaction. This helps bring out the role of the personalization system as a facilitator that reconciles the user’s mental model with the underlying information system’s organization. Three tiers of personalization systems are presented, paying careful attention to interaction considerations. These tiers show how progressive levels of sophistication in interaction can be achieved. The chapter also surveys systems support technologies and niche application domains.