Integrating Machine Learning with Software Development Lifecycles: Insights from Experts
This paper examines the challenges of integrating machine learning (ML) development with software development lifecycle (SDLC) models. Data-intensive development and the use of ML are gaining popularity in information systems development (ISD). To date, there is little empirical research exploring the challenges that ISD practitioners encounter when integrating ML development with SDLC frameworks. In this work, we conducted a series of expert interviews in which we asked the informants to reflect on how four different archetypal SDLC models support ML development. Three high-level trends in ML systems development emerged from the analysis, namely: (1) redefining the prescribed roles and responsibilities within development work; (2) the SDLC as a frame for creating a shared understanding and commitment by management, customers, and software development teams; and (3) method tailoring. This study advances the body of knowledge on the integration of conceptual SDLC models and ML engineering.
Data types as a more ergonomic frontend for Grammar-Guided Genetic Programming
Genetic Programming (GP) is a heuristic method that can be applied to many Machine Learning, Optimization, and Engineering problems. In particular, it has been widely used in Software Engineering for test-case generation, program synthesis, and Improvement of Software (GI).

Grammar-Guided Genetic Programming (GGGP) approaches allow the user to refine the domain of valid program solutions. Backus–Naur Form (BNF) is the most popular interface for describing Context-Free Grammars (CFGs) for GGGP. BNF and its derivatives have the disadvantage of interleaving the grammar language and the target language of the program.

We propose to embed the grammar as an internal Domain-Specific Language in the host language of the framework. This approach has the same expressive power as BNF and EBNF while using the host language's type system to take advantage of all the existing tooling: linters, formatters, type-checkers, autocomplete, and legacy code support. These tools have practical utility in designing software in general, and GP systems in particular.

We also present Meta-Handlers, user-defined overrides of the tree-generation system. This technique extends our object-oriented encoding with more practicability and expressive power than existing CFG approaches, achieving the same expressive power as Attribute Grammars, but without the grammar-versus-target-language duality.

Furthermore, we demonstrate that this approach is feasible, presenting an example Python implementation as proof. We also compare our approach against textual BNF representations with respect to expressive power and ergonomics. These advantages do not come at the cost of performance, as shown by our empirical evaluation of our example implementation against PonyGE2 on five benchmarks. We conclude that our approach has better ergonomics with the same expressive power and performance as textual BNF-based grammar encodings.
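The grammar-as-types idea can be sketched in plain Python. This is a minimal illustration under assumed names, not the paper's actual framework: each abstract class plays the role of a non-terminal, each dataclass is a production rule, and a generator reads the type hints to build random derivation trees.

```python
import random
from abc import ABC
from dataclasses import dataclass
from typing import get_type_hints

# Hypothetical sketch: the grammar is ordinary Python classes, not BNF.
# `Expr` is the non-terminal; `Lit` and `Add` are its productions.

class Expr(ABC):
    """Non-terminal: arithmetic expressions."""

@dataclass
class Lit(Expr):
    value: int  # terminal production

@dataclass
class Add(Expr):
    left: Expr
    right: Expr

def generate(symbol, depth=3):
    """Randomly build a derivation tree, guided purely by type hints."""
    options = symbol.__subclasses__()
    if depth <= 0:
        # prefer terminal productions (no recursive Expr-typed fields)
        terminals = [c for c in options
                     if Expr not in get_type_hints(c).values()]
        options = terminals or options
    rule = random.choice(options)
    args = {name: random.randint(0, 9) if t is int else generate(t, depth - 1)
            for name, t in get_type_hints(rule).items()}
    return rule(**args)

def evaluate(e: Expr) -> int:
    """Interpret a tree: literals return their value, Add sums its children."""
    return e.value if isinstance(e, Lit) else evaluate(e.left) + evaluate(e.right)

tree = generate(Expr)
print(tree, "=>", evaluate(tree))
```

Because the grammar is just classes, the host language's type-checker and autocomplete apply to it directly, which is the ergonomic advantage the abstract claims over a separate BNF file.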
Feature learning for stock price prediction shows a significant role of analyst rating
Data Availability Statement: The code is available from https://mkhushi.github.io/ (accessed on 1 February 2021). Dataset License: CC0.

The Efficient Market Hypothesis states that stock prices reflect all the information present in the world and that excess returns cannot be generated by merely analysing trade data already available to the public. To further the research challenging this idea, a rigorous literature review was conducted, and a set of five technical indicators and 23 fundamental indicators was identified to establish the possibility of generating excess returns on the stock market. Leveraging these data points and various classification machine learning models, trading data of the 505 equities in the US S&P 500 over the past 20 years was analysed to develop an effective classifier. For any given day, we were able to predict whether the price would change by 1% up to 10 days into the future. The predictions had an overall accuracy of 83.62%, with a precision of 85% for buy signals and a recall of 100% for sell signals. Moreover, we grouped equities by sector and repeated the experiment to see whether grouping similar assets together positively affected the results, but concluded that it showed no significant improvement in performance, rejecting the idea of sector-based analysis. Using feature ranking, we also identified an even smaller set of six indicators that maintained accuracies similar to those of the original 28 features, and uncovered the importance of buy, hold, and sell analyst ratings, which emerged as the top contributors in the model. Finally, to evaluate the effectiveness of the classifier in real-life situations, it was backtested on FAANG (Facebook, Amazon, Apple, Netflix & Google) equities using a modest trading strategy, where it generated high returns of above 60% over the term of the testing dataset.

In conclusion, our proposed methodology, with its combination of purposefully picked features, shows an improvement over previous studies, and our model predicts the direction of 1% price changes on the 10th day with high confidence and with enough margin even to build a robotic trading system.

This research received no external funding.
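The prediction target described above (a 1% price move within a 10-day horizon) implies a labelling step before any classifier is trained. A minimal sketch of that step, with illustrative names and toy prices rather than the paper's actual pipeline:

```python
# Hypothetical sketch of the labelling implied by the abstract: for each
# day, mark 1 if the close rises by at least `threshold` (1%) within the
# next `horizon` trading days, else 0.

def label_moves(closes, horizon=10, threshold=0.01):
    labels = []
    for i in range(len(closes)):
        future = closes[i + 1 : i + 1 + horizon]
        up = any(p >= closes[i] * (1 + threshold) for p in future)
        labels.append(1 if up else 0)
    return labels

# Toy series: the last days have no future window, so they label 0.
prices = [100, 100.5, 99.8, 101.2, 102.0, 101.5]
print(label_moves(prices, horizon=3))  # -> [1, 1, 1, 0, 0, 0]
```

These binary labels, paired with the 28 indicator features per day, form the training set for the classification models the abstract mentions.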
A heterogeneous online learning ensemble for non-stationary environments
Learning in non-stationary environments is a challenging task which requires updating predictive models to deal with changes in the underlying probability distribution of the problem, i.e., dealing with concept drift. Most work in this area is concerned with updating the learning system so that it can quickly recover from concept drift, while little work has been dedicated to investigating what type of predictive model is most suitable at any given time. This paper investigates the benefits of online model selection for predictive modelling in non-stationary environments. A novel heterogeneous ensemble approach, Heterogeneous Dynamic Weighted Majority (HDWM), is proposed to intelligently switch between different types of base models in an ensemble and thereby increase the predictive performance of online learning in non-stationary environments. It makes use of "seed" learners of different types to maintain ensemble diversity, overcoming a problem of existing dynamic ensembles, which may lose diversity due to the exclusion of base learners. The algorithm has been evaluated on artificial and real-world data streams against existing well-known approaches such as the heterogeneous Weighted Majority Algorithm (WMA) and the homogeneous Dynamic Weighted Majority (DWM). The results show that HDWM performed significantly better than WMA in non-stationary environments; moreover, when recurring concept drifts were present, the predictive performance of HDWM showed an improvement over DWM.
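The weighted-majority family that HDWM extends can be sketched in a few lines. This is a generic dynamic-weighted-majority-style update, with illustrative toy "experts" and a beta value that is an assumption, not the paper's exact configuration:

```python
# Minimal sketch of weighted-majority voting with multiplicative
# weight penalties, the mechanism underlying DWM-style ensembles.

BETA = 0.5  # assumed penalty factor applied after an expert's mistake

def weighted_vote(experts, weights, x):
    """Return the class with the highest total expert weight."""
    scores = {}
    for expert, w in zip(experts, weights):
        pred = expert(x)
        scores[pred] = scores.get(pred, 0.0) + w
    return max(scores, key=scores.get)

def update_weights(experts, weights, x, y):
    """Penalise experts that misclassified (x, y), then renormalise."""
    new = [w * BETA if e(x) != y else w for e, w in zip(experts, weights)]
    total = sum(new)
    return [w / total for w in new]

# Two toy 'experts' of different types: a threshold rule and a constant.
experts = [lambda x: int(x > 0.5), lambda x: 1]
weights = [0.5, 0.5]

weights = update_weights(experts, weights, 0.2, 0)  # second expert errs
print(weighted_vote(experts, weights, 0.2))  # -> 0
```

HDWM's contribution sits on top of this loop: its base experts are deliberately heterogeneous, and "seed" learners of each type are retained so that pruning low-weight experts cannot remove an entire model family from the ensemble.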
The Language Means of Comicality in Clickbait Headings
The analysis of material from media discourse demonstrates significant changes in the intentionality of the journalistic text, which are reflected in establishing contact with the reader so as to grab and retain attention. This feature of modern media text is represented in changing genre preferences, speech tactics and strategies, and, consequently, in the selection and combination of linguistic means. One manifestation of this trend is the phenomenon of clickbait, a communicative act of promising to continue communication. This article is dedicated to clickbait with the semantics of comicality. The research material, collected from the Russian-language Internet, includes clickbait headings that promise certain funny content. The study revealed that a clickbait model includes the following semantic components: a stimulating utterance by the subject of speech, who seeks to involve the reader in the humorous nature of the hypertext; verbal and non-verbal markers of the object of laughter; and markers reflecting the Internet user's involvement in the communicative act. The analysis of the relationships between the components of this clickbait model resulted in four types of clickbait headlines: 1) narrative headlines, which invite the reader to laugh at what other readers have already laughed at; 2) offering headlines, which suggest some comic entertainment; 3) allusive clickbaits, which hint at the possibility of continuing the amusing reading; and 4) nominative clickbaits, which name the expected laughing reaction to the presentation of some objects.