
    Selection of Software Product Line Implementation Components Using Recommender Systems: An Application to Wordpress

    In software product lines (SPLs), a feature may be implemented by different components, meaning there are several implementations of the same feature. In this context, selecting the best set of components to implement a given configuration is a challenging task due to the large number of combinations and options that could be selected. In certain scenarios, it is possible to find information associated with the components, such as user ratings, that can help in this selection task. In this paper, we introduce a component-based recommender system, called RESDEC (REcommender System that suggests implementation Components from selecteD fEatures), which uses information associated with the implementation components to make recommendations in the domain of SPL configuration. We also provide a RESDEC reference implementation that supports collaborative-filtering and content-based filtering algorithms to recommend implementation components for WordPress-based website configuration. The empirical results, on a knowledge base with 680 plugins and 187,000 ratings by 116,000 users, are promising: they indicate that it is possible to guide the user through implementation component selection with a margin of error smaller than 13% according to our evaluation.
    Funding: Ministerio de Economía y Competitividad RTI2018-101204-B-C22; Ministerio de Economía y Competitividad TIN2014-55894-C2-1-R; Ministerio de Economía y Competitividad TIN2017-88209-C2-2-R; Ministerio de Economía, Industria y Competitividad MCIU-AEI TIN2017-90644-RED
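
    The abstract does not detail RESDEC's algorithms, so the following is only a minimal sketch of the rating-driven selection idea, assuming hypothetical plugin names and a Bayesian-average scoring rule (an illustrative choice, not necessarily RESDEC's):

```python
# Illustrative sketch (not the RESDEC implementation): rank candidate
# plugins for each selected feature by a Bayesian-average user rating,
# so sparsely rated components do not win purely on a few 5-star votes.
from dataclasses import dataclass

@dataclass
class Component:
    name: str           # hypothetical plugin identifier
    mean_rating: float  # average user rating, e.g. on a 1..5 scale
    n_ratings: int      # number of ratings received

def bayesian_score(c: Component, prior_mean: float = 3.0, prior_weight: int = 50) -> float:
    """Shrink the raw mean toward a global prior, weighted by rating volume."""
    return (c.mean_rating * c.n_ratings + prior_mean * prior_weight) / (c.n_ratings + prior_weight)

def recommend(candidates_by_feature: dict[str, list[Component]], top_k: int = 3) -> dict[str, list[str]]:
    """For every selected feature, return the top-k candidate components by score."""
    return {
        feature: [c.name for c in sorted(cands, key=bayesian_score, reverse=True)[:top_k]]
        for feature, cands in candidates_by_feature.items()
    }

# Hypothetical WordPress example: two plugins implementing the "contact-form" feature.
catalog = {"contact-form": [Component("form-plugin-a", 4.8, 12),
                            Component("form-plugin-b", 4.4, 3200)]}
print(recommend(catalog))  # the heavily rated plugin wins despite a lower raw mean
```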

    Innovative Heuristics to Improve the Latent Dirichlet Allocation Methodology for Textual Analysis and a New Modernized Topic Modeling Approach

    Natural Language Processing is a complex means of mining the vast trove of documents created and made available every day. Topic modeling seeks to identify the topics within textual corpora with limited human input, in order to speed analysis. Current topic modeling techniques used in Natural Language Processing have limitations in their pre-processing steps. This dissertation studies topic modeling techniques and those pre-processing limitations, and introduces new algorithms that improve on existing topic modeling techniques while remaining competitive in computational complexity. This research makes four contributions to the field of Natural Language Processing and topic modeling. First, it identifies the need for a more robust “stopwords” list and proposes a heuristic for creating one. Second, a new dimensionality-reduction technique is introduced that exploits the number of words within a document to infer the importance of word choice. Third, an algorithm is developed to determine the number of topics within a corpus, demonstrated on a standard topic modeling data set. These techniques produce a higher-quality result from the Latent Dirichlet Allocation topic modeling technique. Fourth, a novel heuristic utilizing Principal Component Analysis is introduced that can determine the number of topics within a corpus that produce stable sets of topic words.
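
    The dissertation's exact stopword heuristic is not reproduced here; the sketch below illustrates one plausible corpus-specific variant, assuming a document-frequency threshold (an illustrative parameter) for augmenting the base list before fitting LDA with scikit-learn:

```python
# A minimal sketch of corpus-specific stopword augmentation: words appearing
# in almost every document carry little topical signal, so they are added to
# the base stopword list before fitting LDA. The 75% threshold is an
# illustrative assumption, not the dissertation's heuristic.
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the cat sat on the mat", "the dog chased the cat",
        "stocks rallied as the market opened", "the market closed lower"]

# Document frequency for every term in the corpus.
vec = CountVectorizer()
X = (vec.fit_transform(docs) > 0)
df = X.sum(axis=0).A1 / len(docs)

# Augment: any term present in >= 75% of documents becomes a stopword.
extra = {t for t, f in zip(vec.get_feature_names_out(), df) if f >= 0.75}
stopwords = list(ENGLISH_STOP_WORDS | extra)

X_topics = CountVectorizer(stop_words=stopwords).fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_topics)
```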

    Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models

    Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the joint likelihood integrated over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
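
    In notation suggested by the abstract (the symbols are our choices: data V, deterministic dictionary W, random activation coefficients H with prior p(H)), the estimation objective can be sketched as:

```latex
% Sketch of the semi-Bayesian NMF estimator described above; V, W, and H
% are notational assumptions, not the thesis's exact symbols.
\hat{W} = \arg\max_{W \ge 0} \; p(V \mid W)
        = \arg\max_{W \ge 0} \int p(V \mid W, H)\, p(H)\, \mathrm{d}H
```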

    The Design of an Interactive Topic Modeling Application for Media Content

    Topic modeling has been widely used by data scientists to analyze the increasing amount of text documents. Documents can be assigned to a distribution of topics with techniques like LDA or NMF, which are related to unsupervised soft clustering but consider text semantics. More recently, Interactive Topic Modeling (ITM) has been introduced to incorporate human expertise in the modeling process. This enables real-time hyperparameter optimization and topic manipulation at the document and keyword level. However, current ITM applications are mostly accessible to experienced data scientists, who lack domain knowledge. Domain experts, on the other hand, usually lack the data science expertise to build and use ITM applications. This thesis presents an Interactive Topic Modeling application accessible to non-technical data analysts in the broadcasting domain. The application allows domain experts, like journalists, to explore themes in various produced media content in a dynamic, intuitive and efficient manner. An interactive interface, with an embedded NMF topic model, enables users to filter on various data sources, configure and refine the topic model, interpret and evaluate the output through visualizations, and analyze the data in a wider context. The application was designed in collaboration with domain experts in focus group sessions, according to human-centered design principles. An evaluation study with ten participants shows that journalists and data analysts without any natural language processing knowledge agree that the application is not only usable, but also very user-friendly, effective and efficient. A SUS score of 81 was received, and the user experience and user perception of control questionnaires both received an average of 4.1 on a five-point Likert scale. The ITM application thus enables this specific user group to extract meaningful topics from their produced media content and to use the results in a broader perspective to perform exploratory data analysis. The success of the final application design presented in this thesis shows that the knowledge gap between data scientists and domain experts in the broadcasting field has been filled. In a bigger perspective, machine learning applications can be made more accessible by translating hidden low-level details of complex models into high-level model interactions presented in a user interface.
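
    As a rough illustration of the kind of NMF topic model such an application can embed (not the thesis's actual broadcasting system), the sketch below fits topics with scikit-learn and mimics one user refinement by banning a keyword and refitting:

```python
# Minimal illustrative sketch of an embedded NMF topic model with one
# user-driven refinement step: drop a keyword the user rejects, then refit.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = ["election results and political debate",
        "football match results and league table",
        "political interview about the election",
        "league champions win the football final"]

def fit_topics(docs, banned=(), n_topics=2, n_words=3):
    """Fit NMF on TF-IDF features and return the top keywords per topic."""
    vec = TfidfVectorizer(stop_words=list(banned) or None)
    X = vec.fit_transform(docs)
    nmf = NMF(n_components=n_topics, init="nndsvd", random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    return [[terms[i] for i in comp.argsort()[::-1][:n_words]] for comp in nmf.components_]

print(fit_topics(docs))                      # initial topics
print(fit_topics(docs, banned={"results"}))  # after the user removes a keyword
```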

    Mining diverse consumer preferences for bundling and recommendation


    Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data

    How do humans move around in urban space, and how does their movement differ when the city undergoes terrorist attacks? How do users behave in Massive Open Online Courses (MOOCs), and how do those who achieve certificates differ from those who do not? In what areas of the court do elite players, such as Stephen Curry or LeBron James, like to take their shots over the course of a game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is the same mystery of multi-aspect mining, i.e., how can we mine and interpret hidden patterns from a dataset that simultaneously reveals the associations, or changes in the associations, among various aspects of the data (e.g., a shot could be described with three aspects: player, time of the game, and area of the court)? Solving this problem could open gates to a deep understanding of the underlying mechanisms of many real-world phenomena. While much of the research in multi-aspect mining contributes a broad scope of innovations to the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions like what users require from patterns, how good the patterns are, or how to read them have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to be difficult for them to comprehend. This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need of understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications.
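
    As a toy illustration of multi-aspect data (not the PairFac, iDisc, or FacIt algorithms), the sketch below encodes hypothetical (player, game period, court area) shot counts as a 3-way tensor, matricizes it, and factorizes it with NMF to expose cross-aspect patterns:

```python
# Illustrative only: represent multi-aspect events as a count tensor, then
# unfold along the player mode and apply NMF so each latent pattern links
# players to a (period, area) behavior profile.
import numpy as np
from sklearn.decomposition import NMF

players, periods, areas = 4, 3, 5  # hypothetical aspect sizes
rng = np.random.default_rng(0)
tensor = rng.poisson(2.0, size=(players, periods, areas)).astype(float)

# Mode-1 matricization: players x (periods * areas).
unfolded = tensor.reshape(players, periods * areas)
model = NMF(n_components=2, random_state=0)
player_factors = model.fit_transform(unfolded)                  # players x patterns
pattern_factors = model.components_.reshape(2, periods, areas)  # pattern per (period, area)
```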

    Manipulating the Capacity of Recommendation Models in Recall-Coverage Optimization

    Traditional approaches in Recommender Systems ignore the problem of long-tail recommendations. There is no systematic approach to controlling the magnitude of the long-tail recommendations generated by the models, and there is not even a proper methodology for evaluating the quality of long-tail recommendations. This thesis addresses the long-tail recommendation problem from both the algorithmic and the evaluation perspective. We propose controlling the magnitude of the long-tail recommendations generated by models through the manipulation of capacity hyperparameters of the learning algorithms, and we define such hyperparameters for multiple state-of-the-art algorithms. We also summarize multiple such algorithms under the common framework of the score function, which allows us to apply popularity-based regularization to all of them. We propose searching for Pareto-optimal states in the Recall-Coverage plane as the right way to search for long-tail, high-accuracy models. In a set of exhaustive experiments, we empirically demonstrate the correctness of our theory on a mixture of public and industrial datasets for 5 different algorithms and their different versions.
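
    The Pareto-optimality criterion in the Recall-Coverage plane can be sketched as follows; the (recall@k, coverage@k) values are placeholders, not results from the thesis:

```python
# Sketch of the Recall-Coverage selection idea: evaluate each trained model
# (capacity setting) by recall@k and catalog coverage@k, then keep only the
# points in that plane not dominated on both metrics by another point.
def pareto_front(points):
    """Keep (recall, coverage) pairs not dominated by any other pair."""
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

# Hypothetical (recall@k, coverage@k) results for five capacity settings.
results = [(0.30, 0.10), (0.28, 0.35), (0.22, 0.60), (0.26, 0.20), (0.15, 0.55)]
print(pareto_front(results))  # -> [(0.3, 0.1), (0.28, 0.35), (0.22, 0.6)]
```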

    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves accuracy comparable to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems, and it offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
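
    One ingredient of such a pipeline, perturbing user locations before they reach the recommender, can be sketched with the standard Laplace mechanism; the epsilon and sensitivity values below are illustrative choices, not the dissertation's calibration:

```python
# Minimal sketch of differentially private location perturbation: add
# Laplace noise, calibrated by sensitivity/epsilon, to each coordinate.
import numpy as np

def laplace_perturb(lat: float, lon: float, epsilon: float,
                    sensitivity: float = 0.01) -> tuple[float, float]:
    """Return (lat, lon) with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=2)
    return lat + noise[0], lon + noise[1]

# Stronger privacy (smaller epsilon) means noisier coordinates.
print(laplace_perturb(40.7128, -74.0060, epsilon=0.5))
print(laplace_perturb(40.7128, -74.0060, epsilon=5.0))
```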

    New accurate, explainable, and unbiased machine learning models for recommendation with implicit feedback.

    Recommender systems have become ubiquitous Artificial Intelligence (AI) tools that play an important role in filtering online information in our daily lives. Whether we are shopping, browsing movies, or listening to music online, AI recommender systems work behind the scenes to provide us with curated and personalized content that has been predicted to be relevant to our interests. The increasing prevalence of recommender systems has challenged researchers to develop powerful algorithms that can deliver recommendations with increasing accuracy. In addition to the predictive accuracy of recommender systems, recent research has also started paying attention to their fairness, in particular with regard to the bias and transparency of their predictions. This dissertation contributes to advancing the state of the art in fairness in AI by proposing new Machine Learning models and algorithms that aim to improve the user's experience when receiving recommendations, with a focus positioned at the nexus of three objectives, namely accuracy, transparency, and unbiasedness of the predictions. In our research, we focus on state-of-the-art Collaborative Filtering (CF) recommendation approaches trained on implicit feedback data. More specifically, we address the limitations of two established deep learning approaches in two distinct recommendation settings, namely recommendation with user profiles and sequential recommendation. First, we focus on a state-of-the-art pairwise ranking model, namely Bayesian Personalized Ranking (BPR), which has been found to outperform pointwise models in predictive accuracy in the recommendation-with-user-profiles setting. Specifically, we address two limitations of BPR: (1) BPR is a black-box model that does not explain its outputs, thus limiting the user's trust in the recommendations and the analyst's ability to scrutinize the model's outputs; and (2) BPR is vulnerable to exposure bias due to the data being Missing Not At Random (MNAR). This exposure bias usually translates into unfairness against the least popular items, because they risk being under-exposed by the recommender system. We propose a novel explainable loss function and a corresponding model called Explainable Bayesian Personalized Ranking (EBPR) that generates recommendations along with item-based explanations. Then, we theoretically quantify the additional exposure bias resulting from the explainability, and use it as a basis to propose an unbiased estimator for the ideal EBPR loss. Having done this, we perform an empirical study on three real-world benchmarking datasets that demonstrates the advantages of our proposed models compared to existing state-of-the-art techniques. Next, we shift our attention to sequential recommendation systems and focus on modeling and mitigating exposure bias in BERT4Rec, a state-of-the-art recommendation approach based on bidirectional transformers. The bidirectional representation capacity in BERT4Rec is based on the Cloze task, a.k.a. Masked Language Model, which consists of predicting randomly masked items within the sequence, assuming that the true interacted item is the most relevant one. This results in an exposure bias, where non-interacted items with low exposure propensities are assumed to be irrelevant.
    Thus far, the most common approach to mitigating exposure bias in recommendation has been Inverse Propensity Scoring (IPS), which consists of down-weighting the interacted predictions in the loss function in proportion to their propensities of exposure, yielding theoretically unbiased learning. We first argue and prove that IPS does not extend to sequential recommendation because it fails to account for the sequential nature of the problem. We then propose a novel propensity scoring mechanism, which we name Inverse Temporal Propensity Scoring (ITPS), and use it to theoretically debias the Cloze task in sequential recommendation. We also rely on the ITPS framework to propose a bidirectional transformer-based model called ITPS-BERT4Rec. Finally, we empirically demonstrate the debiasing capabilities of our proposed approach and its robustness to the severity of exposure bias. Our proposed explainable approach in recommendation with user profiles, EBPR, showed an increase in ranking accuracy of about 4% and an increase in explainability of about 7% over the baseline BPR model in experiments on real-world recommendation datasets. Moreover, experiments on a real-world unbiased dataset demonstrated the importance of coupling explainability and exposure debiasing in capturing the true preferences of the user, with a significant improvement of 1% over the baseline unbiased model UBPR. Furthermore, coupling explainability with exposure debiasing was also empirically proven to mitigate popularity bias, with an improvement in popularity-debiasing metrics of over 10% on three real-world recommendation tasks over the unbiased UBPR model. These results demonstrate the viability of our proposed approaches in recommendation with user profiles and their capacity to improve the user's experience by better capturing and modeling their true preferences, improving the explainability of the recommendations, and presenting them with more diverse recommendations that span a larger portion of the item catalog. On the other hand, our proposed approach in sequential recommendation, ITPS-BERT4Rec, demonstrated a significant increase of 1% in terms of modeling the true preferences of the user in a semi-synthetic setting over the state-of-the-art sequential recommendation model BERT4Rec, while also being unbiased in terms of exposure. Similarly, ITPS-BERT4Rec showed an average increase of 8.7% over BERT4Rec in three real-world recommendation settings. Moreover, empirical experiments demonstrated the robustness of our proposed ITPS-BERT4Rec model to increasing levels of exposure bias and its stability in terms of variance. Furthermore, experiments on popularity debiasing showed a significant advantage of our proposed ITPS-BERT4Rec model for both short- and long-term sequences. Finally, ITPS-BERT4Rec showed respective improvements of around 60%, 470%, and 150% over vanilla BERT4Rec in capturing the temporal dependencies between the items within the sequences of interactions, for three different evaluation metrics. These results demonstrate the potential of our proposed unbiased estimator to improve the user experience in the context of sequential recommendation by presenting users with more accurate and diverse recommendations that better match their true preferences and the sequential dependencies between the recommended items.
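
    As a simplified illustration of the inverse-propensity idea underlying these contributions (not the exact EBPR or ITPS losses), the sketch below computes an IPS-weighted BPR-style pairwise loss with hypothetical scores and propensities:

```python
# Simplified sketch of inverse propensity scoring in a pairwise loss:
# each observed (positive, negative) pair is weighted by 1/propensity, so
# rarely exposed interactions count more, which in expectation corrects for
# what the system happened to expose. Scores and propensities are made up.
import numpy as np

def ips_bpr_loss(pos_scores, neg_scores, propensities):
    """Mean IPS-weighted pairwise logistic loss: w * -log(sigmoid(pos - neg))."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    w = 1.0 / np.asarray(propensities, dtype=float)     # inverse propensity weights
    return float(np.mean(w * np.logaddexp(0.0, -(pos - neg))))

# An interaction with a rarely exposed item (propensity 0.1) contributes
# ten times the weight of a heavily exposed one (propensity 1.0).
print(ips_bpr_loss(pos_scores=[2.0, 1.5], neg_scores=[0.5, 1.0], propensities=[0.1, 1.0]))
```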

    Robustness analysis and controller synthesis for bilateral teleoperation systems via IQCs
