3,567 research outputs found

    A visual analytics platform for competitive intelligence

    Silva, D., & Bação, F. (2023). MapIntel: A visual analytics platform for competitive intelligence. Expert Systems, [e13445]. https://www.authorea.com/doi/full/10.22541/au.166785335.50477185, https://doi.org/10.1111/exsy.13445
    Funding Information: This work was supported by the Fundação para a Ciência e Tecnologia of Ministério da Ciência e Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project).
    Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The system's search module uses a retriever and re-ranker engine that first finds the closest neighbours to the query embedding and then passes the results through a cross-encoder model that identifies the most relevant documents. The browsing or visualization module also leverages the embeddings by projecting them onto two dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modelling. This map aims to promote a fast overview of the corpus while allowing a more detailed exploration and an interactive information-encountering process. We evaluate the system and its components on the 20 newsgroups data set, using the semantic document labels provided, and demonstrate the superiority of Transformer-based components. Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected over a period of eight months.
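The retrieve-then-re-rank search described in the MapIntel abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `overlap_score` is a hypothetical placeholder for a cross-encoder model. Only the two-stage structure (nearest neighbours first, pairwise scoring second) comes from the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: ids of the k documents whose embeddings are closest to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def rerank(query, candidates, texts, score_pair):
    """Stage 2: reorder the candidates with a pairwise (query, document) scorer."""
    return sorted(candidates, key=lambda d: score_pair(query, texts[d]), reverse=True)

def overlap_score(query, text):
    """Hypothetical stand-in for a cross-encoder: share of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Toy corpus: invented texts and hand-made vectors, not output of a real model.
texts = {
    "a": "central bank raises interest rates",
    "b": "new smartphone launch announced",
    "c": "bank reports record quarterly profit",
}
doc_vecs = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.0, 0.2, 0.9],
    "c": [0.8, 0.3, 0.1],
}
query = "bank profit"
query_vec = [0.85, 0.2, 0.05]

candidates = retrieve(query_vec, doc_vecs, k=2)   # cheap vector search
results = rerank(query, candidates, texts, overlap_score)  # costlier pairwise scoring
```

The usual motivation for splitting the work this way is that the pairwise scorer is far more expensive per document than a vector comparison, so it is applied only to the short candidate list.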

    On the Generation of Realistic and Robust Counterfactual Explanations for Algorithmic Recourse

    The recent widespread deployment of machine learning algorithms presents many new challenges. Machine learning algorithms are usually opaque and can be particularly difficult to interpret. When humans are involved, algorithmic and automated decisions can negatively impact people’s lives. Therefore, end users would like to be insured against potential harm. One popular way to achieve this is to provide end users access to algorithmic recourse, which gives end users negatively affected by algorithmic decisions the opportunity to reverse unfavorable decisions, e.g., from a loan denial to a loan acceptance. In this thesis, we design recourse algorithms to meet various end user needs. First, we propose methods for the generation of realistic recourses. We use generative models to suggest recourses likely to occur under the data distribution. To this end, we shift the recourse action from the input space to the generative model’s latent space, allowing us to generate counterfactuals that lie in regions with data support. Second, we observe that small changes applied to the recourses prescribed to end users are likely to invalidate the suggested recourse once it is noisily implemented in practice. Motivated by this observation, we design methods for the generation of robust recourses and for assessing the robustness of recourse algorithms to data deletion requests. Third, the lack of a commonly used code-base for counterfactual explanation and algorithmic recourse algorithms and the vast array of evaluation measures in the literature make it difficult to compare the performance of different algorithms. To solve this problem, we provide an open source benchmarking library that streamlines the evaluation process and can be used for benchmarking, rapidly developing new methods, and setting up new experiments.
    In summary, our work contributes to a more reliable interaction between end users and machine-learned models by covering fundamental aspects of the recourse process, and suggests new solutions for generating realistic and robust counterfactual explanations for algorithmic recourse.

    Performance analysis of various machine learning algorithms for CO2 leak prediction and characterization in geo-sequestration injection wells

    The effective detection and prevention of CO2 leakage in active injection wells are paramount for safe carbon capture and storage (CCS) initiatives. This study assesses five fundamental machine learning algorithms, namely, Support Vector Regression (SVR), K-Nearest Neighbor Regression (KNNR), Decision Tree Regression (DTR), Random Forest Regression (RFR), and Artificial Neural Network (ANN), for use in developing a robust data-driven model to predict potential CO2 leakage incidents in injection wells. Leveraging wellhead and bottom-hole pressure and temperature data, the models aim to simultaneously predict the location and size of leaks. A representative dataset simulating various leak scenarios in a saline aquifer reservoir was utilized. The findings reveal crucial insights into the relationships between the variables considered and leakage characteristics. Wellhead pressure, which correlates positively and linearly with leak depth, could be a pivotal indicator of leak location, while bottom-hole pressure, through a negative linear relationship, showed the strongest association with leak size. Among the predictive models examined, the highest prediction accuracy was achieved by the KNNR model for both leak localization and sizing. This model displayed exceptional sensitivity to leak size, and was able to identify leak magnitudes representing as little as 0.0158% of the total main flow with relatively high levels of accuracy. Nonetheless, the study underscored that accurate leak sizing posed a greater challenge for the models than leak localization. Overall, the findings obtained can provide valuable insights into the development of efficient data-driven well-bore leak detection systems.
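As a rough illustration of how a k-nearest-neighbour regressor can predict leak location and size at the same time, the sketch below averages both targets over the nearest training points. It is not the study's model: the feature and target values are invented toy numbers, the function name `knn_predict` is hypothetical, and feature scaling (normally needed when inputs have different units) is omitted for brevity.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to the query.

    Distances are Euclidean; each target column is averaged independently,
    so one call yields a prediction for every output (multi-output regression).
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    n_targets = len(train_y[0])
    return [sum(train_y[i][j] for i in nearest) / len(nearest) for j in range(n_targets)]

# Invented toy data: [wellhead pressure, bottom-hole pressure] in arbitrary units.
train_x = [[10.0, 50.0], [11.0, 49.0], [12.0, 48.0], [20.0, 40.0]]
# Invented targets: [leak depth, leak size] in arbitrary units.
train_y = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0], [900.0, 9.0]]

prediction = knn_predict(train_x, train_y, query=[11.0, 49.0], k=3)
```

With these toy values the three closest points are the first three rows, so the distant fourth row does not influence the prediction.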

    Deep Clustering for Data Cleaning and Integration

    Deep Learning (DL) techniques now constitute the state of the art for important problems in areas such as text and image processing, and there have been impactful results that deploy DL in several data management tasks. Deep Clustering (DC) has recently emerged as a sub-discipline of DL, in which data representations are learned in tandem with clustering, with a view to automatically identifying the features of the data that lead to improved clustering results. While DC has been used to good effect in several domains, particularly in image processing, the potential of DC for data management tasks remains unexplored. In this paper, we address this gap by investigating the suitability of DC for data cleaning and integration tasks, specifically schema inference, entity resolution and domain discovery, from the perspective of tables, rows and columns, respectively. In this setting, we compare and contrast several DC and non-DC clustering algorithms using standard benchmarks. The results show, among other things, that the most effective DC algorithms consistently outperform non-DC clustering algorithms for data integration tasks. Experiments also show consistently strong performance compared with state-of-the-art bespoke algorithms for each of the data integration tasks.

    Dynamic Circular Network-Based Federated Dual-View Learning for Multivariate Time Series Anomaly Detection

    Multivariate time-series data exhibit intricate correlations in both temporal and spatial dimensions. However, existing network architectures often overlook dependencies in the spatial dimension and struggle to strike a balance between long-term and short-term patterns when extracting features from the data. Furthermore, industries within the business community are hesitant to share their raw data, which hinders anomaly prediction accuracy and detection performance. To address these challenges, the authors propose a dynamic circular network-based federated dual-view learning approach. Experimental results from four open-source datasets demonstrate that the method outperforms existing methods in terms of accuracy, recall, and F1 score for anomaly detection.

    An innovative network intrusion detection system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset

    With the increasing prevalence of network intrusions, the development of effective network intrusion detection systems (NIDS) has become crucial. In this study, we propose a novel NIDS approach that combines long short-term memory (LSTM) and attention mechanisms to analyze the spatial and temporal features of network traffic data. We utilize the benchmark UNSW-NB15 dataset, which exhibits a diverse distribution of patterns, including a significant disparity in the size of the training and testing sets. Unlike traditional machine learning techniques such as support vector machines (SVM) and k-nearest neighbors (KNN), which often struggle with limited feature sets and lower accuracy, our proposed model overcomes these limitations. Notably, existing models applied to this dataset typically require manual feature selection and extraction, which can be time-consuming and less precise. In contrast, our model achieves superior results in binary classification by leveraging the advantages of LSTM and attention mechanisms. Through extensive experiments and comparisons with state-of-the-art ML/DL models, we demonstrate the effectiveness and superiority of our proposed approach. Our findings highlight the potential of combining LSTM and attention mechanisms for enhanced network intrusion detection.

    Face Emotion Recognition Based on Machine Learning: A Review

    Computers can now detect, understand, and evaluate emotions thanks to recent developments in machine learning and information fusion. Researchers across various sectors are increasingly intrigued by emotion identification, utilizing facial expressions, words, body language, and posture as means of discerning an individual's emotions. Nevertheless, the effectiveness of the first three methods may be limited, as individuals can consciously or unconsciously suppress their true feelings. This article explores various feature extraction techniques, encompassing the development of machine learning classifiers like k-nearest neighbour, naive Bayesian, support vector machine, and random forest, in accordance with the established standard for emotion recognition. The paper has three primary objectives: firstly, to offer a comprehensive overview of affective computing by outlining essential theoretical concepts; secondly, to describe in detail the current state of the art in emotion recognition; and thirdly, to highlight important findings and conclusions from the literature, with an emphasis on important obstacles and possible future paths, especially in the creation of state-of-the-art machine learning algorithms for the identification of emotions.

    A fine-tuning of decision tree classifier for ransomware detection based on memory data

    Ransomware has evolved into a pervasive and extremely disruptive cybersecurity threat, causing substantial operational and financial damage to individuals and businesses. This article explores the critical domain of Ransomware detection and employs Machine Learning (ML) classifiers, particularly Decision Tree (DT), for Ransomware detection. The article also delves into the usefulness of DT in identifying Ransomware attacks, leveraging the innate ability of DT to recognize complex patterns within datasets. Instead of merely introducing DT as a detection method, we adopt a comprehensive approach, emphasizing the importance of fine-tuning DT hyperparameters. The optimization of these parameters is essential for maximizing the DT capability to identify Ransomware threats accurately. The obfuscated-MalMem2022 dataset, which is well-known for its extensive and challenging Ransomware-related data, was utilized to evaluate the effectiveness of DT in detecting Ransomware. The implementation uses the Python programming language, which is widely used for data analysis and ML tasks. Notably, the DT classifier consistently outperforms other classifiers in Ransomware detection, including K-Nearest Neighbors, Gradient Boosting Tree, Naive Bayes, and Linear Support Vector Classifier. For instance, the DT demonstrated exceptional effectiveness in distinguishing between Ransomware and benign data, as evidenced by its accuracy of 99.97%.

    An in-depth investigation of five machine learning algorithms for optimizing mixed-asset portfolios including REITs

    Real estate is a favored investment option as it allows investors to diversify their portfolios and minimize risk. Investors can invest in real estate directly by purchasing a property, or through real estate investment trusts (REITs), where they can purchase shares in companies that own and manage real estate. Investing in REITs has become increasingly popular because it eliminates some of the disadvantages associated with direct real estate investment, such as the need for a large upfront payment. When investing in mixed-asset portfolios, it is crucial to predict future prices accurately to ensure profitable and less risky asset allocation. However, literature on price prediction often focuses on only one or two algorithms, and there is no research that explores REITs’ price prediction in the context of portfolio optimization. To address this gap, we conducted a thorough evaluation of five machine learning (ML) algorithms, namely Ordinary Least Squares Linear Regression (LR), Support Vector Regression (SVR), k-Nearest Neighbors Regression (KNN), Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory Neural Networks (LSTM), as well as other financial benchmarks like Holt’s Exponential Smoothing (HES), Trigonometric Seasonality, Box–Cox Transformation, ARMA Errors, Trend, and Seasonal Components (TBATS), and Autoregressive Integrated Moving Average (ARIMA). We applied these algorithms to predict future prices for 30 REITs from the US, UK, and Australia, as well as 30 stocks and 30 bonds. The assets were then used as part of a portfolio, which we optimized using a genetic algorithm. Our results showed that using ML algorithms for price prediction provided at least three times the return over benchmark models and reduced risk by almost two-fold. For REITs, we observed that the use of ML algorithms led to a higher allocation to REITs diversified by country.
    In particular, our results showed that SVR was the best-performing algorithm in terms of risk-adjusted returns (Sharpe ratio) across different time horizons, as confirmed by our Friedman test results. Overall, our study highlights the effectiveness of ML algorithms in predicting asset prices and optimizing portfolio allocation.
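For readers unfamiliar with the Sharpe ratio mentioned in this abstract, a minimal sketch of the standard per-period definition follows: mean excess return divided by the standard deviation of excess returns. The abstract does not state which variant the authors used (for example, annualization or the choice of risk-free rate), so the sample standard deviation, the function name `sharpe_ratio`, and the return values below are assumptions chosen only for illustration.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return per period divided by the sample std dev of excess returns."""
    excess = [r - risk_free_rate for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.mean(excess) / spread

# Invented per-period portfolio returns and risk-free rate, for illustration only.
portfolio_returns = [0.01, 0.03, 0.02, 0.04]
ratio = sharpe_ratio(portfolio_returns, risk_free_rate=0.005)
```

A higher ratio means more return per unit of volatility, which is why it serves as a risk-adjusted measure when comparing portfolios.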