2,983 research outputs found
A rule-based machine learning model for financial fraud detection
Financial fraud is a growing problem that poses a significant threat to the banking industry, the government sector, and the public. In response, financial institutions must continuously improve their fraud detection systems. Although preventative and security precautions are implemented to reduce financial fraud, criminals are constantly adapting and devising new ways to evade fraud prevention systems. The classification of transactions as legitimate or fraudulent poses a significant challenge for existing classification models due to highly imbalanced datasets. This research aims to develop rules to detect fraud transactions that do not involve any resampling technique. The effectiveness of the rule-based model (RBM) is assessed using a variety of metrics such as accuracy, specificity, precision, recall, confusion matrix, Matthew’s correlation coefficient (MCC), and receiver operating characteristic (ROC) values. The proposed rule-based model is compared to several existing machine learning models such as random forest (RF), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbor (KNN), naive Bayes (NB), and logistic regression (LR) using two benchmark datasets. The results of the experiment show that the proposed rule-based model beat the other methods, reaching accuracy and precision of 0.99 and 0.99, respectively
Structuring the State’s Voice of Contention in Harmonious Society: How Party Newspapers Cover Social Protests in China
During the Chinese Communist Party’s (CCP) campaign of building a ‘harmonious society’, how do the official newspapers cover the instances of social contention on the ground? Answering this question will shed light not only on how the party press works but also on how the state and the society interact in today’s China. This thesis conceptualises this phenomenon with a multi-faceted and multi-levelled notion of ‘state-initiated contentious public sphere’ to capture the complexity of mediated relations between the state and social contention in the party press. Adopting a relational approach, this thesis analyses 1758 news reports of ‘mass incident’ in the People’s Daily and the Guangming Daily between 2004 and 2020, employing cluster analysis, qualitative comparative analysis, and social network analysis. The thesis finds significant differences in the patterns of contentious coverage in the party press at the level of event and province and an uneven distribution of attention to social contention across incidents and regions. For ‘reported regions’, the thesis distinguishes four types of coverage and presents how party press responds differently to social contention in different scenarios at the provincial level. For ‘identified incidents’, the thesis distinguishes a cumulative type of visibility based on the quantity of coverage from a relational visibility based on the structure emerging from coverage and explains how different news-making rationales determine whether instances receive similar amounts of coverage or occupy similar positions within coverage. Eventually, by demonstrating how the Chinese state strategically uses party press to respond to social contention and how social contention is journalistically placed in different positions in the state’s eyes, this thesis argues that what social contention leads to is the establishment of complex state-contention relations channelled through the party press
Analysis and monitoring of single HaCaT cells using volumetric Raman mapping and machine learning
No explorer reached a pole without a map, no chef served a meal without tasting, and no surgeon implants untested devices. Higher accuracy maps, more sensitive taste buds, and more rigorous tests increase confidence in positive outcomes. Biomedical manufacturing necessitates rigour, whether developing drugs or creating bioengineered tissues [1]–[4]. By designing a dynamic environment that supports mammalian cells during experiments within a Raman spectroscope, this project provides a platform that more closely replicates in vivo conditions. The platform also adds the opportunity to automate the adaptation of the cell culture environment, alongside spectral monitoring of cells with machine learning and three-dimensional Raman mapping, called volumetric Raman mapping (VRM). Previous research highlighted key areas for refinement, like a structured approach for shading Raman maps [5], [6], and the collection of VRM [7]. Refining VRM shading and collection was the initial focus, k-means directed shading for vibrational spectroscopy map shading was developed in Chapter 3 and exploration of depth distortion and VRM calibration (Chapter 4). “Cage” scaffolds, designed using the findings from Chapter 4 were then utilised to influence cell behaviour by varying the number of cage beams to change the scaffold porosity. Altering the porosity facilitated spectroscopy investigation into previously observed changes in cell biology alteration in response to porous scaffolds [8]. VRM visualised changed single human keratinocyte (HaCaT) cell morphology, providing a complementary technique for machine learning classification. Increased technical rigour justified progression onto in-situ flow chamber for Raman spectroscopy development in Chapter 6, using a Psoriasis (dithranol-HaCaT) model on unfixed cells. K-means-directed shading and principal component analysis (PCA) revealed HaCaT cell adaptations aligning with previous publications [5] and earlier thesis sections. The k-means-directed Raman maps and PCA score plots verified the drug-supplying capacity of the flow chamber, justifying future investigation into VRM and machine learning for monitoring single cells within the flow chamber
Context-aware SAR image ship detection and recognition network
With the development of deep learning, synthetic aperture radar (SAR) ship detection and recognition based on deep learning have gained widespread application and advancement. However, there are still challenging issues, manifesting in two primary facets: firstly, the imaging mechanism of SAR results in significant noise interference, making it difficult to separate background noise from ship target features in complex backgrounds such as ports and urban areas; secondly, the heterogeneous scales of ship target features result in the susceptibility of smaller targets to information loss, rendering them elusive to detection. In this article, we propose a context-aware one-stage ship detection network that exhibits heightened sensitivity to scale variations and robust resistance to noise interference. Then we introduce a Local feature refinement module (LFRM), which utilizes multiple receptive fields of different sizes to extract local multi-scale information, followed by a two-branch channel-wise attention approach to obtain local cross-channel interactions. To minimize the effect of a complex background on the target, we design the global context aggregation module (GCAM) to enhance the feature representation of the target and suppress the interference of noise by acquiring long-range dependencies. Finally, we validate the effectiveness of our method on three publicly available SAR ship detection datasets, SAR-Ship-Dataset, high-resolution SAR images dataset (HRSID), and SAR ship detection dataset (SSDD). The experimental results show that our method is more competitive, with AP50s of 96.3, 93.3, and 96.2% on the three publicly available datasets, respectively
INTEGRATED COMPUTER-AIDED DESIGN, EXPERIMENTATION, AND OPTIMIZATION APPROACH FOR PEROVSKITES AND PETROLEUM PACKAGING PROCESSES
According to the World Economic Forum report, the U.S. currently has an energy efficiency of just 30%, thus illustrating the potential scope and need for efficiency enhancement and waste minimization. In the U.S. energy sector, petroleum and solar energy are the two key pillars that have the potential to create research opportunities for transition to a cleaner, greener, and sustainable future. In this research endeavor, the focus is on two pivotal areas: (i) Computer-aided perovskite solar cell synthesis; and (ii) Optimization of flow processes through multiproduct petroleum pipelines. In the area of perovskite synthesis, the emphasis is on the enhancement of structural stability, lower costs, and sustainability. Utilizing modeling and optimization methods for computer-aided molecular design (CAMD), efficient, sustainable, less toxic, and economically viable alternatives to conventional lead-based perovskites are obtained. In the second area of optimization of flow processes through multiproduct petroleum pipelines, an actual industrial-scale operation for packaging multiple lube-oil blends is studied. Through an integrated approach of experimental characterization, process design, procedural improvements, testing protocols, control mechanisms, mathematical modeling, and optimization, the limitations of traditional packaging operations are identified, and innovative operational paradigms and strategies are developed by incorporating methods from process systems engineering and data-driven approaches
Computational and experimental studies on the reaction mechanism of bio-oil components with additives for increased stability and fuel quality
As one of the world’s largest palm oil producers, Malaysia encountered a major disposal problem as vast amount of oil palm biomass wastes are produced. To overcome this problem, these biomass wastes can be liquefied into biofuel with fast pyrolysis technology. However, further upgradation of fast pyrolysis bio-oil via direct solvent addition was required to overcome it’s undesirable attributes. In addition, the high production cost of biofuels often hinders its commercialisation. Thus, the designed solvent-oil blend needs to achieve both fuel functionality and economic targets to be competitive with the conventional diesel fuel.
In this thesis, a multi-stage computer-aided molecular design (CAMD) framework was employed for bio-oil solvent design. In the design problem, molecular signature descriptors were applied to accommodate different classes of property prediction models. However, the complexity of the CAMD problem increases as the height of signature increases due to the combinatorial nature of higher order signature. Thus, a consistency rule was developed reduce the size of the CAMD problem. The CAMD problem was then further extended to address the economic aspects via fuzzy multi-objective optimisation approach.
Next, a rough-set based machine learning (RSML) model has been proposed to correlate the feedstock characterisation and pyrolysis condition with the pyrolysis bio-oil properties by generating decision rules. The generated decision rules were analysed from a scientific standpoint to identify the underlying patterns, while ensuring the rules were logical. The decision rules generated can be used to select optimal feedstock composition and pyrolysis condition to produce pyrolysis bio-oil of targeted fuel properties.
Next, the results obtained from the computational approaches were verified through experimental study. The generated pyrolysis bio-oils were blended with the identified solvents at various mixing ratio. In addition, emulsification of the solvent-oil blend in diesel was also conducted with the help of surfactants. Lastly, potential extensions and prospective work for this study have been discuss in the later part of this thesis. To conclude, this thesis presented the combination of computational and experimental approaches in upgrading the fuel properties of pyrolysis bio-oil. As a result, high quality biofuel can be generated as a cleaner burning replacement for conventional diesel fuel
Combined Nutrition and Exercise Interventions in Community Groups
Diet and physical activity are two key modifiable lifestyle factors that influence health across the lifespan (prevention and management of chronic diseases and reduction of the risk of premature death through several biological mechanisms). Community-based interventions contribute to public health, as they have the potential to reach high population-level impact, through the focus on groups that share a common culture or identity in their natural living environment. While the health benefits of a balanced diet and regular physical activity are commonly studied separately, interventions that combine these two lifestyle factors have the potential to induce greater benefits in community groups rather than strategies focusing only on one or the other. Thus, this Special Issue entitled “Combined Nutrition and Exercise Interventions in Community Groups” is comprised of manuscripts that highlight this combined approach (balanced diet and regular physical activity) in community settings. The contributors to this Special Issue are well-recognized professionals in complementary fields such as education, public health, nutrition, and exercise. This Special Issue highlights the latest research regarding combined nutrition and exercise interventions among different community groups and includes research articles developed through five continents (Africa, Asia, America, Europe and Oceania), as well as reviews and systematic reviews
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Construction cost prediction system based on Random Forest optimized by the Bird Swarm Algorithm
Predicting construction costs often involves disadvantages, such as low prediction accuracy, poor promotion value and unfavorable efficiency, owing to the complex composition of construction projects, a large number of personnel, long working periods and high levels of uncertainty. To address these concerns, a prediction index system and a prediction model were developed. First, the factors influencing construction cost were first identified, a prediction index system including 14 secondary indexes was constructed and the methods of obtaining data were presented elaborately. A prediction model based on the Random Forest (RF) algorithm was then constructed. Bird Swarm Algorithm (BSA) was used to optimize RF parameters and thereby avoid the effect of the random selection of RF parameters on prediction accuracy. Finally, the engineering data of a construction company in Xinyu, China were selected as a case study. The case study showed that the maximum relative error of the proposed model was only 1.24%, which met the requirements of engineering practice. For the selected cases, the minimum prediction index system that met the requirement of prediction accuracy included 11 secondary indexes. Compared with classical metaheuristic optimization algorithms (Particle Swarm Optimization, Genetic Algorithms, Tabu Search, Simulated Annealing, Ant Colony Optimization, Differential Evolution and Artificial Fish School), BSA could more quickly determine the optimal combination of calculation parameters, on average. Compared with the classical and latest forecasting methods (Back Propagation Neural Network, Support Vector Machines, Stacked Auto-Encoders and Extreme Learning Machine), the proposed model exhibited higher forecasting accuracy and efficiency. The prediction model proposed in this study could better support the prediction of construction cost, and the prediction results provided a basis for optimizing the cost management of construction projects
Novel Neural Network Applications to Mode Choice in Transportation: Estimating Value of Travel Time and Modelling Psycho-Attitudinal Factors
Whenever researchers wish to study the behaviour of individuals choosing among a set of alternatives, they usually rely on models based on the random utility theory, which postulates that the single individuals modify their behaviour so that they can maximise of their utility. These models, often identified as discrete choice models (DCMs), usually require the definition of the utilities for each alternative, by first identifying the variables influencing the decisions. Traditionally, DCMs focused on observable variables and treated users as optimizing tools with predetermined needs. However, such an approach is in contrast with the results from studies in social sciences which show that choice behaviour can be influenced by psychological factors such as attitudes and preferences. Recently there have been formulations of DCMs which include latent constructs for capturing the impact of subjective factors. These are called hybrid choice models or integrated choice and latent variable models (ICLV). However, DCMs are not exempt from issues, like, the fact that researchers have to choose the variables to include and their relations to define the utilities. This is probably one of the reasons which has recently lead to an influx of numerous studies using machine learning (ML) methods to study mode choice, in which researchers tried to find alternative methods to analyse travellers’ choice behaviour. A ML algorithm is any generic method that uses the data itself to understand and build a model, improving its performance the more it is allowed to learn. This means they do not require any a priori input or hypotheses on the structure and nature of the relationships between the several variables used as its inputs. ML models are usually considered black-box methods, but whenever researchers felt the need for interpretability of ML results, they tried to find alternative ways to use ML methods, like building them by using some a priori knowledge to induce specific constrains. Some researchers also transformed the outputs of ML algorithms so that they could be interpreted from an economic point of view, or built hybrid ML-DCM models. The object of this thesis is that of investigating the benefits and the disadvantages deriving from adopting either DCMs or ML methods to study the phenomenon of mode choice in transportation. The strongest feature of DCMs is the fact that they produce very precise and descriptive results, allowing for a thorough interpretation of their outputs. On the other hand, ML models offer a substantial benefit by being truly data-driven methods and thus learning most relations from the data itself. As a first contribution, we tested an alternative method for calculating the value of travel time (VTT) through the results of ML algorithms. VTT is a very informative parameter to consider, since the time consumed by individuals whenever they need to travel normally represents an undesirable factor, thus they are usually willing to exchange their money to reduce travel times. The method proposed is independent from the mode-choice functions, so it can be applied to econometric models and ML methods equally, if they allow the estimation of individual level probabilities. Another contribution of this thesis is a neural network (NN) for the estimation of choice models with latent variables as an alternative to DCMs. This issue arose from wanting to include in ML models not only level of service variables of the alternatives, and socio-economic attributes of the individuals, but also psycho-attitudinal indicators, to better describe the influence of psychological factors on choice behaviour. The results were estimated by using two different datasets. Since NN results are dependent on the values of their hyper-parameters and on their initialization, several NNs were estimated by using different hyper-parameters to find the optimal values, which were used to verify the stability of the results with different initializations
- …