317 research outputs found

    A Comparative Study of Sentiment Analysis Methods for Detecting Fake Reviews in E-Commerce

    The popularity of e-commerce has increased, especially during the COVID-19 pandemic. Reviews from previous consumers have a significant influence on purchasing decisions, and fake reviews, written dishonestly by humans or generated by computers, are consequently produced to increase product sales; such reviews are deceptive and harm consumers. The goal of this research is to examine and evaluate the performance of various methods for identifying fake reviews. The well-known and widely used Amazon Review Data (2018) dataset was used for this research, covering the first 10 Amazon.com product categories with favorable feedback. Fundamental data preparation steps are then applied, such as special-character trimming, bag-of-words, and TF-IDF representations, and the resulting dataset is used to train models for detecting fake reviews. This research compares the performance of four different models: GPT-2, NBSVM, BiLSTM, and RoBERTa. The hyperparameters of the models are also tuned to find optimal values. The research concludes that the RoBERTa model performs best overall, with an accuracy of 97%; GPT-2 reaches an overall accuracy of 82%, NBSVM 95%, and BiLSTM 92%. The research also calculates the Area Under the Curve (AUC) for each model: RoBERTa has an AUC of 0.9976, NBSVM 0.9888, BiLSTM 0.9753, and GPT-2 0.9226. The RoBERTa model has the highest AUC value, close to 1, so it can be concluded that this model provides the most accurate predictions for detecting fake reviews, the main focus of this research. Doi: 10.28991/HIJ-2023-04-02-08
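    To make the text-vectorization and evaluation steps above concrete, the sketch below shows a generic TF-IDF baseline with scikit-learn; it is not the paper's code, and the file name and the 'text'/'label' columns are assumptions. The same accuracy/AUC reporting applies to any of the four models compared.

```python
# Minimal sketch of a TF-IDF fake-review baseline (not the paper's exact models):
# vectorize review text, train a linear classifier, and report accuracy and ROC AUC.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("fake_reviews.csv")   # hypothetical file with columns 'text', 'label' (1 = fake)
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True)
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(Xtr, y_train)

prob = clf.predict_proba(Xte)[:, 1]
print("accuracy:", accuracy_score(y_test, clf.predict(Xte)))
print("AUC:     ", roc_auc_score(y_test, prob))
```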

    Artificial Intelligence in Modern Society

    Artificial intelligence is progressing rapidly into diverse areas of modern society. AI can be used in several areas, such as research in the medical field or innovative technology such as autonomous vehicles. In the medical field, artificial intelligence is used to improve the accuracy of programs that detect health conditions. AI technology is also used in services such as Netflix or Spotify, where it monitors a user's habits and makes recommendations based on their recent activity. Banks use AI systems to monitor activity on members' accounts to check for identity theft, approve loans, and maintain online security. Similar systems can even be found in call centers, where programs analyze a caller's voice in real time to provide information that helps agents build rapport with the caller more quickly. The purpose of this research paper is to explain how artificial intelligence is creating advanced technologies in various fields of study, leading to a more efficient society.

    Investigating the attainment of optimum data quality for EHR Big Data: proposing a new methodological approach

    The value derivable from the use of data has been increasing continuously for some years. Both commercial and non-commercial organisations have realised the immense benefits that might be derived if all data at their disposal could be analysed and form the basis of decision-making. The technological tools required to produce, capture, store, transmit and analyse huge amounts of data form the background to the development of the phenomenon of Big Data. With Big Data, the aim is to generate value from huge amounts of data, often in non-structured formats and produced extremely frequently. However, the potential value derivable depends on the general level of data governance, and more precisely on the quality of the data. The field of data quality is well researched for traditional data uses but is still in its infancy for the Big Data context. This dissertation focused on investigating effective methods to enhance data quality for Big Data. The principal deliverable of this research is a methodological approach which can be used to optimise the level of data quality in the Big Data context. Since data quality is contextual (that is, a non-generalisable field), this research study applies the methodological approach to one use case, Electronic Health Records (EHR). The first main contribution to knowledge is a systematic investigation of which data quality dimensions (DQDs) are most important for EHR Big Data. The two most important dimensions ascertained by the research methods applied in this study are accuracy and completeness; these are two well-known dimensions, and this study confirms that they are also very important for EHR Big Data. The second important contribution is an investigation into whether Artificial Intelligence, with a special focus on machine learning, could be used to improve the detection of dirty data, focusing on the two data quality dimensions of accuracy and completeness. Based on the experiments carried out, regression and clustering algorithms proved to be more adequate for accuracy-related and completeness-related issues respectively. However, the limits of implementing and using machine learning algorithms for detecting data quality issues in Big Data were also revealed and discussed. It can safely be deduced from this part of the research study that the use of machine learning to enhance the detection of data quality issues is a promising area, but not yet a panacea that automates the entire process. The third important contribution is a proposed guideline for undertaking data repairs most efficiently for Big Data; this involved surveying and comparing existing data cleansing algorithms against a prototype developed for data reparation. Weaknesses of existing algorithms are highlighted and considered as areas which efficient data reparation algorithms must focus upon. These three contributions form the nucleus of a new data quality methodological approach which could be used to optimise Big Data quality, as applied in the context of EHR. Many of the activities and techniques discussed in the proposed methodological approach can be transposed to other industries and use cases. The proposed approach can be used by practitioners of Big Data quality who follow a data-driven strategy. As opposed to existing Big Data quality frameworks, it has the advantage of being more precise and specific: it gives clear and proven methods to undertake the main identified stages of a Big Data quality lifecycle and can therefore be applied by practitioners in the area. This research study provides promising results and deliverables and paves the way for further research. Big Data technology is evolving rapidly, and future research should focus on new representations of Big Data, the real-time streaming aspect, and replicating the research methods used in this study on new technologies to validate the current results.
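    As a minimal illustration of the regression-based detection of accuracy issues described above (the dissertation's own tooling is not reproduced here), the sketch below predicts one numeric EHR field from related fields and flags records with unusually large residuals; the field names are hypothetical.

```python
# Sketch of a regression-based accuracy check: fit a model for one numeric EHR
# field from related fields and flag records whose residual is unusually large.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

ehr = pd.read_csv("ehr_sample.csv")                  # hypothetical EHR extract
features, target = ["age", "weight_kg", "height_cm"], "systolic_bp"
complete = ehr.dropna(subset=features + [target])    # completeness issues are handled separately

model = LinearRegression().fit(complete[features], complete[target])
residuals = complete[target] - model.predict(complete[features])

threshold = 3 * residuals.std()                      # simple 3-sigma rule
suspect = complete[np.abs(residuals) > threshold]
print(f"{len(suspect)} records flagged as potential accuracy issues")
```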

    Rectal cancer image segmentation method based on the U-Net network

    The paper uses the U-Net network for intelligent segmentation of rectal cancer CT images, incorporating techniques such as image enhancement and batch normalization to alleviate overfitting. The optimal initial learning rate and number of convolutional kernels are determined through several experiments, and segmentation of rectal cancer tumors with the U-Net network reaches 85.76%. The experiments show that U-Net works well for medical image segmentation on small data sets, and that the similarity of the segmentation can be accurately measured using the Dice coefficient for data sets with extremely skewed positive and negative samples.
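    The Dice coefficient mentioned above is defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal NumPy version is sketched below; the paper's own implementation is not given.

```python
# Dice coefficient for binary segmentation masks, as used to score U-Net output.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks pred and target."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two 4x4 masks that partially overlap
a = np.array([[0, 1, 1, 0]] * 4)
b = np.array([[0, 1, 0, 0]] * 4)
print(dice_coefficient(a, b))   # 2*4 / (8+4) ≈ 0.667
```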

    Mining and Integration of Structured and Unstructured Electronic Clinical Data for Dementia Detection

    Dementia is an increasing problem for the aging population that incurs high medical costs, in part due to the lack of available treatment options. Accordingly, early detection is critical to potentially postpone symptoms and to prepare both healthcare providers and families for a patient's management needs. Current detection methods are typically costly or unreliable, and could greatly benefit from improved recognition of early dementia markers. Identification of such markers may be possible through computational analysis of patients' electronic clinical records. Prior work has focused on structured data (e.g. test results), but these records often also contain natural language (text) data in the form of patient histories, visit summaries, or other notes, which may be valuable for disease prediction. This thesis has three main goals: to incorporate analysis of the aforementioned electronic medical texts into predictive models of dementia development; to explore the use of topic modeling as a form of interpretable dimensionality reduction that both improves prediction and characterizes the texts; and to integrate these models with ones using structured data. This kind of computational modeling could be used in an automated screening system to identify and flag potentially problematic patients for assessment by clinicians. Results support the potential of unstructured clinical text data both as standalone predictors of dementia status when structured data are missing, and as complements to structured data.
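    As an illustrative sketch of the general pipeline described above (not the thesis's actual code), the example below derives LDA topic proportions from clinical notes, concatenates them with structured features, and trains a classifier; the file and column names are hypothetical.

```python
# Topic-model features from clinical notes combined with structured features.
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

records = pd.read_csv("clinical_records.csv")   # hypothetical: 'note_text', structured fields, 'dementia'

# Interpretable dimensionality reduction: each note becomes a vector of topic proportions.
counts = CountVectorizer(max_features=5000, stop_words="english").fit_transform(records["note_text"])
topics = LatentDirichletAllocation(n_components=20, random_state=0).fit_transform(counts)

structured = records[["age", "mmse_score"]].to_numpy()   # example structured fields
X = np.hstack([topics, structured])
y = records["dementia"]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```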

    Testicular cancer – response adapted treatment, prognostic markers and survivorship issues

    In the US and most European countries, testicular cancer is the most common malignancy in young men aged 20-40 years. Since the introduction of cisplatin-based treatment in the 1970s, more than 95% of patients are cured. The increasing incidence of testicular cancer and the high survival rates have led to a growing number of testicular cancer survivors (TCSs). As they have a long life expectancy, it is important to minimize the treatment burden, without compromising outcome, in order to reduce the risk of late toxicity. In the first study the aim was to evaluate the SWENOTECA IV (Swedish-Norwegian Testicular Cancer Group) treatment strategy for patients with metastatic non-seminomatous germ cell tumors (NSGCT) with respect to outcome. The protocol was designed to identify early those patients in whom the response to two standard chemotherapy courses was inadequate and to provide intensified treatment to these individuals. Tumor marker decline and, for patients with marker-negative disease, radiological assessment were used for response evaluation. The conclusion was that with detailed treatment protocols and a dedicated collaborative group of specialists, treatment results comparable to those reported from large single institutions can be achieved at a national level. The survival of intermediate-risk patients is remarkable and close to that of good-risk patients. To investigate whether TCSs have a higher incidence of work loss than the general population, accounting for stage, treatment and relapse, a second study identified a cohort of TCSs in the SWENOTECA register and compared it to matched population comparators. Prospectively recorded work-loss data were obtained from national registers. Adjusted relative risks (RR) and 95% confidence intervals (CI) of sick leave and/or disability pension were calculated annually and overall with Poisson and Cox regression, censoring at relapse. The mean number of annual work days lost was also estimated. The results indicated that extensively treated TCSs, but not those on surveillance or receiving limited treatment, are at increased long-term risk of work loss that is not explained by relapse. These patients may benefit from early rehabilitation initiatives. Expression of the RNA-binding motif protein 3 (RBM3) has been shown to correlate with favourable clinicopathological parameters and prognosis in several cancers. The aim of the third study was to examine the expression and prognostic value of RBM3 in patients with testicular NSGCT. Low RBM3 expression was a predictor of treatment failure in metastatic NSGCT, in relation to the prognostic factors included in the International Germ Cell Consensus Classification (IGCCC). These findings suggest that RBM3 may be a potential biomarker for treatment stratification in patients with metastatic NSGCT and therefore merit further validation.
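    For readers unfamiliar with the statistical approach of the second study, the sketch below shows a minimal Cox model of time to work loss of the kind described, using the lifelines library; the registry data and exact model specification are not public, so the file and column names are hypothetical.

```python
# Minimal Cox proportional hazards sketch for time to first work loss (illustrative only).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("tcs_workloss.csv")   # hypothetical: follow-up time, event flag, covariates
cph = CoxPHFitter()
cph.fit(
    df[["years_to_workloss", "workloss_event", "extensive_treatment", "age_at_diagnosis"]],
    duration_col="years_to_workloss",   # follow-up assumed censored at relapse upstream
    event_col="workloss_event",
)
cph.print_summary()                     # hazard ratios with 95% confidence intervals
```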

    Identification of compounds with a suitable property profile from chemical structure databases

    The adoption of digital chemical compound libraries creates the need to find compounds in them that fit a desired property profile. The problem is of particular interest in drug design, where replacing resource-intensive experiments with computational methods would yield significant savings in time and cost. Although, due to the limitations of current computational methods, it is not possible in the foreseeable future to transfer the entire drug development process into computers, the situation is different for large molecular databases. An in silico method working within a known error margin, even if it discards some suitable compounds and falsely counts others as active, is still capable of significantly concentrating a data set in terms of attractive compounds. This allows computational methods to be applied in the less stringent steps of drug development, such as finding lead compounds or drug candidates. This approach is known as virtual screening, and today it is a vast and rapidly developing research area comprising several paradigms and numerous individual methods. The present thesis takes a closer look at some of them and evaluates their performance in the course of several projects. The results of the thesis include computational models to estimate the HIV protease inhibition activity and cytotoxicity of certain types of compounds; a new virtual screening method; a few prospective ligands for HIV protease and reverse transcriptase; and a data set of compounds pre-processed with pharmacokinetic filters, a convenient starting point for subsequent projects.
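    The pharmacokinetic pre-filtering mentioned above is commonly implemented as a Lipinski rule-of-five pass; a sketch with RDKit follows. The thesis does not state its exact filter criteria, so the thresholds shown are only the standard illustrative ones.

```python
# Illustrative pharmacokinetic pre-filter: keep compounds passing Lipinski's rule of five.
from rdkit import Chem
from rdkit.Chem import Descriptors

def passes_rule_of_five(smiles: str) -> bool:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                       # unparsable structure
        return False
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Descriptors.NumHDonors(mol) <= 5
            and Descriptors.NumHAcceptors(mol) <= 10)

library = ["CC(=O)Oc1ccccc1C(=O)O", "CCCCCCCCCCCCCCCCCCCC"]   # aspirin, icosane
screened = [s for s in library if passes_rule_of_five(s)]
print(screened)   # only the drug-like compound survives the filter
```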