6 research outputs found

    Data augmentation approach for language identification in imbalanced bilingual code-mixed social media datasets

    Language identification in code-mixed datasets poses notable challenges due to data scarcity and high confusability in bilingual contexts. These challenges are further amplified by the imbalance and noise characteristic of social media data, complicating efforts to optimize performance. This paper introduces an augmentation approach designed to enhance language identification in bilingual code-mixed social media data. By combining reverse translation, semantic similarity, and sampling techniques with customized preprocessing strategies, the approach offers a comprehensive solution to these issues. To evaluate its effectiveness, experiments were conducted on language identification at both the sentence and word levels. The results demonstrate the potential of the approach to improve language identification performance, offering a compelling combination of generation techniques for addressing the challenges of language identification in code-mixed data.
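The sampling component described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes random oversampling of the minority language class, one of the simplest ways to counter the class imbalance the abstract mentions; the function name and interface are hypothetical.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Balance a labelled dataset by duplicating randomly chosen
    minority-class samples until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for s in items + extra:
            out_samples.append(s)
            out_labels.append(label)
    return out_samples, out_labels
```

In a real pipeline this step would be combined with reverse translation and semantic-similarity filtering, so that duplicated minority-class sentences are paraphrased rather than copied verbatim.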

    Computer-aided essay assessment using Bayesian techniques and multiple linear regression

    Disagreement between grades given by two human judges, long marking time, and high evaluation cost motivate research on Computer-based Assessment Systems (CbAS). The key requirement is that CbAS assessment must be as close as possible to human assessment. The UPSR Essay Assessment Schema defines three main assessment components: language, discourse elements, and style. Previously, Fuzzy Logic was used to determine and classify discourse elements, while the Stepwise Linear Regression (SLR) Algorithm was used to predict writing style; both have weaknesses. Fuzzy Logic does not measure the form of linguistic features and requires a large training set. The SLR Algorithm derives its writing-style prediction from an unstandardized feature set whose size is not clearly defined, with no guarantee that the features contribute significantly to an accurate grade prediction. This study therefore focuses on optimizing the prediction of discourse elements and writing style, leading to the development of a CbAS through a four-phase research methodology. (1) Pre-processing and data extraction, where each essay is parsed into words (tokens) and a Word Correction Algorithm corrects misspelled words. (2) Training of the determination and classification of discourse elements using the Multivariate Bernoulli Model (MMB) Technique, which considers both the presence and absence of features, thereby measuring the form of linguistic features that reflect essay quality, and which requires only a small training set. (3) Prediction of writing style using the Multiple Linear Regression (MLR) Algorithm, which applies six fixed features (drawn from previous research) so that the prediction is more standardized and the feature set more significant. (4) Evaluation of the agreement between the combined output of MMB, MLR, and the language component (taken from human assessment) and human assessment over five cycles of cross-validation. The outcome shows consistent performance with 95.2% agreement. The experiment thus shows that combining the MMB and MLR techniques yields better essay-grade prediction than the earlier Fuzzy Logic and SLR approach.
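The multivariate Bernoulli classifier described in phase (2) can be sketched in a few lines. This is a generic presence/absence naive-Bayes estimator with Laplace smoothing, not the thesis code; the function names, the toy feature sets, and the labels are all hypothetical.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(docs, labels, vocab):
    """Estimate per-class Bernoulli feature probabilities with Laplace
    smoothing. Each document is the set of features present in it."""
    class_docs = defaultdict(list)
    for d, l in zip(docs, labels):
        class_docs[l].append(d)
    model, total = {}, len(docs)
    for label, ds in class_docs.items():
        prior = math.log(len(ds) / total)
        probs = {w: (sum(w in d for d in ds) + 1) / (len(ds) + 2) for w in vocab}
        model[label] = (prior, probs)
    return model

def classify(model, doc, vocab):
    """Score every class; absent features contribute log(1 - p), which is
    what distinguishes the Bernoulli model from a multinomial one."""
    best, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = prior + sum(
            math.log(probs[w]) if w in doc else math.log(1 - probs[w])
            for w in vocab)
        if score > best_score:
            best, best_score = label, score
    return best
```

Because absence of a feature is evidence too, the model can separate classes even from the small training sets the abstract highlights.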

    An Improved Integrative Random Forest for Gene Regulatory Network Inference of Breast Cancer

    Gene Regulatory Network (GRN) inference aims to capture the regulatory influences between genes and the regulatory events within the GRN. Integrative Random Forest for Gene Regulatory Network Inference (iRafNet) is an RF-based algorithm that produces strong GRN inference by integrating multiple data types. Existing approaches work, but limitations keep them from an optimal state: they require very long computational times to construct a GRN inference, they do not provide optimal performance, the datasets contain redundant genes, and their inferred networks have lower accuracy on benchmark and real datasets. To overcome these issues, this study proposes improving the existing method by adding a gene-selection step. The existing methods were first studied and their performance in constructing GRN inference analyzed. An improved iRafNet was then designed and developed to reduce the computational time needed to construct the GRN inference from the dataset. Finally, the accuracy and computational time of the proposed method were validated and verified on benchmark and real datasets. The improved iRafNet demonstrated its performance with a higher AUC and lower computational time.
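The gene-selection idea can be sketched as a simple variance filter followed by edge scoring. This is only an illustration under stated assumptions: the paper's actual selection criterion is not given in the abstract, and iRafNet scores edges with random-forest importance, for which absolute Pearson correlation stands in here as a cheap proxy.

```python
import statistics

def select_genes(expr, k):
    """Keep the k genes with the highest expression variance, a simple
    filter standing in for the gene-selection step added to iRafNet."""
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]),
                    reverse=True)
    return ranked[:k]

def score_edges(expr, genes):
    """Score candidate regulator -> target edges by |Pearson correlation|
    (a proxy; iRafNet itself uses random-forest importance scores)."""
    def corr(x, y):
        mx, my = statistics.fmean(x), statistics.fmean(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0
    return {(r, t): abs(corr(expr[r], expr[t]))
            for r in genes for t in genes if r != t}
```

Filtering low-variance (near-constant, likely redundant) genes before the forest is trained shrinks the candidate-edge set quadratically, which is where the computational-time saving comes from.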

    A review on distance measure formula for enhancing match detection process of generic code clone detection model in Java Application

    Two code fragments that are identical to each other and repeatedly implemented in a software program are called code clones. Code clones are classified into four types: Type-1, Type-2, Type-3, and Type-4. Over the past decades, a variety of code clone approaches and tools have been used for detecting code clones; however, their limited comprehensiveness, or lack of generality in detecting all four clone types, has prompted researchers to develop dedicated detection models. The Generic Code Clone Detection (GCCD) Model implements clone detection for all four types through five processes: Pre-processing, Transformation, Parameterization, Categorization, and Match Detection. This study aims to improve the code clone detection of the GCCD Model through its Match Detection process. A literature review examines other models and techniques, along with the distance measure formulas available for calculating code clones in the Match Detection process. Based on the analysis, ten distance measure formulas will be compared to decide on the best method for calculating code clones in Match Detection. This study is intended as a reference for other researchers working on code clone detection models.
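One of the candidate distance measures for such a Match Detection step is edit distance. The following sketch is illustrative only: the review compares ten formulas, and which one wins is not stated in the abstract; the normalization into a similarity score is likewise a hypothetical convention.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed row by row
    in O(len(a) * len(b)) time and O(len(b)) space."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalise the distance to a [0, 1] similarity score, so that
    near-identical fragments (clone candidates) score close to 1."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest
```

Applied to normalized token streams rather than raw text, a threshold on this score can flag Type-1 and Type-2 clones while tolerating the small edits that characterize Type-3 clones.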

    Table 11: The detection of malware, which attacks Android OS, based on previous static analysis.


    Evaluation of a quality improvement intervention to reduce anastomotic leak following right colectomy (EAGLE): pragmatic, batched stepped-wedge, cluster-randomized trial in 64 countries

    Background Anastomotic leak affects 8 per cent of patients after right colectomy, with a 10-fold increased risk of postoperative death. The EAGLE study aimed to develop and test whether an international, standardized quality improvement intervention could reduce anastomotic leaks. Methods The intervention protocol, iteratively co-developed through a multistage Delphi process, comprised an online educational module introducing risk stratification, an intraoperative checklist, and harmonized surgical techniques. Clusters (hospital teams) were randomized to one of three arms with varied sequences of intervention/data collection under a derived stepped-wedge batch design (at least 18 hospital teams per batch). Patients were blinded to the study allocation. Low- and middle-income country enrolment was encouraged. The primary outcome (assessed by intention to treat) was anastomotic leak rate, and subgroup analyses by module completion (at least 80 per cent of surgeons, high engagement; less than 50 per cent, low engagement) were preplanned. Results A total of 355 hospital teams registered, with 332 from 64 countries (39.2 per cent low and middle income) included in the final analysis. The online modules were completed by half of the surgeons (2143 of 4411). The primary analysis included 3039 of the 3268 patients recruited (206 patients had no anastomosis and 23 were lost to follow-up), with anastomotic leaks arising before and after the intervention in 10.1 and 9.6 per cent respectively (adjusted OR 0.87, 95 per cent c.i. 0.59 to 1.30; P = 0.498). The proportion of surgeons completing the educational modules was influential: the leak rate decreased from 12.2 per cent (61 of 500) before intervention to 5.1 per cent (24 of 473) after intervention in high-engagement centres (adjusted OR 0.36, 0.20 to 0.64; P < 0.001), but this was not observed in low-engagement hospitals (8.3 per cent (59 of 714) and 13.8 per cent (61 of 443) respectively; adjusted OR 2.09, 1.31 to 3.31). Conclusion Completion of globally available digital training by engaged teams can alter anastomotic leak rates. Registration number: NCT04270721 (http://www.clinicaltrials.gov).