934 research outputs found

    Career choice among commerce-stream students at technical secondary schools: a case study (Pemilihan kerjaya di kalangan pelajar aliran perdagangan sekolah menengah teknik: satu kajian kes)

    This research is a survey to determine the career choices of form four students in the commerce stream. The study covers three aspects of career choice: information about careers, types of career, and the factors that most influence students in choosing a career. The study was conducted at Sekolah Menengah Teknik Kajang, Selangor Darul Ehsan. Thirty-six form four students were selected as respondents using non-random purposive sampling. All information was gathered using a questionnaire. The data were analyzed in terms of frequency, percentage and mean. Results are presented in tables and graphs. The findings show that information about careers has improved students' career choices and that mass media is the main factor influencing students in choosing their careers.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Investigation of intra-tumour heterogeneity to identify texture features to characterise and quantify neoplastic lesions on imaging

    The aim of this work was to further our knowledge of using imaging data to discover image-derived biomarkers and other information about the imaged tumour. Using scans obtained from multiple centres to discover and validate the models has advanced earlier research and provided a platform for further larger centre prospective studies. This work consists of two major studies, which are described separately. STUDY 1: NSCLC. Purpose: The aim of this multi-center study was to discover and validate radiomics classifiers as image-derived biomarkers for risk stratification of non-small-cell lung cancer (NSCLC). Patients and methods: Pre-therapy PET scans from 358 Stage I–III NSCLC patients scheduled for radical radiotherapy/chemoradiotherapy, acquired between October 2008 and December 2013, were included in this seven-institution study. Using a semiautomatic threshold method to segment the primary tumors, radiomics predictive classifiers were derived from a training set of 133 scans using TexLAB v2. Least absolute shrinkage and selection operator (LASSO) regression analysis allowed data dimension reduction and radiomics feature vector (FV) discovery. Multivariable analysis was performed to establish the relationship between FV, stage and overall survival (OS). Performance of the optimal FV was tested in an independent validation set of 204 patients, and in a further independent set of 21 (TESTI) patients. Results: Of 358 patients, 249 died within the follow-up period [median 22 (range 0–85) months]. From each primary tumor, 665 three-dimensional radiomics features from each of seven gray levels were extracted. The most predictive feature vector discovered (FVX) was independent of known prognostic factors, such as stage and tumor volume, and, of interest to multi-center studies, invariant to the type of PET/CT manufacturer. Using the median cut-off, FVX predicted a 14-month survival difference in the validation cohort (N = 204, p = 0.00465; HR = 1.61, 95% CI 1.16–2.24). In the TESTI cohort, a smaller cohort that presented with unusually poor survival of stage I cancers, FVX correctly indicated a lack of survival difference (N = 21, p = 0.501). In contrast to the radiomics classifier, clinically routine PET variables including SUVmax, SUVmean and SUVpeak lacked any prognostic information. Conclusion: PET-based radiomics classifiers derived from routine pre-treatment imaging possess intrinsic prognostic information for risk stratification of NSCLC patients scheduled for radiotherapy/chemoradiotherapy. STUDY 2: Ovarian Cancer. Purpose: The 5-year survival of epithelial ovarian cancer (EOC) is approximately 35-40%, prompting the need to develop additional methods such as biomarkers for personalised treatment. Patients and methods: 657 texture features were extracted from the CT scans of 364 untreated EOC patients. A 4-texture feature 'Radiomic Prognostic Vector (RPV)' was developed using machine learning methods on the training set. Results: The RPV was able to identify the 5% of patients with the worst prognosis, significantly improving on established prognostic methods, and was further validated in two independent, multi-centre cohorts. In addition, genetic, transcriptomic and proteomic analysis of two independent datasets demonstrated that stromal and DNA damage response pathways are activated in RPV-stratified tumours. Conclusion: RPV could be used to guide personalised therapy of EOC. Overall, the two large datasets of different imaging modalities have increased our knowledge of texture analysis, improved the models currently available and provided us with more areas in which to implement these tools in the clinical setting.

    Comprehensive Performance Analysis of Neurodegenerative disease Incidence in the Females of 60-96 year Age Group

    Neurodegenerative diseases such as Alzheimer's disease and dementia are increasingly prevalent chronic diseases, characterized by cognitive and behavioral decline. Machine learning is revolutionising almost all domains of our life, including the clinical system. The application of machine learning has the potential to enormously augment the reach of neurodegenerative care, making it more efficient. Across the globe, there is a massive burden of Alzheimer's and dementia cases, which poses a unique set of difficulties. It also provides an exceptional opportunity in terms of the impending availability of data. Harnessing this data using machine learning tools and techniques can put scientists and physicians in a leading research position in this area. The objective of this study was to develop an efficient prognostic ML model with high-performance metrics to better identify female candidate subjects at risk of having Alzheimer's disease and dementia. The study was based on two diverse datasets. The results are discussed using seven performance evaluation measures, i.e. accuracy, precision, recall, F-measure, Receiver Operating Characteristic (ROC) area, Kappa statistic, and Root Mean Squared Error (RMSE). A comprehensive performance analysis is also carried out later in the study.
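
As an illustration of the evaluation measures named in the abstract above, the following minimal sketch computes six of them for binary labels in plain Python. It is not the study's code: the function name `binary_metrics` and the example labels are hypothetical, and ROC area is left out because it needs continuous prediction scores rather than hard 0/1 predictions.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F-measure, Cohen's kappa and RMSE for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # RMSE between hard predictions and labels; for 0/1 values this equals
    # the square root of the error rate.
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
        "rmse": rmse,
    }


# Hypothetical labels, for illustration only.
example = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(example)
```

Reporting kappa alongside accuracy is useful when classes are imbalanced, since kappa discounts the agreement a trivial classifier would reach by chance.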

    Deep Risk Prediction and Embedding of Patient Data: Application to Acute Gastrointestinal Bleeding

    Acute gastrointestinal bleeding is a common and costly condition, accounting for over 2.2 million hospital days and 19.2 billion dollars of medical charges annually. Risk stratification is a critical part of initial assessment of patients with acute gastrointestinal bleeding. Although all national and international guidelines recommend the use of risk-assessment scoring systems, they are not commonly used in practice, have sub-optimal performance, may be applied incorrectly, and are not easily updated. With the advent of widespread electronic health record adoption, longitudinal clinical data captured during the clinical encounter is now available. However, this data is often noisy, sparse, and heterogeneous. Unsupervised machine learning algorithms may be able to identify structure within electronic health record data while accounting for key issues with the data generation process: measurements missing-not-at-random and information captured in unstructured clinical note text. Deep learning tools can create electronic health record-based models that perform better than clinical risk scores for gastrointestinal bleeding and are well-suited for learning from new data. Furthermore, these models can be used to predict risk trajectories over time, leveraging the longitudinal nature of the electronic health record. The foundation of creating relevant tools is the definition of a relevant outcome measure; in acute gastrointestinal bleeding, a composite outcome of red blood cell transfusion, hemostatic intervention, and all-cause 30-day mortality is a relevant, actionable outcome that reflects the need for hospital-based intervention. However, epidemiological trends may affect the relevance and effectiveness of the outcome measure when applied across multiple settings and patient populations. 
Understanding the trends in practice, potential areas of disparities, and value proposition for using risk stratification in patients presenting to the Emergency Department with acute gastrointestinal bleeding is important in understanding how to best implement a robust, generalizable risk stratification tool. Key findings include a decrease in the rate of red blood cell transfusion since 2014 and disparities in access to upper endoscopy for patients with upper gastrointestinal bleeding by race/ethnicity across urban and rural hospitals. Projected accumulated savings of consistent implementation of risk stratification tools for upper gastrointestinal bleeding total approximately $1 billion 5 years after implementation. Most current risk scores were designed for use based on the location of the bleeding source: upper or lower gastrointestinal tract. However, the location of the bleeding source is not always clear at presentation. I develop and validate electronic health record-based deep learning and machine learning tools for patients presenting with symptoms of acute gastrointestinal bleeding (e.g., hematemesis, melena, hematochezia), which is more relevant and useful in clinical practice. I show that they outperform leading clinical risk scores for upper and lower gastrointestinal bleeding, the Glasgow Blatchford Score and the Oakland score. While the best performing gradient boosted decision tree model has equivalent overall performance to the fully connected feedforward neural network model, at the very low risk threshold of 99% sensitivity the deep learning model identifies more very low risk patients. Using another deep learning model that can model longitudinal risk, the long short-term memory recurrent neural network, the need for transfusion of red blood cells can be predicted at every 4-hour interval in the first 24 hours of intensive care unit stay for high risk patients with acute gastrointestinal bleeding. 
Finally, for implementation it is important to find patients with symptoms of acute gastrointestinal bleeding in real time and characterize patients by risk using available data in the electronic health record. A decision rule-based electronic health record phenotype has equivalent performance as measured by positive predictive value compared to deep learning and natural language processing-based models, and after live implementation appears to have increased the use of the Acute Gastrointestinal Bleeding Clinical Care pathway. Patients with acute gastrointestinal bleeding but with other groups of disease concepts can be differentiated by directly mapping unstructured clinical text to a common ontology and treating the vector of concepts as signals on a knowledge graph; these patients can be differentiated using unbalanced diffusion earth mover’s distances on the graph. For electronic health record data with data missing not at random, MURAL, an unsupervised random forest-based method, handles data with missing values and generates visualizations that characterize patients with gastrointestinal bleeding. This thesis forms a basis for understanding the potential for machine learning and deep learning tools to characterize risk for patients with acute gastrointestinal bleeding. In the future, these tools may be critical in implementing integrated risk assessment to keep low risk patients out of the hospital and guide resuscitation and timely endoscopic procedures for patients at higher risk for clinical decompensation
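
The abstract above compares models at a very low risk threshold of 99% sensitivity. A minimal sketch of how such a cutoff can be derived from predicted risk scores is given below. It is an illustration under stated assumptions, not the thesis's implementation: the function names `cutoff_for_sensitivity` and `very_low_risk`, the convention that higher scores mean higher risk, and the example data are all hypothetical.

```python
import math


def cutoff_for_sensitivity(scores, outcomes, target=0.99):
    """Highest observed cutoff c such that flagging score >= c keeps sensitivity >= target."""
    positives = sorted(
        (s for s, y in zip(scores, outcomes) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one positive outcome")
    # Number of true positives that must be flagged to reach the target.
    # The small epsilon guards against floating-point noise in target * n.
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def very_low_risk(scores, outcomes, target=0.99):
    """Indices of patients scoring below the cutoff, i.e. classified as very low risk."""
    cutoff = cutoff_for_sensitivity(scores, outcomes, target)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Hypothetical risk scores and outcomes (1 = composite outcome occurred).
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05, 0.6, 0.15]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
print(very_low_risk(scores, outcomes, target=0.99))
```

At a fixed sensitivity, the model that places more patients below the cutoff identifies more very low risk patients, which is the comparison the abstract describes. In practice the cutoff would be chosen on training or validation data and then applied unchanged to held-out patients.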

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines, heavily involved in translating data-driven findings into medical practice. This is accomplished in particular by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of diseases. Continuous advancements in high-throughput technologies, coupled with recently promoted data sharing policies, have contributed to the presence of a massive wealth of data with remarkable potential to improve human health care. In line with this massive boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data categories. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches on clinical decision making by improving patient risk stratification and prediction of disease outcomes. This thesis comprises five studies explaining method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.

    Radiomics and Magnetic Resonance Imaging of Rectal Cancer: From Engineering to Clinical Practice

    While cross-sectional imaging has seen continuous progress and plays an undisputed pivotal role in the diagnostic management and treatment planning of patients with rectal cancer, a largely unmet need remains for improved staging accuracy, assessment of treatment response and prediction of individual patient outcome. Moreover, the increasing availability of targeted therapies has called for developing reliable diagnostic tools for identifying potential responders and optimizing overall treatment strategy on a personalized basis. Radiomics has emerged as a promising, still fully evolving research topic, which could harness the power of modern computer technology to generate quantitative information from imaging datasets based on advanced data-driven biomathematical models, potentially providing an added value to conventional imaging for improved patient management. The present study aimed to illustrate the contribution that current radiomics methods applied to magnetic resonance imaging can offer to managing patients with rectal cancer.

    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    The present PhD dissertation develops and applies bioinformatic methods and tools to address key current problems in the analysis of human omic data. The dissertation is organised by main objective into four chapters focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks; and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets, exploring all feature differences and possible heterogeneity through the integration of both data dispersion and predictor-response information in a new statistical parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of the subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for characterizing subcellular localization, supported by the human protein-protein interactome. In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues, assigning each human protein a position along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated the different human proteins are, and we place them all in a common interaction network. Overall, the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and elucidation of molecular signatures and biological networks using human omic data. Most of this data corresponds to sample cohorts generated in recent biomedical studies on specific human diseases.