
    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity create a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each section in depth, the two books offer useful hints and strategies for solving the problems discussed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaboration and hence lead to significant developments in the field of data mining.

    Uncertain Multi-Criteria Optimization Problems

    Most real-world search and optimization problems naturally involve multiple criteria as objectives. Generally, symmetry, asymmetry, and anti-symmetry are basic characteristics of the binary relationships used when modeling optimization problems. Moreover, the notion of symmetry has appeared in many articles about uncertainty theories that are employed in multi-criteria problems. Different solutions may produce trade-offs (conflicting scenarios) among the objectives: a solution that is better with respect to one objective may compromise others. Various factors need to be considered to address such problems in multidisciplinary research, which is critical for the overall sustainability of human development and activity. In this regard, decision-making theory has been the subject of intense research in recent decades due to its wide applications in different areas, and it has become an important means of providing real-time solutions to uncertainty problems. Theories available in the existing literature, such as probability theory, fuzzy set theory, type-2 fuzzy set theory, rough set theory, and uncertainty theory, deal with such uncertainties. Nevertheless, the uncertain multi-criteria characteristics of these problems have not yet been explored in depth, and much remains to be achieved in this direction. Hence, different mathematical models of real-life multi-criteria optimization problems can be developed in various uncertain frameworks, with special emphasis on optimization problems.
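The trade-off notion described above is usually formalised through Pareto dominance: a solution belongs to the trade-off set if no other solution is at least as good in every objective and strictly better in one. A minimal sketch, with hypothetical objective values and both criteria minimised (not taken from the abstract):

```python
def dominates(a, b):
    """True if solution a is at least as good as b in every objective
    (minimisation) and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the non-dominated solutions (the trade-off set)."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

# Two conflicting objectives, e.g. cost vs. risk (hypothetical values):
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (8, 2), (9, 3)]
print(pareto_front(candidates))  # (3, 8) and (9, 3) are dominated
```

The surviving points are exactly the conflicting scenarios: improving one coordinate of any of them worsens the other.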

    Discriminant analysis of multi sensor data fusion based on percentile forward feature selection

    Feature extraction is a widely used approach to extract significant features in multi-sensor data fusion. However, feature extraction suffers from some drawbacks, the biggest being its failure to identify discriminative features within multi-group data. Thus, this study proposed a new discriminant analysis of multi-sensor data fusion using feature selection based on the unbounded and bounded Mahalanobis distances to replace the feature extraction approach in low- and intermediate-level data fusion. This study also developed percentile forward feature selection (PFFS) to identify discriminative features feasible for sensor data classification. The proposed discriminant procedure begins by computing the average distance between multiple groups using the unbounded and bounded distances. The fused features in the low and intermediate levels are then ranked based on the computed distances, and feature subsets are selected using the PFFS. The constructed classification rules were evaluated using classification accuracy. The investigations were carried out on ten e-nose and e-tongue sensor datasets. The findings indicated that the bounded Mahalanobis distance is superior to the unbounded criterion in selecting important features while using fewer of them. Moreover, with the bounded distance approach, feature selection using the PFFS obtained higher classification accuracy. The overall proposed procedure is found fit to replace the traditional discriminant analysis of multi-sensor data fusion due to its greater discriminative power and faster convergence to higher accuracy. In conclusion, feature selection can solve the problems of feature extraction, and the proposed PFFS proved effective in selecting feature subsets of higher accuracy with faster computation. The study also specified the advantage of the unbounded and bounded Mahalanobis distances in feature selection for high-dimensional data, which benefits both engineers and statisticians in sensor technology.
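The distance-based ranking idea can be illustrated as follows. This is a simplified stand-in, not the authors' exact PFFS algorithm: the data, group sizes, and the percentile cutoff rule below are all assumptions, and the forward subset-evaluation step of the published procedure is omitted.

```python
import numpy as np

def mahalanobis_between_groups(X1, X2):
    """Mahalanobis distance between two group means, using the pooled
    covariance of the currently considered feature(s)."""
    d = X1.mean(axis=0) - X2.mean(axis=0)
    pooled = ((len(X1) - 1) * np.cov(X1, rowvar=False) +
              (len(X2) - 1) * np.cov(X2, rowvar=False)) / (len(X1) + len(X2) - 2)
    pooled = np.atleast_2d(pooled)
    return float(np.sqrt(d @ np.linalg.pinv(pooled) @ d))

def percentile_feature_selection(X1, X2, percentile=50):
    """Rank each feature by its single-feature group separation, then keep
    the top `percentile` percent of features."""
    scores = [mahalanobis_between_groups(X1[:, [j]], X2[:, [j]])
              for j in range(X1.shape[1])]
    cutoff = np.percentile(scores, 100 - percentile)
    return [j for j, s in enumerate(scores) if s >= cutoff]

rng = np.random.default_rng(0)
X1 = rng.normal(0, 1, size=(30, 4))
X2 = rng.normal([3, 0, 0, 0], 1, size=(30, 4))  # only feature 0 separates the groups
print(percentile_feature_selection(X1, X2, percentile=25))
```

With these simulated groups, only the genuinely discriminative feature clears the percentile cutoff.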

    Computationally Linking Chemical Exposure to Molecular Effects with Complex Data: Comparing Methods to Disentangle Chemical Drivers in Environmental Mixtures and Knowledge-based Deep Learning for Predictions in Environmental Toxicology

    Chemical exposures affect the environment and may lead to adverse outcomes in its organisms. Omics-based approaches, like standardised microarray experiments, have expanded the toolbox to monitor the distribution of chemicals and assess the risk to organisms in the environment. The resulting complex data have extended the scope of toxicological knowledge bases and published literature. A plethora of computational approaches have been applied in environmental toxicology, considering systems biology and data integration. Still, the complexity of environmental and biological systems reflected in the data challenges investigations of exposure-related effects. This thesis aimed at computationally linking chemical exposure to biological effects on the molecular level, considering sources of complex environmental data. The first study employed data from an omics-based exposure study considering mixture effects in a freshwater environment. We compared three data-driven analyses in their suitability to disentangle mixture effects and link chemical exposures to biological effects, and in their reliability in attributing potentially adverse outcomes to chemical drivers with toxicological databases on the gene and pathway levels. Differential gene expression analysis and a network inference approach resulted in toxicologically meaningful outcomes and uncovered individual chemical effects, both stand-alone and in combination. We developed an integrative computational strategy to harvest exposure-related gene associations from environmental samples, considering mixtures of compounds at low concentrations. The applied approaches allowed assessing the hazard of chemicals more systematically with correlation-based compound groups. This dissertation presents a further achievement toward data-driven hypothesis generation for molecular exposure effects: an approach combining text mining and deep learning.
The study was entirely data-driven and involved state-of-the-art computational methods of artificial intelligence. We employed literature-based relational data and curated toxicological knowledge to predict chemical-biomolecule interactions. A word embedding neural network with a subsequent feed-forward network was implemented. Data augmentation and recurrent neural networks were beneficial for training with curated toxicological knowledge. The trained models reached accuracies of up to 94% for unseen test data of the employed knowledge base. However, we could not reliably confirm known chemical-gene interactions across selected data sources. Still, the predictive models might derive unknown information from toxicological knowledge sources, like literature, databases or omics-based exposure studies. Thus, the deep learning models might allow predicting hypotheses of exposure-related molecular effects. Both achievements of this dissertation might support the prioritisation of chemicals for testing and an intelligent selection of chemicals for monitoring in future exposure studies.

    Table of Contents
    Abstract
    Acknowledgements
    Prelude
    1 Introduction
        1.1 An overview of environmental toxicology
            1.1.1 Environmental toxicology
            1.1.2 Chemicals in the environment
            1.1.3 Systems biological perspectives in environmental toxicology
        1.2 Computational toxicology
            1.2.1 Omics-based approaches
            1.2.2 Linking chemical exposure to transcriptional effects
            1.2.3 Up-scaling from the gene level to higher biological organisation levels
            1.2.4 Biomedical literature-based discovery
            1.2.5 Deep learning with knowledge representation
        1.3 Research question and approaches
    2 Methods and Data
        2.1 Linking environmental relevant mixture exposures to transcriptional effects
            2.1.1 Exposure and microarray data
            2.1.2 Preprocessing
            2.1.3 Differential gene expression
            2.1.4 Association rule mining
            2.1.5 Weighted gene correlation network analysis
            2.1.6 Method comparison
        2.2 Predicting exposure-related effects on a molecular level
            2.2.1 Input
            2.2.2 Input preparation
            2.2.3 Deep learning models
            2.2.4 Toxicogenomic application
    3 Method comparison to link complex stream water exposures to effects on the transcriptional level
        3.1 Background and motivation
            3.1.1 Workflow
        3.2 Results
            3.2.1 Data preprocessing
            3.2.2 Differential gene expression analysis
            3.2.3 Association rule mining
            3.2.4 Network inference
            3.2.5 Method comparison
            3.2.6 Application case of method integration
        3.3 Discussion
        3.4 Conclusion
    4 Deep learning prediction of chemical-biomolecule interactions
        4.1 Motivation
            4.1.1 Workflow
        4.2 Results
            4.2.1 Input preparation
            4.2.2 Model selection
            4.2.3 Model comparison
            4.2.4 Toxicogenomic application
            4.2.5 Horizontal augmentation without tail-padding
            4.2.6 Four-class problem formulation
            4.2.7 Training with CTD data
        4.3 Discussion
            4.3.1 Transferring biomedical knowledge towards toxicology
            4.3.2 Deep learning with biomedical knowledge representation
            4.3.3 Data integration
        4.4 Conclusion
    5 Conclusion and Future perspectives
        5.1 Conclusion
            5.1.1 Investigating complex mixtures in the environment
            5.1.2 Complex knowledge from literature and curated databases predict chemical-biomolecule interactions
            5.1.3 Linking chemical exposure to biological effects by integrating CTD
        5.2 Future perspectives
    S1 Supplement Chapter 1
        S1.1 Example of an estrogen bioassay
        S1.2 Types of mode of action
        S1.3 The dogma of molecular biology
        S1.4 Transcriptomics
    S2 Supplement Chapter 3
    S3 Supplement Chapter 4
        S3.1 Hyperparameter tuning results
        S3.2 Functional enrichment with predicted chemical-gene interactions and CTD reference pathway genesets
        S3.3 Reduction of learning rate in a model with large word embedding vectors
        S3.4 Horizontal augmentation without tail-padding
        S3.5 Four-relationship classification
        S3.6 Interpreting loss observations for SemMedDB trained models
    List of Abbreviations
    List of Figures
    List of Tables
    Bibliography
    Curriculum scientiae
    Selbständigkeitserklärung
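The model family described in the abstract, word embeddings feeding a feed-forward classifier, can be sketched as a single forward pass. Everything below is hypothetical: the vocabulary, layer sizes, and relation classes are illustrative choices, the parameters are untrained random values, and this is not the thesis's actual architecture or data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical vocabulary of tokens describing chemical-biomolecule relations
vocab = {"<pad>": 0, "atrazine": 1, "inhibits": 2, "cyp1a1": 3, "induces": 4}
EMB_DIM, HIDDEN, CLASSES = 8, 16, 2   # assumed sizes, not the thesis configuration

# Randomly initialised parameters; training via backpropagation is omitted
E  = rng.normal(0, 0.1, (len(vocab), EMB_DIM))       # word embedding table
W1 = rng.normal(0, 0.1, (EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, CLASSES)); b2 = np.zeros(CLASSES)

def predict(tokens):
    """Embed the tokens, average the embeddings, and run a feed-forward
    classifier that scores the candidate chemical-biomolecule relation."""
    ids = [vocab.get(t, 0) for t in tokens]
    x = E[ids].mean(axis=0)                 # average word embeddings
    h = np.maximum(0, x @ W1 + b1)          # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()                      # softmax over relation classes

probs = predict(["atrazine", "induces", "cyp1a1"])
print(probs)  # class probabilities, e.g. (interaction, no interaction)
```

A trained version of such a model is what would emit interaction hypotheses for unseen chemical-gene pairs.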

    THE ROLE OF ICT IN EDUCATION: AN EFFICIENCY ANALYSIS

    In the education sector, the application of information and communication technology (ICT) has increased substantially over recent decades, as many countries have invested considerable resources in ICT for educational purposes. ICT is a broad concept.
    In this dissertation, ICT refers not only to physical infrastructure (e.g., radio, telephone, video, television, computers), but also to the use and intensity of use (e.g., every day, once a week, twice a week), the quality and location of the infrastructure (e.g., at school, at home), the reason for using it (e.g., for entertainment or for study purposes), and the expenditure related to ICT. This dissertation discusses the role of ICT in education, focusing on efficiency analysis. It comprises four studies, starting with a systematic literature review presented in Chapter II, which offers a clear overview of what has and has not been done in the literature on this topic. Chapter III performs a cross-country analysis of the efficiency of education at the school level in six South-East Asian countries (Brunei Darussalam, Malaysia, Indonesia, the Philippines, Singapore, and Thailand). Stochastic frontier analysis (SFA) allowing for heteroscedasticity is used. The results reveal that Singapore performs best relative to the other countries. The ICT infrastructure variables, i.e., the ratio of computers at school to the total number of students and the ratio of computers connected to the internet, are modeled both as inputs in the (education) production function and as determinants of inefficiency. The first ratio is found not to influence education outcomes significantly, while the second does. As determinants of inefficiency, the first ratio affects schools' inefficiency in mathematics and science, while the second has no influence. Building on the finding of Chapter III that many schools have high efficiency levels, Chapter IV uses the non-parametric data envelopment analysis (DEA) super-efficiency model, which can differentiate among these highly efficient schools. This model allows efficient schools to have efficiency scores greater than one (in the traditional DEA approach, the efficiency score is bounded between zero and one). To investigate factors that potentially influence efficiency, the study performs a "second-stage" analysis using bootstrapped quantile regression. The results suggest a number of policy implications for South-East Asian schools, indicating different courses of action for schools with higher and lower efficiency levels. Chapter V extends the analysis of Chapter III from both methodological and empirical points of view. The analysis, based on the SFA approach, includes not only ICT infrastructure in the model but also ICT use (indices of the time students spend using ICT at school, outside school for entertainment, and at home for school-related tasks). This is done with the "four-component stochastic frontier model", in which ICT is modeled both as inputs and as determinants of time-varying inefficiency. The model is tested on a dataset of 24 OECD countries. Results show that all three ICT-use variables influence education outcomes, while as determinants of time-varying inefficiency they have only a marginal effect. This study is thus expected to provide a more holistic view of the role of ICT in the efficiency of education, as previous studies addressed only ICT infrastructure.
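The SFA idea underlying Chapters III and V, a production frontier with a composed error that separates random noise from inefficiency, can be illustrated with simulated data. The frontier parameters and error scales below are assumptions for illustration; this is a textbook one-output frontier, not the dissertation's four-component model.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1, 10, n)                 # a single input (e.g. an ICT resource)
beta0, beta1 = 1.0, 0.5                   # assumed frontier parameters
v = rng.normal(0, 0.1, n)                 # symmetric statistical noise
u = np.abs(rng.normal(0, 0.3, n))         # half-normal inefficiency, u >= 0

# Production frontier in logs: ln y = b0 + b1 * ln x + v - u
log_y = beta0 + beta1 * np.log(x) + v - u
efficiency = np.exp(-u)                   # technical efficiency in (0, 1]

print(round(float(efficiency.mean()), 3))
```

Estimation would recover beta0, beta1, and the two error scales from observed (x, y) alone; the point of the composed error is that a unit below the frontier may be unlucky (v) rather than inefficient (u).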

    Economic Analysis of Portuguese Public Hospitals Through the Construction of Quality, Efficiency, Access, and Financial Related Composite Indicators

    Hospitals consume most of health systems' financial resources. In Portugal, for instance, public hospitals represent more than half of the National Health Service debt and are decisive in its financial insufficiency. Although profit is not the primary goal of hospitals, it is essential to guarantee their financial sustainability to ensure users' health care and the necessary resources. An analysis of the existing literature shows that research focuses mainly on hospitals' technical efficiency; little or no attention has been paid to the use of composite indicators in hospital benchmarking studies. This study uses the Benefit-of-the-Doubt methodology alongside recent data on Portuguese public hospitals (2013-2017) to understand the factors that contribute to low performance and high indebtedness levels. Our results suggest that hospitals perform best in terms of access (average score: 0.982). The group of criteria with the lowest performance was efficiency and productivity (average score: 0.919), suggesting a waste of resources. Financial performance is, in general, higher than quality, raising social concerns about the way public hospitals have been managed. The findings have relevant implications. For example, the way hospitals are currently financed should consider efficiency, productivity, quality, and access. Regulators should ensure that minimum performance levels are fulfilled, applying preventive and corrective measures to avoid future low performance. We suggest that hospital managers introduce satisfaction surveys to improve quality. These improvements can attract more patients in the medium or long term; thus, our results are also useful to citizens in making better choices.
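The Benefit-of-the-Doubt composite indicator lets each unit choose the weights most favourable to itself, subject to no unit scoring above 1 under those weights; this is one small linear program per unit. A minimal sketch with hypothetical sub-indicator values (not the study's hospital data):

```python
import numpy as np
from scipy.optimize import linprog

def benefit_of_the_doubt(Y):
    """Benefit-of-the-Doubt composite indicators.
    Y: units x sub-indicators matrix (higher = better).
    Each unit maximises its own weighted score subject to every unit's
    score staying at or below 1 under the same weights."""
    scores = []
    for i in range(Y.shape[0]):
        # linprog minimises, so negate the unit's own composite score
        res = linprog(c=-Y[i], A_ub=Y, b_ub=np.ones(Y.shape[0]), bounds=(0, None))
        scores.append(-res.fun)
    return np.array(scores)

# Hypothetical hospitals x (quality, efficiency, access) sub-indicators
Y = np.array([[0.90, 0.80, 0.95],
              [0.70, 0.92, 0.85],
              [0.60, 0.60, 0.70]])
print(benefit_of_the_doubt(Y).round(3))
```

Units that are best on at least one sub-indicator can reach a score of 1.0 and define the benchmark; a unit dominated everywhere scores below 1 even under its own most favourable weighting.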

    An Intelligent Customer Relationship Management (I-CRM) Framework and its Analytical Approaches to the Logistics Industry

    This thesis develops a new Intelligent Customer Relationship Management (i-CRM) framework incorporating an i-CRM analytical methodology that includes text-mining, type-mapping, linear, non-linear, and neuro-fuzzy approaches to handle customer complaints, identify key customers in the context of business values, and define problem significance and issue impact factors. The framework is coupled with i-CRM recommendations to help organizations achieve customer satisfaction by transforming customer complaints into organizational opportunities and business development strategies.

    Data envelopment analysis as a benchmarking application for humanitarian organizations

    Humanitarian aid organizations are under tremendous pressure and competition for donor funds to sustain their operations. However, donor contribution levels have remained relatively stagnant over the past five years and are unlikely to grow in the foreseeable future. Additionally, donor policies and mandates have added pressure on humanitarian aid organizations to comply with new and more complex requirements. Many humanitarian aid organizations work in some of the most challenging areas of the world, where conflict, famine, and environmental, economic, and cultural challenges are prevalent. Given all these factors, a novel form of performance and efficiency measurement is needed to evaluate the performance of humanitarian aid organizations. This study addressed the possible use of Data Envelopment Analysis (DEA) to measure the efficiency of an organization's country programs. Limited funding from donors, competition, and the humanitarian imperative to reach people in need require humanitarian aid organizations to become better and more effective stewards of donor contributions. This study used a mixed-methods approach to compare and evaluate the efficiency of the country portfolios of a humanitarian aid organization using DEA. The DEA models used were the constant returns to scale (CRS) and variable returns to scale (VRS) models with an output orientation. The study used an explanatory sequential design. First, a quantitative approach using DEA was employed to compare the efficiency of the organization's country portfolios. Second, a qualitative effort consisted of a focus group of DEA researchers who have performed DEA on humanitarian aid programs. The focus group addressed the views, perspectives, and issues of conducting DEA within the humanitarian sector. The DEA study was conducted in three phases on a sample of 19 country portfolios. The results showed that 10% of the countries were efficient in the aggregate under the CRS model, and 20% under the VRS model.
The focus group provided insights into and perceptions of DEA from a practical perspective. These were categorized into technical requirements and communications with clients. A challenge in the humanitarian sector is that DEA is not a well-known methodology: an explanation is often required of what DEA can do for an organization and what its limitations are. Additionally, clients often needed an explanation of how decision-making units (DMUs), variables, and DEA techniques can be used to support a humanitarian aid organization.
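An output-oriented CRS (CCR) model of the kind named above can be sketched as one envelopment linear program per decision-making unit: scale the unit's outputs up by a factor phi while staying inside the technology spanned by all units. The country-portfolio figures below are hypothetical, not the study's data.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_crs(X, Y):
    """Output-oriented CRS (CCR) DEA efficiency scores via the envelopment LP.
    X: units x inputs, Y: units x outputs. A score of 1.0 means efficient.
    Decision variables are [phi, lambda_1, ..., lambda_n]."""
    n = X.shape[0]
    scores = []
    for o in range(n):
        c = np.r_[-1.0, np.zeros(n)]                        # maximise phi
        A_in = np.hstack([np.zeros((X.shape[1], 1)), X.T])  # sum_k lam_k*x_kj <= x_oj
        A_out = np.hstack([Y[o][:, None], -Y.T])            # phi*y_or <= sum_k lam_k*y_kr
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[X[o], np.zeros(Y.shape[1])],
                      bounds=(0, None))
        scores.append(1.0 / res.x[0])                       # efficiency = 1/phi
    return np.array(scores)

# Hypothetical country portfolios: one input (budget), one output (people reached)
X = np.array([[100.0], [200.0], [150.0]])
Y = np.array([[50.0], [80.0], [75.0]])
print(dea_output_crs(X, Y).round(3))
```

With one input and one output under CRS, the score reduces to each unit's output/input ratio relative to the best ratio, which makes small examples easy to sanity-check.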

    Research in Supply Chain Management: Issue and Area Development

    Today the study of supply chain management (SCM) is growing rapidly and offers great opportunities for both empirical research and theoretical development. Research opportunities in SCM have been reviewed by many researchers and grouped into many categories. This paper reviews SCM research and classifies it into seven categories, namely (1) SCM Operational Management & Strategy, (2) Knowledge Management, (3) Relationship Management, (4) Information Technology in SCM, (5) Supply Chain Design, Logistics & Infrastructure, (6) Global Issues, and (7) Environment, Legal & Regulations. The issues in each category and the corresponding research opportunities are discussed in this paper. Keywords: Supply Chain Management, Research Opportunities in SCM, Issues in SCM