
    Cloud-based platform for intelligent healthcare monitoring and risk prevention in hazardous manufacturing contexts

    This paper presents an intelligent cloud-based platform for workers' healthcare monitoring and risk prevention in potentially hazardous manufacturing contexts. The platform is organized into sequential modules dedicated to data acquisition, processing, and decision-making support. Several sensors and data sources, including smart wearables, machine-tool embedded sensors, and environmental sensors, are employed for data collection, covering offline clinical background as well as operational and environmental data. The cloud data processing module extracts relevant features from the acquired data to feed a machine learning-based decision-making support system. The latter classifies workers' health status so that prompt intervention is possible in particularly challenging scenarios.

    The Influence of Data Mining in Increasing Benefits of Libraries in Jordanian Governmental Universities

    This study examined the impact of data mining on increasing the benefits of university libraries in Jordanian governmental universities. The researcher considered the following data mining techniques: association, classification, clustering, prediction, sequential patterns, and decision trees. Using a quantitative approach with a questionnaire as the study tool, 412 participants responded to an online survey; the primary data were then screened and processed using SPSS v. 27. The results supported the main hypothesis: data mining contributed to better organization, flow, and accumulation of library data and to better development of the library's services for users. Among the data mining techniques, sequential patterns, decision trees, and prediction were the most influential techniques librarians follow in developing library services, as shown by their high correlation with the dependent variable; the remaining variables also had a positive influence, with a medium correlation. The study recommended that responsible parties within Jordanian universities improve their application of data mining: the current level of application is acceptable, but it isn't used to its maximum capacity.

    Feature selection and nearest centroid classification for protein mass spectrometry

    BACKGROUND: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. RESULTS: This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student's t-test, the Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches, we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting-based feature selection that we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross-validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms does not hold across all data sets.
CONCLUSION: This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation. The revealed inconsistencies provide clear evidence that algorithm evaluation should be performed on several data sets using a consistent (i.e., non-randomized, stratified) cross-validation procedure in order for the conclusions to be statistically sound.
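
The evaluation idea in this abstract can be illustrated with a minimal sketch, assuming a two-class problem and synthetic data. It ranks features with a Welch t-statistic computed on the training fold only, fits a nearest centroid classifier on the top-ranked features, and scores it with stratified k-fold cross-validation. All function names and the data generator are hypothetical and not taken from the paper; the number of selected features is fixed here, whereas the paper tunes feature selection with an internal leave-one-out loop.

```python
import math
import random
from collections import defaultdict

def t_scores(X, y):
    """Absolute Welch t-statistic per feature for labels 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        groups = {0: [], 1: []}
        for row, label in zip(X, y):
            groups[label].append(row[j])
        stats = []
        for vals in groups.values():
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
            stats.append((mean, var, len(vals)))
        (m0, v0, n0), (m1, v1, n1) = stats
        denom = math.sqrt(v0 / n0 + v1 / n1) or 1e-12  # guard against zero variance
        scores.append(abs(m0 - m1) / denom)
    return scores

def fit_centroids(X, y, features):
    """Per-class mean vector restricted to the selected features."""
    sums = defaultdict(lambda: [0.0] * len(features))
    counts = defaultdict(int)
    for row, label in zip(X, y):
        counts[label] += 1
        for k, j in enumerate(features):
            sums[label][k] += row[j]
    return {c: [s / counts[c] for s in sums[c]] for c in counts}

def predict(centroids, row, features):
    """Label of the centroid closest to the row (squared Euclidean distance)."""
    def dist(c):
        return sum((row[j] - c[k]) ** 2 for k, j in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds

def cv_accuracy(X, y, n_features, k=5):
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        # Rank features on the training fold only, so the test fold stays unseen.
        scores = t_scores(Xtr, ytr)
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        features = ranked[:n_features]
        centroids = fit_centroids(Xtr, ytr, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
    return correct / len(y)

def make_data(n_per_class=30, n_noise=50, seed=1):
    """Synthetic data: 2 informative features followed by pure-noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y

X, y = make_data()
accuracy = cv_accuracy(X, y, n_features=2)
print(f"stratified 5-fold accuracy: {accuracy:.2f}")
```

Running the ranking inside each fold matters: selecting features on the full data set before splitting would leak test information into the model and inflate the reported accuracy.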

    The Out-of-core KNN Awakens: The light side of computation force on large datasets

    K-Nearest Neighbors (KNN) is a crucial tool for many applications, e.g., recommender systems, image classification, and web-related applications. However, KNN is a resource-greedy operation, particularly for large datasets. We focus on the challenge of KNN computation over large datasets on a single commodity PC with limited memory. We propose a novel approach to compute KNN on large datasets by leveraging both disk and main memory efficiently. The main rationale of our approach is to minimize random accesses to disk, maximize sequential accesses to data, and make efficient use of only the available memory. We evaluate our approach on large datasets in terms of performance and memory consumption. The evaluation shows that our approach requires only 7% of the time needed by an in-memory baseline to compute a KNN graph.
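
To make the sequential-access idea concrete, here is a small sketch, assuming fixed-size float32 records in a binary file. It builds an exact KNN graph with a block nested-loop scan: only one block of query vectors and one block of candidate vectors are held in memory at a time, and the file is always read front to back. This is a generic out-of-core baseline for illustration, not the algorithm proposed in the paper; the record layout, `DIM`, and all function names are assumptions.

```python
import heapq
import os
import random
import struct
import tempfile

DIM = 4                               # vector dimensionality (assumed)
RECORD = struct.Struct(f"<{DIM}f")    # one vector = DIM little-endian float32 values

def write_vectors(path, vectors):
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))

def read_blocks(path, block_size):
    """Yield (start_index, vectors), reading the file sequentially."""
    with open(path, "rb") as f:
        start = 0
        while True:
            buf = f.read(RECORD.size * block_size)
            if not buf:
                break
            block = [RECORD.unpack_from(buf, off)
                     for off in range(0, len(buf), RECORD.size)]
            yield start, block
            start += len(block)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(path, k, block_size):
    """Exact KNN graph using at most two blocks of vectors in memory."""
    graph = {}
    for q_start, queries in read_blocks(path, block_size):
        # One bounded heap per query; distances are negated so the heap root
        # is always the current worst (farthest) neighbour.
        heaps = [[] for _ in queries]
        for c_start, candidates in read_blocks(path, block_size):
            for qi, q in enumerate(queries):
                for ci, c in enumerate(candidates):
                    if q_start + qi == c_start + ci:
                        continue  # skip the point itself
                    item = (-sq_dist(q, c), c_start + ci)
                    if len(heaps[qi]) < k:
                        heapq.heappush(heaps[qi], item)
                    elif item > heaps[qi][0]:
                        heapq.heapreplace(heaps[qi], item)
        for qi, heap in enumerate(heaps):
            # Sort nearest first before storing the neighbour indices.
            graph[q_start + qi] = [idx for _, idx in sorted(heap, reverse=True)]
    return graph

rng = random.Random(0)
vectors = [[rng.random() for _ in range(DIM)] for _ in range(200)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, vectors)
    graph = knn_graph(path, k=5, block_size=64)
    # Read the stored float32 values back for inspection.
    stored = [v for _, block in read_blocks(path, 200) for v in block]

print(graph[0])
```

The block size trades memory for passes over the file: with N vectors and block size B, the file is scanned roughly N / B times, each time sequentially.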

    Design and implementation of a cyberinfrastructure for RNA motif search, prediction and analysis

    RNA secondary and tertiary structure motifs play important roles in cells. However, very few web servers are available for RNA motif search and prediction. In this dissertation, a cyberinfrastructure, named RNAcyber, capable of performing RNA motif search and prediction, is proposed, designed and implemented. The first component of RNAcyber is a web-based search engine, named RmotifDB. This web-based tool integrates an RNA secondary structure comparison algorithm with the secondary structure motifs stored in the Rfam database. With a user-friendly interface, RmotifDB provides the ability to search for ncRNA structure motifs in both structural and sequential ways. The second component of RNAcyber is an enhanced version of RmotifDB. This enhanced version combines data from multiple sources, incorporates a variety of well-established structure-based search methods, and is integrated with the Gene Ontology. To display RmotifDB’s search results, a software tool, called RSview, is developed. RSview is able to display the search results in a graphical manner. Finally, RNAcyber contains a web-based tool called Junction-Explorer, which employs a data mining method for predicting tertiary motifs in RNA junctions. Specifically, the tool is trained on solved RNA tertiary structures obtained from the Protein Data Bank, and is able to predict the configuration of coaxial helical stacks and families (topologies) in RNA junctions at the secondary structure level. Junction-Explorer employs several algorithms for motif prediction, including a random forest classification algorithm, a pseudoknot removal algorithm, and a feature ranking algorithm based on the Gini impurity measure. A series of experiments including 10-fold cross-validation has been conducted to evaluate the performance of the Junction-Explorer tool. Experimental results demonstrate the effectiveness of the proposed algorithms and the superiority of the tool over existing methods.
The RNAcyber infrastructure is fully operational, with all of its components accessible on the Internet.
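
As a rough illustration of feature ranking with the Gini impurity measure, the sketch below computes the impurity 1 - sum(p_c^2) of a label set and ranks each numeric feature by the largest impurity decrease achievable with a single threshold split. This is a simplified single-split version under stated assumptions: random forests typically average such decreases over many trees, and the function names and toy data are hypothetical, not Junction-Explorer's.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gini_decrease(values, labels):
    """Largest impurity decrease over all threshold splits of one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def rank_features(X, y):
    """Feature indices ordered from most to least informative."""
    scores = {j: best_gini_decrease([row[j] for row in X], y)
              for j in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1, 5], [2, 1], [3, 4], [4, 2]]
y = ["A", "A", "B", "B"]
ranking = rank_features(X, y)
print(ranking)  # [0, 1]
```

On the toy data, a threshold of 2 on feature 0 yields two pure children, so its decrease equals the parent impurity of 0.5, while the best split on feature 1 only reduces impurity by 1/6.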