Evaluating Digital Libraries: A Longitudinal and Multifaceted View
Courseware in academic library user education: a literature review from the GAELS Joint Electronic Library project
The use of courseware for information skills teaching in academic libraries has been growing for a number of years. The GAELS project was required to create a set of learning materials to support Joint Electronic Library activity at Glasgow and Strathclyde Universities and conducted a literature review of the subject. This review identified a range of factors common to successful library courseware implementations, such as the need for practitioners to feel a sense of ownership of the medium, the need to customise courseware to local information environments, and an emphasis on training packages for large bodies of undergraduates. However, we also noted underdeveloped aspects worthy of further attention, such as the treatment of pedagogic issues in library CAL implementations and the use of hypertextual learning materials for more advanced information skills training. We suggest ways of improving library teaching practice and identify further areas of research.
Autoencoder-Based Representation Learning to Predict Anomalies in Computer Networks
With recent advances in Internet-of-Things (IoT) devices, cloud-based services, and the diversity of network data, there has been a growing need for sophisticated anomaly detection algorithms within network intrusion detection systems (NIDS) that can tackle advanced network threats. Advances in deep learning and machine learning (ML) have been garnering considerable interest among researchers, since they have the capacity to provide a solution to advanced threats such as zero-day attacks. An Intrusion Detection System (IDS) is the first line of defense against network-based attacks, compared to other traditional technologies such as firewall systems. This report adds to the existing approaches by proposing a novel strategy that incorporates both supervised and unsupervised learning into Intrusion Detection Systems (IDS). Specifically, the study utilizes a deep Autoencoder (DAE) as a dimensionality reduction tool and a Support Vector Machine (SVM) as a classifier to perform anomaly-based classification. The study diverges from other similar studies by performing a thorough analysis of deep autoencoders as a valid non-linear dimensionality reduction tool, comparing them against Principal Component Analysis (PCA) and tuning hyperparameters that optimize for 'F-1 Micro' score and 'Balanced Accuracy', since we are dealing with a dataset with imbalanced classes. The study employs robust analysis tools such as Precision-Recall Curves, Average-Precision score, Train-Test Times, t-SNE, Grid Search, and L1/L2 regularization. Our model is trained and tested on the publicly available datasets KDDTrain+ and KDDTest+.
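The DAE-then-classifier pipeline described above can be sketched end to end. The toy example below (not the study's actual setup) trains a minimal single-hidden-layer autoencoder in plain NumPy to compress 10-dimensional samples into 2-dimensional codes, then classifies the codes; a nearest-centroid rule stands in for the SVM, and the data, layer sizes, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes of 10-dimensional points with different means.
X = np.vstack([rng.normal(0.0, 1.0, (100, 10)),
               rng.normal(2.0, 1.0, (100, 10))])
y = np.array([0] * 100 + [1] * 100)

# Minimal autoencoder: tanh encoder down to 2 dims, linear decoder back to 10,
# trained by gradient descent on the mean squared reconstruction error.
d_in, d_hid, lr = 10, 2, 0.01
W1 = rng.normal(0, 0.1, (d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = rng.normal(0, 0.1, (d_hid, d_in)); b2 = np.zeros(d_in)
for _ in range(500):
    H = np.tanh(X @ W1 + b1)              # encoder: 2-D codes
    Xhat = H @ W2 + b2                    # decoder: reconstruction
    err = Xhat - X
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)      # backprop through tanh
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

Z = np.tanh(X @ W1 + b1)                  # learned low-dimensional representation

# Nearest-centroid classifier on the codes (a simple stand-in for the SVM).
c0, c1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
pred = (np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)).astype(int)
acc = (pred == y).mean()
print(f"accuracy on 2-D codes: {acc:.2f}")
```

In the study's setting the codes would instead be the DAE bottleneck activations on KDDTrain+ features, and the downstream classifier an SVM tuned via grid search; swapping PCA in for the encoder gives the linear baseline the abstract compares against.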
Comparing the Effectiveness of Different Boosting Algorithms for Ground Water Quality in Telangana Region
This culminating experience research project explores the parameters needed to predict water quality levels under different climatic conditions, pre- and post-monsoon, from 2018 to 2020 in Telangana State, India. An earlier study analysed water quality in the Telangana region using linear regression with a Water Quality Index. In this study, however, we replicate the water quality analysis using a stacked model and machine learning algorithms such as Light Gradient Boosting Machine (LightGBM), Random Forest, and an Artificial Neural Network (ANN). The research questions are: (Q1) What are the sources of the significant parameters that impact groundwater quality in a location? (Q2) Will the stacked-model analysis approach produce different results when applied to the Telangana dataset? (Q3) How do the size and nature of a dataset impact the effectiveness of ensemble techniques, such as stacking, for addressing class imbalance in groundwater quality prediction models? The findings are: (Q1) Sodium and Magnesium parameter values were used to calculate the Sodium Adsorption Ratio (SAR) for the groundwater samples. Based on electrical conductivity (EC) and SAR values, salinity hazard values were calculated and grouped into classes: Low C1 (EC < 250), Medium C2 (EC 250–750), High C3 (EC 750–2250), and Very High C4 (EC > 2250); and sodium hazard classes: Low S1 (SAR < 10), Medium S2 (SAR 10–18), High S3 (SAR 18–26), and Very High S4 (SAR > 26). Comparing the 2018, 2019, and 2020 water quality datasets, the value ranges increased for Sodium (5.07 to 748), Calcium (1.2 to 640.0), Magnesium (4.86 to 457.02), and Electrical Conductivity (102 to 9499). (Q2) Stacked models achieved the best performance in terms of accuracy: the individual Random Forest and LightGBM models each reached 97%, and passing their two predicted probability values through the ANN increased model accuracy to 98% for predicting water quality from data collected across different regions and climatic conditions, based on the suitability of water salinity and sodium content. (Q3) To manage imbalanced data and assess prediction accuracy, model performance was measured using the classification reports of Random Forest, LightGBM, and the ANN; the F1 scores vary by class. For the Marginal class, F1 scores were 0.63 (RF) and 0.67 (LightGBM), which the ANN increased to 0.76; for Poor, 0.95 (RF) and 0.95 (LightGBM), increased to 0.96; and for Very Poor, 0.77 (RF) and 0.77 (LightGBM), increased to 0.86. For the Excellent and Good classes, all three models achieved an F1 score of 1, and for Permissible all three achieved 0.99. The conclusions are: (Q1) This research provides helpful information for understanding and handling the potential risks of salinity and sodium in the studied region by classifying hazard levels into four classes each (C1 to C4 and S1 to S4) based on EC and SAR values. (Q2) Our research demonstrates that stacked models employing different classifiers are highly effective in predicting water quality with remarkable accuracy; passing the predicted probability values through the ANN further improved accuracy to an impressive 98%. (Q3) The stacked-model technique, which combines Random Forest, LightGBM, and the ANN, appears to be an effective means of dealing with imbalanced data and enhancing prediction accuracy.
The significant improvement in F1 scores for several classes, especially when using the ANN, demonstrates how effectively this ensemble approach handles challenging classification problems. Areas for future research that emerged from this study include training and testing the model with a larger dataset and tuning different hyperparameters for further improvement.
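The stacking arrangement described (base classifiers whose predicted probabilities feed a meta-learner) can be illustrated with a small self-contained sketch. Here a plain logistic regression and a centroid-distance model stand in for Random Forest and LightGBM, and a second logistic regression stands in for the ANN meta-learner; the data and all hyperparameters are illustrative assumptions, and in a real pipeline the meta-learner should be fit on out-of-fold base predictions to avoid leakage:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary data: label is the noisy sign of a linear function of X.
X = rng.normal(0, 1, (300, 5))
w_true = rng.normal(0, 1, 5)
y = (X @ w_true + rng.normal(0, 0.5, 300) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.5, steps=300):
    """Logistic regression by gradient descent on the cross-entropy loss."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        g = sigmoid(X @ w + b) - y          # dL/dz for each sample
        w -= lr * X.T @ g / len(y); b -= lr * g.mean()
    return w, b

# Base model 1 (stand-in for Random Forest): logistic regression on raw features.
w1, b1 = fit_logreg(X, y)
p1 = sigmoid(X @ w1 + b1)

# Base model 2 (stand-in for LightGBM): probability from centroid distances.
c0, c1 = X[y == 0].mean(0), X[y == 1].mean(0)
d0 = np.linalg.norm(X - c0, axis=1); d1 = np.linalg.norm(X - c1, axis=1)
p2 = sigmoid(d0 - d1)                        # closer to c1 => higher P(y=1)

# Meta-learner (stand-in for the ANN): trained on the stacked probabilities.
M = np.column_stack([p1, p2])
wm, bm = fit_logreg(M, y)
acc = ((sigmoid(M @ wm + bm) > 0.5) == y).mean()
print(f"stacked accuracy: {acc:.2f}")
```

The key data flow matches the abstract: each base model emits a predicted probability per sample, and those two columns, not the raw features, are what the final model learns from.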
Automated Test Input Generation for Android: Are We There Yet?
Mobile applications, often simply called "apps", are increasingly widespread,
and we use them daily to perform a number of activities. Like all software,
apps must be adequately tested to gain confidence that they behave correctly.
Therefore, in recent years, researchers and practitioners alike have begun to
investigate ways to automate app testing. In particular, because of Android's
open source nature and its large share of the market, a great deal of research
has been performed on input generation techniques for apps that run on the
Android operating system. At this point in time, there are in fact a number of
such techniques in the literature, which differ in the way they generate
inputs, the strategy they use to explore the behavior of the app under test,
and the specific heuristics they use. To better understand the strengths and
weaknesses of these existing approaches, and get general insight on ways they
could be made more effective, in this paper we perform a thorough comparison of
the main existing test input generation tools for Android. In our comparison,
we evaluate the effectiveness of these tools, and their corresponding
techniques, according to four metrics: code coverage, ability to detect faults,
ability to work on multiple platforms, and ease of use. Our results provide a
clear picture of the state of the art in input generation for Android apps and
identify future research directions that, if suitably investigated, could lead
to more effective and efficient testing tools for Android.
Data Science, Data Visualization, and Digital Twins
Real-time, web-based, and interactive visualisations have proven to be outstanding methodologies and tools in numerous fields when knowledge of sophisticated data science and visualisation techniques is available. This is because modern data science analytical approaches such as machine/deep learning and artificial intelligence, as well as digital twinning, promise to give data insights, enable informed decision-making, and facilitate rich interactions among stakeholders. The benefits of data visualisation, data science, and digital twinning technologies motivate this book, which exhibits and presents numerous developed and advanced data science and visualisation approaches. Chapters cover such topics as deep learning techniques, web- and dashboard-based visualisations during the COVID pandemic, 3D modelling of trees for mobile communications, digital twinning in the mining industry, data science libraries, and potential areas of future data science development.
A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management.
OBJECTIVE: Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. MATERIALS AND METHODS: An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. RESULTS: Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application ("app"), Blip, to visualize the data. Tidepool's software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. DISCUSSION: By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. CONCLUSION: The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool's open source, cloud model for health data interoperability is applicable to other healthcare use cases.
Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches
Virtual screening (VS) is an outstanding cornerstone in the drug discovery pipeline. A variety of computational approaches, which are generally classified as ligand-based (LB) and structure-based (SB) techniques, exploit key structural and physicochemical properties of ligands and targets to enable the screening of virtual libraries in the search for active compounds. Though LB and SB methods have found widespread application in the discovery of novel drug-like candidates, their complementary natures have stimulated continued efforts toward the development of hybrid strategies that combine LB and SB techniques, integrating them in a holistic computational framework that exploits the available information on both ligand and target to enhance the success of drug discovery projects. In this review, we analyze the main strategies and concepts that have emerged in recent years for defining hybrid LB + SB computational schemes in VS studies. In particular, attention is focused on the combination of molecular similarity and docking, illustrated with selected applications taken from the literature.
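One of the simplest hybrid LB + SB schemes, parallel combination of the two methods' outputs, can be sketched as rank-based consensus: each candidate is ranked separately by its ligand-based similarity and by its docking score, and the ranks are averaged. The molecules and score values below are made up purely for illustration:

```python
def ranks(scores, higher_better=True):
    """Map each item to its 1-based rank under the given score ordering."""
    ordered = sorted(scores, key=scores.get, reverse=higher_better)
    return {mol: i + 1 for i, mol in enumerate(ordered)}

# Hypothetical candidates: LB similarity (higher = better) and
# SB docking score (more negative = better binding).
lb_sim  = {"mol1": 0.91, "mol2": 0.45, "mol3": 0.78, "mol4": 0.60, "mol5": 0.83}
docking = {"mol1": -6.2, "mol2": -9.1, "mol3": -8.4, "mol4": -5.0, "mol5": -8.8}

r_lb = ranks(lb_sim, higher_better=True)
r_sb = ranks(docking, higher_better=False)

# Consensus = mean rank across the two methods (lower = better overall).
consensus = {mol: (r_lb[mol] + r_sb[mol]) / 2 for mol in lb_sim}
shortlist = sorted(consensus, key=consensus.get)[:2]
print(shortlist)  # candidates that both methods agree are promising
```

Mean-rank fusion is only one option; normalising the raw scores and taking a weighted sum is another common choice that preserves score magnitudes instead of discarding them.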
Memory optimization and performance study of protein/DNA-ligand interaction software for HPC clusters
This master's thesis aims to reduce the memory requirements and improve the efficiency of the PELE software, developed by BSC, on High-Performance Computing (HPC) systems. The research focuses on analyzing the behavior of PELE on HPC systems to detect areas of improvement and optimize its efficiency. The primary objectives of this thesis are to significantly reduce the memory usage of PELE, replace legacy MPI communication models, and port the software to an ARM-based architecture. The first objective is to investigate the current memory usage of PELE and identify the areas where it can be reduced. This study involves analyzing the code and profiling the performance of PELE on HPC systems to identify memory leaks and inefficient data structures. The results are used to propose modifications to the software's code to reduce its memory footprint. The second objective is to replace the legacy MPI communication models with more efficient ones. This involves analyzing the communication patterns of PELE and identifying areas where the current models are not optimal. The proposed improvements include implementing new communication models and optimizing the existing ones to reduce communication overhead and improve the software's scalability. Furthermore, beyond local changes to PELE itself, we propose improvements to the overall performance of the Adaptive PELE data flow, a common workflow built on PELE. Finally, the thesis aims to port the PELE software to an ARM-based architecture. This involves analyzing the software's code and identifying any platform-specific dependencies; the proposed modifications ensure that the software can run efficiently on ARM-based HPC systems. In conclusion, this master's thesis aims to improve the efficiency of the PELE software on HPC systems by reducing its memory usage, replacing legacy MPI communication models, and porting it to an ARM-based architecture.
The research will involve analyzing the software's code, profiling its performance on HPC systems, and proposing modifications to optimize its performance. The proposed improvements will make PELE more efficient and scalable, making it suitable for use in large-scale simulations and scientific research.
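The profiling loop the thesis describes, snapshot memory, run a step, compare, and attribute growth to code locations, can be sketched with Python's built-in `tracemalloc`. PELE is not a Python application, so in its case tools such as Valgrind's massif would play this role; the workload below is purely a stand-in to show the workflow:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in workload: the kind of per-step allocation a simulation might make.
frames = [[float(i) for i in range(1000)] for _ in range(200)]

after = tracemalloc.take_snapshot()

# Attribute the memory growth to source lines, biggest contributors first.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)
```

Diffing two snapshots rather than inspecting one is what isolates a single step's allocations, which is how a slow leak across iterations, one of the problems the thesis targets, becomes visible.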