    Protein sequences classification based on weighting scheme

    We present a new technique for recognizing remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using a Hidden Markov Model, and the combination of this representation with a classifier capable of learning in very sparse high-dimensional spaces. Each feature vector records the sensitivity of each protein domain to a previously learned set of sub-sequences (strings). Unlike previous methods, our method takes into consideration both the conserved and non-conserved regions. The system subsequently uses Support Vector Machine (SVM) classifiers to learn the boundaries between structural protein classes. Experiments show that this method, which we call the String Weighting Scheme-SVM (SWS-SVM) method, significantly improves on previous methods for the classification of protein domains based on remote homologies. Our method is then compared to five existing homology detection methods.

    Resistance of Blastocystis hominis cysts to chlorine

    Extending the decomposition algorithm for support vector machines training

    The Support Vector Machine (SVM) has been found to be a capable learning machine. It can handle difficult pattern recognition tasks such as speech recognition, and has demonstrated reasonable performance. The SVM formulation is elegant in that it reduces to a convex Quadratic Programming (QP) problem. Theoretically, the training is guaranteed to converge to a global optimum. In practice, however, training an SVM is not as straightforward as it seems: numerical problems can cause the training to give non-optimal decision boundaries. Using a conventional optimizer to train an SVM is not the ideal solution. One can design a dedicated optimizer that takes full advantage of the specific nature of the QP problem in SVM training. The decomposition algorithm developed by Osuna et al. (1997a) reduces the training cost to an acceptable level. In this paper we analyze and develop an extension to Osuna's method in order to achieve better performance. The modified method can be used to solve the training of practical SVMs, in which the training might not otherwise converge.
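
    For orientation, the convex QP mentioned in this abstract is usually written in its standard soft-margin dual form, sketched below. The notation is an assumption and is not taken from the paper: \alpha_i are the Lagrange multipliers, y_i \in \{-1, +1\} the class labels, K the kernel function, and C the regularization bound. In a decomposition approach, the multipliers are split into a small working set B that is optimized and a fixed set N that is held constant, so each step solves only a sub-problem over B.

    ```latex
    % Standard soft-margin SVM dual (textbook form; notation assumed, not the paper's)
    \max_{\alpha}\; W(\alpha) = \sum_{i=1}^{n} \alpha_i
        - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
          \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
    \quad \text{subject to} \quad
    0 \le \alpha_i \le C \;\; (i = 1, \dots, n),
    \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .

    % Decomposition: optimize only over the working set B, with alpha_N held fixed
    \max_{\alpha_B}\; W(\alpha_B, \alpha_N)
    \quad \text{subject to} \quad
    0 \le \alpha_i \le C \;\; (i \in B),
    \qquad \sum_{i \in B} \alpha_i y_i = - \sum_{j \in N} \alpha_j y_j .
    ```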

    MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models

    Information extraction and textual comprehension from materials literature are vital for developing an exhaustive knowledge base that enables accelerated materials discovery. Language models have demonstrated their capability to answer domain-specific questions and retrieve information from knowledge bases. However, there are no benchmark datasets in the materials domain that can evaluate these language models' understanding of key concepts. In this work, we curate a dataset of 650 challenging questions from the materials domain that require the knowledge and skills of a materials student who has completed their undergraduate degree. We classify these questions based on their structure and on materials science domain-based subcategories. Further, we evaluate the performance of the GPT-3.5 and GPT-4 models on solving these questions via zero-shot and chain-of-thought prompting. GPT-4 gives the best performance (~62% accuracy), outperforming GPT-3.5. Interestingly, in contrast to the general observation, chain-of-thought prompting yields no significant improvement in accuracy. To evaluate the limitations, we performed an error analysis, which revealed that conceptual errors (~64%) contribute more than computational errors (~36%) to the reduced performance of LLMs. We hope that the dataset and analysis performed in this work will promote further research in developing better materials science domain-specific LLMs and strategies for information extraction.

    High-precision calculations of In I and Sn II atomic properties

    We use all-order relativistic many-body perturbation theory to study 5s^2 nl configurations of In I and Sn II. Energies, E1 amplitudes, and hyperfine constants are calculated using the all-order method, which accounts for single and double excitations of the Dirac-Fock wave functions. Comment: 10 pages, accepted to PRA; v2: Introduction changed, references added.

    Interoperability and Reliability of Multiplatform MPLS VPN: Comparison of Traffic Engineering with RSVP-TE Protocol and LDP Protocol

    One alternative for overcoming the network scalability problem while maintaining reliability is to use an MPLS VPN network. In reality, current networks already run on multiple platforms from several different hardware vendors, i.e., Cisco and Juniper platforms. This paper compares simulation results to assess the interoperability of multiplatform MPLS VPN and its reliability through traffic engineering using the RSVP-TE and LDP protocols. Both the RSVP and LDP protocols are tested on a stable network and in recovery mode, under no-load conditions and with additional traffic load. Recovery mode is the condition after failover caused by the termination of one of the links in the network. The no-load condition means that the network carries no additional traffic; the only traffic comes from the measurement activity itself. The loaded condition adds 4.5 Mbps of UDP packet traffic on top of the measurement load. On a stable network without additional traffic load, the LDP protocol shows an average delay of 59.41 ms, 2.06 ms jitter, 0.08% packet loss, and 8.99 Mbps throughput. Meanwhile, the RSVP protocol shows an average delay of 52.40 ms, 2.39 ms jitter, 12.18% packet loss, and 7.80 Mbps throughput. When failover occurs and the network is in recovery mode, packet loss on the LDP protocol is 48% per 100 sent packets, while on the RSVP protocol it is 35.5% per 100 sent packets. Both protocols are interoperable on the third layer of multiplatform MPLS VPN, but under heavily loaded traffic conditions the RSVP protocol is more reliable than the LDP protocol.

    Temperature influence on total volatile compounds (TVOCs) inside the car cabin of visible light transmittance

    In the automotive industry, Vehicle Indoor Air Quality (VIAQ) is affected by various substances emitted from interior materials inside a vehicle. Volatile organic compounds (VOCs) are one example of substances emitted from interior materials that are harmful to the human body. Previous research has reported a strong correlation between total VOC emission and interior temperature. The temperature rises because of solar radiation entering through the rear window, windscreen and side window glass. The trapped heat can accelerate the melting process of trim materials such as hard plastic and rubber, thus causing the emission of total VOCs (TVOCs). Therefore, reducing the percentage of visible light transmittance (VLT) will help to reduce this radiation. The aim of this study is to investigate the effect of VLT level on TVOC emission in the vehicle cabin under the static condition (parked and unventilated) and the operating condition (driving and air-conditioned). For the static condition, the results show that the TVOC concentration decreases linearly as the VLT level decreases. However, for the operating condition, the VLT level has less significance after 50 minutes of driving time. In conclusion, VLT levels have a strong relationship to the TVOC concentration, even after a long driving time.

    Accelerated Design of Chalcogenide Glasses through Interpretable Machine Learning for Composition Property Relationships

    Chalcogenide glasses possess several outstanding properties that enable several groundbreaking applications, such as optical discs, infrared cameras, and thermal imaging systems. Despite the ubiquitous usage of these glasses, the composition-property relationships in these materials remain poorly understood. Here, we use a large experimental dataset comprising approximately 24,000 glass compositions made of 51 distinct elements from the periodic table to develop machine learning models for predicting 12 properties, namely, annealing point, bulk modulus, density, Vickers hardness, Littleton point, Young's modulus, shear modulus, softening point, thermal expansion coefficient, glass transition temperature, liquidus temperature, and refractive index. These models are, by far, the largest for chalcogenide glasses. Further, we use SHAP, a game-theory-based algorithm, to interpret the output of the machine learning algorithms by analyzing the contribution of each element to a model's prediction of a property. This provides a powerful tool for experimentalists to interpret the model's predictions and hence design new glass compositions with targeted properties. Finally, using the models, we develop several glass selection charts that can potentially aid in the rational design of novel chalcogenide glasses for various applications. Comment: 17 pages, 8 figures.