71 research outputs found
Machine Learning for Small Molecule Identification
Metabolites are small molecules involved in the biological processes of organisms. For example, ethylene serves as a plant hormone that stimulates or regulates the opening of flowers, the ripening of fruit and the shedding of leaves. Metabolite identification is the task of determining the molecular structure of the metabolites contained in a biological sample, and it is considered a major bottleneck in metabolomics. The backbone analytical technology for metabolite identification is tandem mass spectrometry. It consists of two rounds of mass spectrometry. In the first round, all the metabolites in a sample are measured, and one particular metabolite of interest is selected and fragmented by a process of dissociation. In the second round, the fragments and their abundances are measured. The resulting tandem mass spectra contain information on the structure and composition of the molecules.
This thesis aims to solve the problem of identifying the molecular structures that produce the tandem mass spectra observed from a biological sample. Traditional methods are mostly based on matching the observed tandem mass spectra to reference spectra in a database. However, these methods fail if there are no reference spectra for the molecules in the sample, which is not uncommon: according to a recent study, only 220,000 spectra representing 20,000 molecules have been measured and annotated, while the compound database PubChem records more than 60 million molecules. To alleviate this problem, many recent works have focused on so-called in silico fragmentation, where fragmentation is first simulated computationally for the molecules in a molecular database. The simulated fragments are then compared to the measured tandem mass spectra.
The main contribution of this thesis is to open a novel direction that bridges the gap between the limited spectral databases and the vast molecular databases with the help of molecular fingerprints. A molecular fingerprint is a binary representation that encodes the structures or properties of a molecule. Kernel-based machine learning methods are used to predict molecular fingerprints from tandem mass spectra. The predicted fingerprints are then matched against the fingerprints of molecules in a molecular database to derive an identification. Multiple kernel learning is also proposed to combine different views of tandem mass spectra. Finally, a one-step approach based on input output kernel regression is applied to the problem; it becomes the new state of the art, as demonstrated in several benchmarks including the recent Critical Assessment of Small Molecule Identification (CASMI) 2016 challenge.
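The matching step described above can be illustrated with a small sketch. It assumes the fingerprint has already been predicted from a spectrum and scores each database candidate by Tanimoto similarity. The abstract does not state which scoring function the thesis uses, so the similarity measure, the function names and the toy fingerprints below are all illustrative assumptions, not the thesis's method.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity of two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def rank_candidates(
    predicted: List[int], candidates: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Sort candidate molecules by similarity to the predicted fingerprint."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 8-bit fingerprint, standing in for a model's prediction.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical candidates, standing in for entries of a molecular database.
candidates = {
    "candidate_A": [1, 0, 1, 1, 0, 0, 1, 0],
    "candidate_B": [1, 1, 1, 0, 0, 0, 1, 0],
    "candidate_C": [0, 1, 0, 0, 1, 1, 0, 1],
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top-ranked candidate is the proposed identification. In practice the predicted bits are uncertain, so a scoring function that weights each bit by its prediction reliability may be preferable to plain Tanimoto similarity.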
Metabolite identification and molecular fingerprint prediction through machine learning
Motivation: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, computational support for the identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. Results: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. Availability: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: [email protected]
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
We introduce ordered transfer hyperparameter optimisation (OTHPO), a version
of transfer learning for hyperparameter optimisation (HPO) where the tasks
follow a sequential order. Unlike for state-of-the-art transfer HPO, the
assumption is that each task is most correlated to those immediately before it.
This matches many deployed settings, where hyperparameters are retuned as more
data is collected; for instance tuning a sequence of movie recommendation
systems as more movies and ratings are added. We propose a formal definition,
outline the differences to related problems and propose a basic OTHPO method
that outperforms state-of-the-art transfer HPO. We empirically show the
importance of taking order into account using ten benchmarks. The benchmarks
are in the setting of gradually accumulating data, and span XGBoost, random
forest, approximate k-nearest neighbor, elastic net, support vector machines
and a separate real-world motivated optimisation problem. We open source the
benchmarks to foster future research on ordered transfer HPO. Comment: To be presented at the AutoML 2023 Workshop Trac
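The ordering assumption can be made concrete with a toy sketch. The abstract does not describe the proposed OTHPO method, so the code below is not that method: it is a hypothetical illustration in which each task's search is centred on the best configuration found for the task immediately before it. The synthetic loss, the drifting optimum and all function names are assumptions.

```python
from typing import List


def loss(x: float, task: int) -> float:
    """Synthetic validation loss whose optimum drifts by 0.1 per task."""
    optimum = 0.2 + 0.1 * task
    return (x - optimum) ** 2


def tune(task: int, centre: float, radius: float, n_trials: int) -> float:
    """Evaluate n_trials evenly spaced values in [centre - radius, centre + radius]."""
    step = 2 * radius / (n_trials - 1)
    trial_points = [centre - radius + i * step for i in range(n_trials)]
    return min(trial_points, key=lambda x: loss(x, task))


def ordered_transfer(n_tasks: int, n_trials: int = 5) -> List[float]:
    """Tune tasks in order, centring each search on the previous task's best value."""
    best = tune(0, centre=0.5, radius=0.5, n_trials=n_trials)  # no prior task yet
    history = [best]
    for task in range(1, n_tasks):
        # Narrow search around the most recent result, using the same budget.
        best = tune(task, centre=best, radius=0.2, n_trials=n_trials)
        history.append(best)
    return history


def no_transfer(n_tasks: int, n_trials: int = 5) -> List[float]:
    """Baseline: search the full range [0, 1] from scratch for every task."""
    return [tune(t, centre=0.5, radius=0.5, n_trials=n_trials) for t in range(n_tasks)]


n_tasks = 5
transfer_loss = sum(loss(x, t) for t, x in enumerate(ordered_transfer(n_tasks)))
baseline_loss = sum(loss(x, t) for t, x in enumerate(no_transfer(n_tasks)))
print(f"ordered transfer: {transfer_loss:.4f}, no transfer: {baseline_loss:.4f}")
```

With the same number of trials per task, the narrowed search samples more densely near the drifting optimum than a fresh search over the full range. This only works because consecutive tasks are assumed to be strongly correlated, which is the premise of the ordered setting.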
Direct fabrication of high-performance high speed steel products enhanced by LaB6
A direct fabrication technology (DFT) without smelting has been developed for fabricating sophisticated high speed steel products with low pollution, near-net shaping and a short process. A steel consisting of (wt.%) 6.4W, 5.0Mo, 4.2Cr, 3.1V, 8.5Co and 1.28C was fabricated as an exemplary material. Activated and reactive sintering of green compacts under vacuum, with low activation energy, redox-reaction-enhanced diffusion and the construction of a concentration gradient of alloying elements around pores, promotes nearly full densification (>99.40%). The DFT steels also show high purity and superior mechanical properties. A minor addition of the strengthening agent LaB6 (0.1 wt.%), which is easy to introduce accurately in DFT, markedly increases the hot hardness, temper resistance, bend strength and toughness of DFT M3:2. The strengthening effects of boron atoms and La-rich complexes are proposed to be directly responsible for the high hot hardness and temper resistance of the LaB6-containing steel.
Chiral Transmission to Cationic Polycobaltocenes over Multiple Length Scales Using Anionic Surfactants
Chiral polymers are ubiquitous in nature, and the self-assembly of chiral materials is a field of widespread interest. In this paper, we describe the formation of chiral metallopolymers based on poly(cobaltoceniumethylene) ([PCE]), which have been prepared through oxidation of poly(cobaltocenylethylene) (PCE) in the presence of enantiopure N-acyl-amino-acid-derived anionic surfactants, such as N-palmitoyl-l-alanine (C16-l-Ala) and N-palmitoyl-d-alanine (C16-d-Ala). It is postulated that the resulting metallopolymer complexes [PCE][C16-l/d-Ala] contain close ionic contacts, and exhibit chirality through the axially chiral ethylenic CH2-CH2 bridges, leading to interaction of the chromophoric [CoCp2]+ units through chiral space. The steric influence of the long palmitoyl (C16) surfactant tail is key for the transmission of chirality to the polymer, and results in a brushlike amphiphilic macromolecular structure that also affords solubility in polar organic solvents (e.g., EtOH, THF). Upon dialysis of these solutions into water, the hydrophobic palmitoyl surfactant substituents aggregate and the complex assembles into superhelical ribbons with identifiable "handedness", indicating the transmission of chirality from the molecular surfactant to the micrometer length scale, via the macromolecular complex.
Result of a year-long animal survey in a state-owned forest farm in Beijing, China
Background: Artificial forests can have great potential to serve as habitat for wildlife, depending on how they are managed. State-owned forest farms now play a new role in ecological conservation in China, yet the biological richness of this land-use type is understudied. Jingxi Forest Farm, a large state-owned forest farm once owned by a mining company, has been managed for the purpose of conservation since 2017. Although this 116.4 km² forest farm holds a near-healthy montane ecosystem that is very representative of North China, a large proportion of it is artificial coniferous forest, which has been proven to hold less biodiversity than natural vegetation. This situation, however, provides a great opportunity for ecological restoration and biodiversity conservation. Therefore, from November 2019 to December 2020, we conducted a set of biodiversity surveys, whose results will serve as a baseline for further restoration and conservation. New information: Here, we report the results of a multi-taxa fauna diversity survey conducted in Jingxi Forest Farm, mainly in 2020, with explicit spatial information. It is the first survey of its kind conducted in this area, revealing a total of 19 species of mammals, 86 birds, four reptiles, two amphibians and one fish species, as well as 101 species of insects. Four species of mammals are identified as data-poor species, as they have fewer than 100 occurrence records with coordinates in the GBIF database. One species of insect, representing a genus newly recorded for Beijing, is reported.
Climate change : strategies for mitigation and adaptation
The sustainability of life on Earth is under increasing threat due to human-induced climate change. This perilous change in the Earth's climate is caused by increases in carbon dioxide and other greenhouse gases in the atmosphere, primarily due to emissions associated with burning fossil fuels. Over the next two to three decades, the effects of climate change, such as heatwaves, wildfires, droughts, storms, and floods, are expected to worsen, posing greater risks to human health and global stability. These trends call for the implementation of mitigation and adaptation strategies. Pollution and environmental degradation exacerbate existing problems and make people and nature more susceptible to the effects of climate change. In this review, we examine the current state of global climate change from different perspectives. We summarize evidence of climate change in Earth’s spheres, discuss emission pathways and drivers of climate change, and analyze the impact of climate change on environmental and human health. We also explore strategies for climate change mitigation and adaptation and highlight key challenges for reversing and adapting to global climate change.