9 research outputs found
Sparse data embedding and prediction by tropical matrix factorization
Background: Matrix factorization methods are linear models with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data.
Results: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions.
Conclusion: STMF is the first work that uses the tropical semiring on sparse data. We show that in certain cases semirings are useful because they consider the structure, which is different and simpler to understand than it is with standard linear algebra.
This work is supported by the Slovene Research Agency, Young Researcher Grant (52096) awarded to AO, and research core funding (P1-0222 to PO and P2-0209 to TC).
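As a rough illustration of how a tropical semiring makes factorization non-linear, the sketch below replaces the usual sum-of-products with a maximum-of-sums. It assumes the max-plus variant of the tropical semiring, which the abstract does not specify. The names `tropical_matmul` and `masked_abs_error` and the toy matrices are hypothetical and are not taken from the STMF implementation.

```python
# Minimal sketch, assuming the max-plus tropical semiring:
# "addition" is max, "multiplication" is ordinary +.

def tropical_matmul(U, V):
    """Max-plus product: (U (x) V)[i][j] = max over k of (U[i][k] + V[k][j])."""
    rows, inner, cols = len(U), len(V), len(V[0])
    return [
        [max(U[i][k] + V[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def masked_abs_error(R, approx):
    """Sum of absolute errors over observed entries only (None marks a missing value)."""
    return sum(
        abs(r - a)
        for row_r, row_a in zip(R, approx)
        for r, a in zip(row_r, row_a)
        if r is not None
    )

# Hypothetical factor matrices (rank 2) and a sparse data matrix R.
U = [[0.0, 2.0],
     [1.0, 0.0]]
V = [[3.0, 1.0, 0.0],
     [0.0, 4.0, 2.0]]
R = [[3.0, None, 4.0],
     [4.0, 4.0, None]]

approx = tropical_matmul(U, V)       # entries at the None positions act as predictions
error = masked_abs_error(R, approx)  # fit is judged on observed entries only
```

In this toy case the factors reproduce every observed entry, and the values of `approx` at the missing positions serve as the predicted entries.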
Matrix tri-factorization over the tropical semiring
The tropical semiring has proven successful in several research areas, including optimal control,
bioinformatics, discrete event systems, and decision problems. Previous studies have applied a matrix
two-factorization algorithm based on the tropical semiring to investigate bipartite and tripartite networks.
Tri-factorization algorithms based on standard linear algebra are used to solve tasks such as data fusion, co-clustering, matrix completion, community detection, and more. However, there is currently no tropical matrix tri-factorization approach that would allow for the analysis of multipartite networks with many parts. To address this, we propose the triFastSTMF algorithm, which performs tri-factorization over the tropical semiring. We applied it to analyze a four-partition network structure and recover the edge lengths of the network. We show that triFastSTMF performs similarly to Fast-NMTF in terms of approximation and prediction performance when fitted on the whole network. When trained on a specific subnetwork and used to predict the entire network, triFastSTMF outperforms Fast-NMTF, with an error several orders of magnitude smaller. The robustness of triFastSTMF is due to tropical operations, which are less prone to predicting large values than standard operations.
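A tri-factorization approximates a data matrix by a product of three factors. The sketch below shows such a product under the max-plus tropical semiring, which is an assumption, since the abstract does not name the variant. The factor names `G1`, `S`, `G2` and their values are hypothetical and do not reproduce the triFastSTMF update rules. It only illustrates that the tropical triple product is well defined regardless of bracketing.

```python
# Minimal sketch of a tropical (max-plus, assumed) triple product G1 (x) S (x) G2.

def tropical_matmul(A, B):
    """Max-plus product: (A (x) B)[i][j] = max over k of (A[i][k] + B[k][j])."""
    return [
        [max(A[i][k] + B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

# Hypothetical factors: G1 is 2x2, S is 2x2, G2 is 2x3.
G1 = [[0, 1],
      [2, 0]]
S = [[1, 0],
     [0, 3]]
G2 = [[0, 2, 1],
      [1, 0, 0]]

left_first = tropical_matmul(tropical_matmul(G1, S), G2)
right_first = tropical_matmul(G1, tropical_matmul(S, G2))
```

Both bracketings give the same 2x3 approximation, because the max-plus product is associative.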
FastSTMF: Efficient tropical matrix factorization algorithm for sparse data
Matrix factorization, one of the most popular methods in machine learning,
has recently benefited from introducing non-linearity in prediction tasks using
tropical semiring. The non-linearity enables a better fit to extreme values and
distributions, thus discovering high-variance patterns that differ from those
found by standard linear algebra. However, the optimization process of various
tropical matrix factorization methods is slow. In our work, we propose a new
method FastSTMF based on Sparse Tropical Matrix Factorization (STMF), which
introduces a novel strategy for updating factor matrices that results in
efficient computational performance. We evaluated the efficiency of FastSTMF on
synthetic and real gene expression data from the TCGA database, and the results
show that FastSTMF outperforms STMF in both accuracy and running time. Compared
to NMF, we show that FastSTMF performs better on some datasets and, unlike NMF,
is not prone to overfitting. This work sets the basis for developing other matrix
factorization techniques based on many other semirings using the newly proposed
optimization process.
Matrix tri-factorization over the tropical semiring
The tropical semiring has proven successful in several research areas, including
optimal control, bioinformatics, discrete event systems, and decision
problems. In previous studies, a matrix two-factorization algorithm based on the
tropical semiring has been applied to investigate bipartite and tripartite
networks. Tri-factorization algorithms based on standard linear algebra are
used for solving tasks such as data fusion, co-clustering, matrix completion,
community detection, and more. However, there is currently no tropical matrix
tri-factorization approach that would allow for the analysis of multipartite
networks with a high number of parts. To address this, we propose the
triFastSTMF algorithm, which performs tri-factorization over the tropical
semiring. We apply it to analyze a four-partition network structure and recover
the edge lengths of the network. We show that triFastSTMF performs similarly to
Fast-NMTF in terms of approximation and prediction performance when fitted on
the whole network. When trained on a specific subnetwork and used to predict
the whole network, triFastSTMF outperforms Fast-NMTF, with an error several
orders of magnitude smaller. The robustness of triFastSTMF is due to tropical
operations, which are less prone to predicting large values than standard
operations.
Comment: 14 pages, 8 figures, 3 tables
Editorial: Mapping Maternal Subjectivities, Identities and Ethics
Editorial with embedded short articles, including:
"Maternal Studies: The Why and Wherefore" (Hollway) highlighted as "unconscious intersubjective dynamics"
"Why Study the Maternal Now?" (Jensen) highlighted as "parenting"
"Thoughts around the Maternal: A Sociological Viewpoint" (T Miller) highlighted as "social locations and structural contexts within which women mother
Hydrogeochemical conditions of submarine and terrestrial karst sulfur springs in the Northern Adriatic
Submarine springs near Izola, in the Northern Adriatic Sea, appear in funnel-shaped depressions and smell strongly of sulfur. Along the Mediterranean coast there are many submarine karst springs containing brackish or fresh water, but submarine sulfur springs are not particularly common. Three submarine sulfur springs and one terrestrial sulfur spring were investigated to better understand the water properties and the water–rock interaction within the aquifer, and to explore the origin of the spring water. Groundwater and seawater samples were also collected for comparison. Based on the geological setting, physicochemical parameters, hydrogeochemical data, and stable isotope data (δ¹⁸O and δ²H of water, δ¹³C of dissolved inorganic carbon, δ³⁴S and δ¹⁸O of sulfate), we can affirm that (1) the high proportion of seawater in the submarine spring samples is due to sampling challenges; (2) the springs recharge from precipitation where confined karst aquifers outcrop; (3) deep water circulation is indicated; (4) redox conditions can provide a suitable environment for bacterial reduction of the marine or organic sulfate to the odorous H₂S; and (5) geological data suggest that the coals beneath the alveolinic-nummulitic limestones are the source of sulfur. A multi-parameter and interdisciplinary approach has proven important in assessing submarine sulfur springs affected by seawater input.
Hydrogeochemistry of submarine springs in Izola, Slovenia
Not far from the coast of Izola, Slovenia, are twelve funnel-shaped depressions on the seabed containing springs of warm, sulphurous water. These submarine depressions with springs are divided into three groups: from east to west, Izola, Bele skale, and Ronek. Water with similar thermal and sulphuric properties also occurs in the terrestrial spring in Izola. Some of these submarine springs (M03 - Izola, M05 - Bele skale and M10, M11 - Ronek) and the terrestrial spring (K04) were sampled for hydrogeochemical study. Three sampling campaigns took place in June and July 2020, October 2020, and April 2021. In addition, samples of seawater (SW01, SW02, SW03), groundwater from a well (VK01) and local non-sulphuric springs (K01, K02, K03) were taken for comparison. The submarine spring water was sampled using 100-ml syringes. In April 2021, an 8-litre Niskin water sampler (OceanTest Equipment, Inc.) was used in addition to the syringes. Samples collected in April 2021 using two different methods are labeled with an additional letter: /B for syringes and /N for Niskin. In-situ physicochemical parameters - pH, temperature, electrical conductivity, oxidation-reduction potential, and dissolved oxygen - were measured in the samples from the seawater and the submarine springs using a digital Multi 3430 meter (WTW GmbH, Weilheim, Germany) as soon as they were brought aboard the boat. Measurements of the terrestrial waters were performed on site using the same digital meter. A total of 26 water samples were collected for further analysis. All water samples were collected in HDPE bottles, except those for the isotopic composition of dissolved inorganic carbon, which were sampled directly into Labco glass ampoules after filtration through a 0.45 µm pore-size membrane. The samples were then chemically treated where required and stored in a refrigerator until analysis.
The samples were then prepared for further laboratory analysis: geochemical composition (cations, anions), isotopic composition of oxygen (δ¹⁸O) and hydrogen (δ²H), total alkalinity (TA), isotopic composition of dissolved inorganic carbon (δ¹³C-DIC), tritium activity (³H), isotopic ratio of strontium (⁸⁷Sr/⁸⁶Sr), and isotopic composition of sulphur (δ³⁴Sₛₒ₄, ‰) and oxygen (δ¹⁸Oₛₒ₄, ‰) in sulphate. This database contains information on the sampling dates and basic characteristics of the sampling sites, as well as the results from field measurements and laboratory analyses.
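The isotopic compositions listed above use delta notation, which gives the deviation of a sample's isotope ratio from that of a reference standard in per mil (‰). The sketch below shows the standard definition. The function name `delta_permil` and the sample ratio are hypothetical. The reference ratio is a commonly cited ¹⁸O/¹⁶O value for VSMOW, used here as an assumption and not taken from this dataset.

```python
# Minimal sketch of delta notation: delta = (R_sample / R_standard - 1) * 1000 (per mil).

def delta_permil(r_sample, r_standard):
    """Return the per mil deviation of a sample isotope ratio from a standard ratio."""
    return (r_sample / r_standard - 1.0) * 1000.0

# Assumed reference ratio (commonly cited 18O/16O of VSMOW); not from the dataset.
VSMOW_18O_16O = 0.0020052

# Hypothetical measured 18O/16O ratio of a water sample.
sample_ratio = 0.0019912

d18o = delta_permil(sample_ratio, VSMOW_18O_16O)  # negative: sample is depleted in 18O
```

A negative value means the sample is depleted in the heavy isotope relative to the standard. A sample with the same ratio as the standard has a delta of zero.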
Polygenic analysis and targeted improvement of the complex trait of high acetic acid tolerance in the yeast Saccharomyces cerevisiae
BACKGROUND: Acetic acid is one of the major inhibitors in lignocellulose hydrolysates used for the production of second-generation bioethanol. Although several genes have been identified in laboratory yeast strains that are required for tolerance to acetic acid, the genetic basis of the high acetic acid tolerance naturally present in some Saccharomyces cerevisiae strains is unknown. Identification of its polygenic basis may allow improvement of acetic acid tolerance in yeast strains used for second-generation bioethanol production by precise genome editing, minimizing the risk of negatively affecting other industrially important properties of the yeast. RESULTS: Haploid segregants of a strain with unusually high acetic acid tolerance and a reference industrial strain were used as the superior and inferior parent strains, respectively. After crossing of the parent strains, QTL mapping using the SNP variant frequency determined by pooled-segregant whole-genome sequence analysis revealed two major QTLs. All F1 segregants were then subjected to multiple rounds of random inbreeding, and the superior F7 segregants were subjected to the same analysis, further refined by sequencing of individual segregants and bioinformatics analysis taking into account the relative acetic acid tolerance of the segregants. As a result, one major F1 QTL, in which we identified HAA1, a known regulator of high acetic acid tolerance, as a true causative allele, disappeared from the QTL mapping with the F7 segregants. Novel genes determining high acetic acid tolerance, GLO1, DOT5, CUP2, and a previously identified component, VMA7, were identified as causative alleles in the second major F1 QTL and in three newly appearing F7 QTLs, respectively. The superior HAA1 allele contained a unique single point mutation that significantly improved acetic acid tolerance under industrially relevant conditions when inserted into an industrial yeast strain for second-generation bioethanol production.
CONCLUSIONS: This work reveals the polygenic basis of high acetic acid tolerance in S. cerevisiae in unprecedented detail. It also shows for the first time that a single strain can harbor different sets of causative genes able to establish the same polygenic trait. The superior alleles identified can be used successfully for improvement of acetic acid tolerance in industrial yeast strains.
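The pooled-segregant QTL mapping described in the RESULTS relies on SNP variant frequencies: in a pool of tolerant segregants, SNPs linked to a causative allele should come mostly from the superior parent, while unlinked SNPs should sit near 50%. The sketch below illustrates that idea only. The read counts, the smoothing window, the 0.8 threshold, and all function names are hypothetical and do not reproduce the statistical procedure used in the study.

```python
# Minimal sketch: flag genome positions where the superior-parent SNP frequency,
# smoothed over neighbouring SNPs, deviates strongly from the 0.5 expected without linkage.

def allele_frequencies(counts):
    """counts: list of (position, superior_parent_reads, total_reads)."""
    return [(pos, sup / total) for pos, sup, total in counts if total > 0]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def candidate_positions(counts, threshold=0.8, window=3):
    """Return positions whose smoothed superior-allele frequency reaches the threshold."""
    freqs = allele_frequencies(counts)
    smoothed = moving_average([f for _, f in freqs], window)
    return [pos for (pos, _), s in zip(freqs, smoothed) if s >= threshold]

# Hypothetical pooled sequencing counts along one chromosome.
counts = [
    (1000, 26, 50), (2000, 24, 50), (3000, 30, 50),
    (4000, 45, 50), (5000, 48, 50), (6000, 46, 50),
    (7000, 28, 50), (8000, 25, 50),
]
hits = candidate_positions(counts)
```

With these toy counts, the positions from 4000 to 6000 are flagged as a candidate region. Real analyses would additionally account for sequencing depth and statistical significance.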