33 research outputs found

    The phenotypic landscape of a Saccharomyces cerevisiae strain collection

    In our previous work [1] we developed computational models to predict strains with specific phenotypes (e.g. low ethanol resistance, growth at 30 °C and growth in media containing galactose, raffinose or urea) from microsatellite allelic patterns. The objective of the present work was to gain a deeper understanding of the phenotypic diversity of a heterogeneous Saccharomyces cerevisiae strain collection, using a large battery of tests with biotechnological relevance, and to apply computational data mining algorithms to predict a strain's potential to be used as a winemaking strain from a few selected phenotypic data. A S. cerevisiae collection was constituted, comprising 172 strains of different geographical origins and technological uses (winemaking, brewing, bakery, distillery, etc.). Phenotypic screening was performed considering 30 physiological traits that are important from an oenological point of view, such as ethanol tolerance, growth in synthetic must media at various temperatures or resistance to fungicides. Data were analyzed using Principal Component Analysis, and some phenotypes (growth in the presence of potassium bisulfite, growth at 40 °C, and resistance to ethanol) were identified as being responsible for the highest strain variability. Statistical analysis revealed relevant associations between several phenotypes and the strains' technological use. Based on the phenotypic data, a naïve Bayesian classifier, as implemented in the software Orange [2], correctly assigned (AUC=0.70) most strains from vineyards (73%) and commercial strains (77%) to their respective groups. Data mining approaches identified, for the group of commercial strains, the 18 phenotypic tests with the highest weight. Globally, the growth patterns of this group of strains in must containing iprodione (0.05 mg/mL) or cycloheximide (0.1 µg/mL) proved to have the highest predictive score for the assignment of a strain as a commercial strain. The results obtained herein demonstrate the potential of computational approaches to explore phenotypic variability and to predict the probability of a S. cerevisiae strain being used as a commercial strain.
    Fundação para a Ciência e a Tecnologia (FCT)

    Insights into the efficiencies of on-shore wind turbines: a data-centric analysis

    The literature on wind turbines as a renewable energy alternative does not include a multidimensional benchmarking study that can support investment decisions as well as design processes. This paper presents a data-centric analysis of commercial on-shore wind turbines and provides actionable insights through analytical benchmarking with Data Envelopment Analysis (DEA), visual data analysis, and statistical hypothesis testing. The paper also introduces a novel visualization approach for understanding and interpreting reference sets, that is, the sets of efficient wind turbines that inefficient ones should take as benchmarks.

    Metoda razvrščanja z združevanjem najbližjih sosedov v programu Orange (The neighbour-joining clustering method in Orange)

    Neighbour joining builds phylogenetic trees from distance matrices. It is mainly used in bioinformatics to infer evolutionary relationships between species and to predict common ancestors. Despite its usefulness in various applications, it is rarely available in data mining programs. For this reason we implemented neighbour joining as a widget in the general-purpose data mining suite Orange. We also developed several methods for visualising the inferred phylogenetic trees. By applying our implementation to several use cases, we have demonstrated that neighbour joining and the clustering trees it constructs are useful in data mining tasks outside the scope of bioinformatics.
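
To make the idea of building a tree from a distance matrix concrete, here is a minimal sketch of the textbook neighbour-joining procedure in plain Python. It is not the Orange widget described in the abstract; the function name `neighbour_joining`, the internal node labels `node0`, `node1`, ... and the five-taxon distances are assumptions made purely for illustration.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Return tree edges as (node, neighbour, branch_length) tuples.

    names: list of leaf labels.
    dist:  dict of dicts holding symmetric pairwise distances.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each active node to all other active nodes.
        total = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises the neighbour-joining Q criterion.
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - total[a] - total[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))
        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


# Hypothetical five-taxon example, given as pairwise distances.
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
names = ["a", "b", "c", "d", "e"]
dist = {x: {} for x in names}
for (x, y), value in pairs.items():
    dist[x][y] = dist[y][x] = value

edges = neighbour_joining(names, dist)
for node, neighbour, length in edges:
    print(f"{node} -- {neighbour}: {length}")
```

The result is an unrooted tree expressed as an edge list, which is one simple form a visualisation could consume; ties in the Q criterion are broken by iteration order here, which is an arbitrary choice.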

    Anchorage Arrival Scheduling Under Off-Nominal Weather Conditions

    Weather can cause flight diversions, passenger delays, additional fuel consumption and schedule disruptions at any high-volume airport. The impacts are particularly acute at the Ted Stevens Anchorage International Airport in Anchorage, Alaska due to its importance as a major international portal. To minimize the impacts due to weather, a multi-stage scheduling process is employed that is iteratively executed as updated aircraft demand and/or airport capacity data become available. The strategic scheduling algorithm assigns speed adjustments for flights that originate outside of Anchorage Center to achieve the proper demand and capacity balance. Similarly, an internal departure-scheduling algorithm assigns ground holds for pre-departure flights that originate from within Anchorage Center. Tactical flight controls in the form of airborne holding are employed to reactively account for system uncertainties. Real-world scenarios that were derived from the January 16, 2012 Anchorage visibility observations and the January 12, 2012 Anchorage arrival schedule were used to test the initial implementation of the scheduling algorithm in fast-time simulation experiments. Although over 90% of the flights in the scenarios arrived at Anchorage without requiring any delay, pre-departure scheduling was the dominant form of control for Anchorage arrivals. Additionally, tactical scheduling was used extensively in conjunction with the pre-departure scheduling to reactively compensate for uncertainties in the arrival demand. For long-haul flights, the strategic scheduling algorithm performed best when the scheduling horizon was greater than 1,000 nmi. With these long scheduling horizons, it was possible to absorb between 10 and 12 minutes of delay through speed control alone. Unfortunately, the use of tactical scheduling, which resulted in airborne holding, was found to increase as the strategic scheduling horizon increased because of the additional uncertainty in the arrival times of the aircraft. Findings from these initial experiments indicate that it is possible to schedule arrivals into Anchorage with minimal delays under low-visibility conditions with less disruption to high-cost, international flights.

    Re-mining association mining results through visualization, data envelopment analysis, and decision trees

    Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real-world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses regarding how re-mining can be carried out on association mining results are tested in the case study through empirical analysis.

    Constrained decoding for text-level discourse parsing

    This paper presents a novel approach to document-based discourse analysis by performing a global A* search over the space of possible structures while optimizing a global criterion over the set of potential coherence relations. Existing approaches to discourse analysis have so far relied on greedy search strategies or restricted themselves to sentence-level discourse parsing. Another advantage of our approach over other global alternatives (like Maximum Spanning Tree decoding algorithms) is its flexibility in being able to integrate constraints (including linguistically motivated ones like the Right Frontier Constraint). Finally, our paper provides the first discourse parsing system for French; our evaluation is carried out on the Annodis corpus. While using far less training data than previous work on English, our system manages to achieve state-of-the-art results, with F1-scores of 66.2 and 46.8 when compared to unlabeled and labeled reference structures.

    Conformal Prediction with Orange

    Conformal predictors estimate the reliability of outcomes made by supervised machine learning models. Instead of a point value, conformal prediction defines an outcome region that meets a user-specified reliability threshold. Provided that the data are independently and identically distributed, the user can control the level of the prediction errors and adjust it following the requirements of a given application. The quality of conformal predictions often depends on the choice of nonconformity estimate for a given machine learning method. To promote the selection of a successful approach, we have developed Orange3-Conformal, a Python library that provides a range of conformal prediction methods for classification and regression. The library also implements several nonconformity scores. It has a modular design and can be extended to add new conformal prediction methods and nonconformities
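
The idea of an outcome region that meets a reliability threshold can be illustrated with a minimal sketch of inductive (split) conformal classification in plain Python. It does not use the Orange3-Conformal API; the function names, the nonconformity score (one minus the predicted probability) and all numbers below are assumptions chosen for illustration only.

```python
import math


def conformal_quantile(calibration_scores, eps):
    """Threshold on nonconformity for a target error rate eps.

    Returns the k-th smallest calibration score, where
    k = ceil((n + 1) * (1 - eps)); if k exceeds n, no finite
    threshold gives the guarantee, so every label must be kept.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - eps))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def prediction_set(class_probs, threshold):
    """Keep every label whose nonconformity (1 - probability) is within the threshold."""
    return {label for label, p in class_probs.items() if 1 - p <= threshold}


# Hypothetical calibration data: the probability some underlying model
# assigned to the TRUE label of each held-out calibration example.
true_label_probs = [0.9, 0.8, 0.95, 0.6, 0.7, 0.85, 0.55, 0.99, 0.75]
scores = [1 - p for p in true_label_probs]

# Hypothetical predicted class probabilities for one new example.
new_example = {"a": 0.65, "b": 0.3, "c": 0.05}

for eps in (0.2, 0.05):
    region = prediction_set(new_example, conformal_quantile(scores, eps))
    print(f"eps={eps}: {sorted(region)}")
```

Lowering the allowed error rate can only widen the region: with so few calibration examples, the stricter setting falls back to returning every label. The coverage guarantee relies on the assumption stated in the abstract, namely that the data are independently and identically distributed.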

    Duomenų tyrybos sistemos, pagrįstos saityno paslaugomis

    Data mining systems, based on Web services. Olga Kurasova, Virginijus Marcinkevičius, Viktor Medvedev, Aurimas Rapečka. This paper analyses data mining systems based on web services. The main concepts related to web services are defined. The possibilities of distributed data mining and the tools for implementing it (Grid, Hadoop) are introduced. An analytical review of data mining systems based on web services is provided. Comparison criteria are selected, and a comparative analysis of the most popular data mining systems based on web services is carried out according to these criteria. The analysis determines which systems are rated best and which fail to satisfy most of the criteria.

    Obesity resistant mechanisms in the Lean polygenic mouse model as indicated by liver transcriptome and expression of selected genes in skeletal muscle

    Background: Divergently selected Lean and Fat mouse lines represent unique models for a polygenic form of resistance and susceptibility to obesity development. Previous research on these lines focused mainly on obesity-susceptible factors in the Fat line. This study aimed to examine the molecular basis of obesity-resistant mechanisms in the Lean line by analyzing various fat depots and organs, the liver transcriptome of selected metabolic pathways, plasma and lipid homeostasis and expression of selected skeletal muscle genes.
    Results: Expression profiling using our custom Steroltalk v2 microarray demonstrated that Lean mice exhibit a higher hepatic expression of cholesterol biosynthesis genes compared to the Fat line, although this was not reflected in elevation of total plasma or liver cholesterol. However, FPLC analysis showed that protective HDL cholesterol was elevated in Lean mice. A significant difference between the strains was also found in bile acid metabolism. Lean mice had a higher expression of Cyp8b1, a regulatory enzyme of bile acid synthesis, and the Abcb11 bile acid transporter gene responsible for export of acids to the bile. Additionally, a higher content of blood circulating bile acids was observed in Lean mice. Elevated HDL and upregulation of some bile acid synthesis and transport genes suggest enhanced reverse cholesterol transport in the Lean line: the flux of cholesterol out of the body is higher, which is compensated by upregulation of endogenous cholesterol biosynthesis. Increased skeletal muscle Il6 and Dio2 mRNA levels as well as increased activity of muscle succinic acid dehydrogenase (SDH) in the Lean mice demonstrate for the first time that changes in muscle energy metabolism play an important role in the Lean line phenotype determination and corroborate our previous findings of increased physical activity and thermogenesis in this line. Finally, differential expression of Abcb11 and Dio2 identifies novel strong positional candidate genes as they map within the quantitative trait loci (QTL) regions detected previously in crosses between the Lean and Fat mice.
    Conclusion: We identified novel candidate molecular targets and metabolic changes which can at least in part explain resistance to obesity development in the Lean line. The major difference between the Lean and Fat mice was in increased liver cholesterol biosynthesis gene mRNA expression, bile acid metabolism and changes in selected muscle genes' expression in the Lean line. The liver Abcb11 and muscle Dio2 were identified as novel positional candidate genes to explain part of the phenotypic difference between the Lean and Fat lines.

    Generalized Query-Based Active Learning to Identify Differentially Methylated Regions in DNA

    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask the Oracle (e.g., a human expert) very specific queries to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features creates a shorter query with only relevant features, which is easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially methylated regions (DMRs) in DNA. DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method to 13 other data sets and show that it performs better than another popular active learning technique.