
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
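One of the five challenges named above, class imbalance, is often addressed by reweighting classes inversely to their frequency. The following is a minimal illustrative sketch of that idea (the function name and toy labels are assumptions, not from the reviewed paper):

```python
# Illustrative sketch: inverse-frequency class weights, a common remedy
# for class imbalance in classification tasks (e.g., rare disease labels
# in multi-omics data). Names and data here are hypothetical.
import numpy as np

def class_weights(labels):
    """Weight each class inversely to its frequency so minority
    classes contribute as much to the training loss as majority ones."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 90 + [1] * 10)   # toy imbalanced labels (90/10 split)
print(class_weights(y))             # minority class gets the larger weight
```

Such weights can be passed to most ML libraries' loss functions (e.g., a `class_weight` parameter) so the model is not dominated by the majority class.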

    Predictive maintenance of electrical grid assets: internship at EDP Distribuição - Energia S.A

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    This report describes the activities developed during an internship at EDP Distribuição, focusing on a Predictive Maintenance analytics project directed at high-voltage electrical grid assets, including Overhead Lines, Power Transformers, and Circuit Breakers. The project's main goal is to support EDP's asset management processes by improving maintenance and investment planning. The project's main deliverables are the Probability of Failure metric, which forecasts asset failures 15 days ahead of time and is estimated through supervised machine learning models; the Health Index metric, which indicates an asset's current state and condition and is implemented through the Ofgem methodology; and two asset management dashboards. The project was implemented by an external service provider, a consulting company, and during the internship it was possible to join the team and participate in the development activities.
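A 15-day-ahead failure forecast like the Probability of Failure metric described above requires a supervised target that looks into the future. The sketch below shows one common way to build such a target from a daily failure log; the function name, horizon handling, and toy data are assumptions for illustration, not EDP's actual implementation:

```python
# Hypothetical sketch: build a supervised target for N-days-ahead
# failure prediction. Each day is labeled 1 if any failure occurs
# within the next `horizon` days (exclusive of the current day).
import numpy as np

def failure_within_horizon(failures, horizon=15):
    """failures: 0/1 array, one entry per day.
    Returns an array where day t is 1 if a failure occurs
    on any of days t+1 .. t+horizon."""
    n = len(failures)
    target = np.zeros(n, dtype=int)
    for t in range(n):
        target[t] = int(failures[t + 1 : t + 1 + horizon].any())
    return target

days = np.zeros(30, dtype=int)
days[20] = 1                        # a single failure on day 20
print(failure_within_horizon(days)) # days 5..19 are labeled 1
```

Features observed up to day t are then paired with this target to train a classifier whose predicted probability serves as the Probability of Failure.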

    Benchmarking business analytics techniques in Big Data

    Technological developments and the growing dependence of organizations and society on the internet have led to growth in the volume and variety of data. This growth and variety have become a challenge to traditional Business Analytics techniques. In this project, we conducted a benchmarking process that aimed to assess the performance of some Data Mining tools, such as RapidMiner, in a Big Data environment. First, we analyzed a study in which a group of Data Mining tools was evaluated to determine the best tool according to the evaluation criteria. Then, the two best tools from that study were analyzed regarding their ability to analyze data in a Big Data environment. Finally, studies were carried out on the evaluations of the RapidMiner and KNIME tools for their performance in the Big Data environment.
    This work has been supported by national funds through FCT - Fundação para a Ciência e a Tecnologia within the Project Scope: UID/CEC/00319/2019 and Deus ex Machina (DEM): Symbiotic technology for societal efficiency gains - NORTE-01-0145-FEDER-000026.

    Flight Data of Airplane for Wind Forecasting

    This research solely focuses on understanding and predicting weather behavior, which is one of the important factors that affect airplanes in flight. The future weather information is used to inform pilots about changing flight conditions. In this paper, we present a new approach to forecasting one component of weather information, wind speed, from data captured by airplanes in flight. We compare NASA's ACT-America project against NOAA's Wind Aloft program for prediction suitability. A collinearity analysis between these datasets reveals better model performance and smaller test error with NASA's dataset. We then apply machine learning and a genetic algorithm to process the data further and arrive at a competitive error rate. The sliding window approach is used to find the best window size, and then we create a forecasting model that predicts wind speed at high altitudes 10 minutes ahead of time. Finally, a stacking-based framework was used to outperform individual learning algorithms, yielding a root mean square error (RMSE) of 0.674 for the best combination, which is 98.4% better than the state-of-the-art approach.
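The sliding window approach mentioned above turns a time series into supervised (window, next-value) training pairs. A minimal sketch, with toy data and names that are assumptions rather than the paper's actual pipeline:

```python
# Illustrative sketch of a sliding-window setup for one-step-ahead
# forecasting: each row of X is a window of past wind speeds, and y
# is the value that immediately follows that window.
import numpy as np

def sliding_windows(series, window):
    """Return X of shape (n, window) and y, the value following
    each window, for supervised one-step-ahead forecasting."""
    X = np.array([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

speeds = np.arange(10.0)             # toy wind-speed series
X, y = sliding_windows(speeds, window=3)
print(X.shape)                       # (7, 3)
```

Trying several window sizes and keeping the one with the lowest validation error is the "best window size" search the abstract refers to; the resulting pairs can feed individual learners or a stacking ensemble.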

    Does Activity Engagement Protect Against Cognitive Decline in Old Age? Methodological and Analytical Considerations

    The literature about relationships between activity engagement and cognitive performance is abundant yet inconclusive. Some studies report that higher activity engagement leads to lower cognitive decline; others report no functional links, or that higher cognitive performance leads to less decline in activity engagement. We first discuss some methodological and analytical features that may contribute to the divergent findings. We then apply a longitudinal dynamic structural equation model to five repeated measurements of the Swiss Interdisciplinary Longitudinal Study on the Oldest Old. Performance on perceptual speed and verbal fluency tasks was analyzed in relation to six different activity composite scores. Results suggest that increased media and leisure activity engagement may lessen decline in perceptual speed, but not in verbal fluency performance, whereas cognitive performance does not affect change in activity engagement.

    Extracting and Cleaning RDF Data

    The RDF data model has become a prevalent format to represent heterogeneous data because of its versatility. The capability of dismantling information from its native formats and representing it in triple format offers a simple yet powerful way of modelling data that is obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make RDF data easy to process as labeled, directed graphs. This graph representation of RDF data supports higher-level analytics by enabling querying using different techniques and query languages, e.g., SPARQL. Analytics that require structured data are supported by transforming the graph data on-the-fly to populate the target schema that is needed for downstream analysis. These target schemas are defined by downstream applications according to their information needs. The flexibility of RDF data brings two main challenges. First, the extraction of RDF data is a complex task that may involve domain expertise about the information required to be extracted for different applications. Another significant aspect of analyzing RDF data is its quality, which depends on multiple factors, including the reliability of data sources and the accuracy of the extraction systems. The quality of the analysis depends mainly on the quality of the underlying data. Therefore, evaluating and improving the quality of RDF data has a direct effect on the correctness of downstream analytics. This work presents multiple approaches related to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that need to be extracted, we present DSTLR, a scalable framework to extract RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction. To address this problem, we present an approach to estimate property values for long-tail entities.
    We also present multiple algorithms and approaches that focus on the quality of RDF data. These include discovering quality constraints from RDF data, and utilizing machine learning techniques to repair errors in RDF data.
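The triple model and pattern-based querying described above can be sketched in a few lines of plain Python. In practice a real RDF store and a SPARQL engine (e.g., rdflib) would be used; the triples, prefixes, and function below are hypothetical illustrations of the idea:

```python
# Illustrative sketch: RDF-style (subject, predicate, object) triples
# and a tiny pattern matcher, in the spirit of a SPARQL basic graph
# pattern where None plays the role of a variable. All data is toy.

triples = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob",   "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:age",   "30"),
}

def match(pattern, data):
    """Return all triples matching a (s, p, o) pattern;
    None in any position acts as a wildcard/variable."""
    s, p, o = pattern
    return {t for t in data
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Analogue of: SELECT ?s ?o WHERE { ?s ex:knows ?o }
print(match((None, "ex:knows", None), triples))
```

A quality constraint such as "each entity has at most one `ex:age`" can be checked by counting matches per subject, which hints at how constraint discovery and repair operate over the same triple structure.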