7,161 research outputs found

    Using ensemble and learning techniques towards extending the Knowledge Discovery Pipeline

    Get PDF
    The huge amount of data generated by an enterprise is of great concern to its decision makers. This problem is compounded by the many environmental challenges an enterprise faces in its effort to produce better products and services. It is crucial to know what goes on in its business transactions, both internally and externally, to examine the heart of the enterprise's transactions, that is, its data, and to transform that data into actionable knowledge through the process of knowledge discovery.
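    The abstract does not spell out the specific methods used; purely as an illustration of an ensemble step inside a knowledge discovery pipeline, a minimal scikit-learn sketch might look like the following (the dataset, estimators, and voting scheme are placeholders, not details from the paper).

```python
# Illustrative only: a generic ensemble step inside a knowledge discovery pipeline.
# The estimators and dataset below are placeholders, not those used in the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Combine heterogeneous learners by soft voting over their predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("held-out accuracy:", ensemble.score(X_test, y_test))
```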

    Applying Deep Learning To Airbnb Search

    Full text link
    The application to search ranking is one of the biggest machine learning success stories at Airbnb. Many of the initial gains were driven by a gradient boosted decision tree model. Those gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real-life product. Deep learning was steep learning for us. To other teams embarking on similar journeys, we hope an account of our struggles and triumphs will provide some useful pointers. Bon voyage!
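    As a rough sketch of what a pointwise neural ranking model of this kind can look like in PyTorch; the architecture, feature dimension, and toy data below are assumptions for illustration, not Airbnb's actual model.

```python
# A minimal pointwise ranking sketch, loosely in the spirit of a simple feed-forward
# search-ranking network; feature dimensions, labels, and data are invented here.
import torch
import torch.nn as nn

class ListingRanker(nn.Module):
    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar score used to order listings for a query
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

# Toy batch: 256 listing impressions, 16 features each, booked/not-booked labels.
x = torch.randn(256, 16)
y = torch.randint(0, 2, (256,)).float()

model = ListingRanker(num_features=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```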

    An ADMM Based Framework for AutoML Pipeline Configuration

    Full text link
    We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer and continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from the UCI ML and OpenML repositories. We observe that, on average, our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn and TPOT), highlighting the practical advantages of this framework.
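    To make the decomposition idea concrete, here is a heavily simplified alternating-optimization sketch that splits the search into a discrete sub-problem (which algorithm) and a continuous sub-problem (its hyper-parameters). It is not the paper's ADMM formulation; the dataset, search space, and update schedule are placeholder assumptions.

```python
# Simplified illustration of splitting an AutoML search into a discrete sub-problem
# (algorithm choice) and a continuous sub-problem (hyper-parameters), solved by
# alternation. NOT the paper's ADMM scheme; everything below is a toy assumption.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def build(algo, theta):
    if algo == "logreg":
        return LogisticRegression(C=10 ** theta, max_iter=2000)
    return RandomForestClassifier(n_estimators=int(50 + 200 * (theta + 2) / 4), random_state=0)

def score(algo, theta):
    return cross_val_score(build(algo, theta), X, y, cv=3).mean()

algo, theta = "logreg", 0.0  # initial pipeline configuration
for _ in range(3):
    # Continuous sub-problem: random search over theta with the algorithm fixed.
    candidates = rng.uniform(-2, 2, size=5)
    theta = max(candidates, key=lambda t: score(algo, t))
    # Discrete sub-problem: pick the algorithm with theta fixed.
    algo = max(["logreg", "rf"], key=lambda a: score(a, theta))

print("selected:", algo, "theta:", round(float(theta), 3), "cv score:", round(score(algo, theta), 3))
```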

    The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms

    Get PDF
    Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for improving our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is sustaining an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview of the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of the biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing in different steps, each amenable to automation. We also give an example of computing population biomass, community richness, and biodiversity data (as indicators of ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for that automated data processing at the level of cyber-infrastructures, with sensor calibration and control, data banking, and ingestion into large data portals.
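    As a small illustration of the kind of indicator computation mentioned above (taxon richness, a Shannon diversity index, and a crude biomass estimate from per-taxon counts); the taxa, counts, and mean weights below are invented placeholders, not crawler data.

```python
# Toy computation of community richness, Shannon diversity, and biomass from
# hypothetical per-image detection counts; values are placeholders for illustration.
import math
from collections import Counter

# Hypothetical counts of individuals detected per taxon in a set of crawler images.
counts = Counter({"Pandalus": 34, "Sabellidae": 12, "Actiniaria": 7, "Gadidae": 3})

total = sum(counts.values())
richness = len(counts)  # number of taxa observed
shannon = -sum((n / total) * math.log(n / total) for n in counts.values())

# Assumed mean individual weights (g) per taxon, used to estimate standing biomass.
mean_weight_g = {"Pandalus": 8.0, "Sabellidae": 1.5, "Actiniaria": 20.0, "Gadidae": 450.0}
biomass_g = sum(counts[t] * mean_weight_g[t] for t in counts)

print(f"richness = {richness}, Shannon H' = {shannon:.3f}, biomass = {biomass_g:.0f} g")
```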

    Applying AutoML techniques in drug discovery: systematic modelling of antimicrobial drug activity on a wide spectrum of pathogens

    Full text link
    [Final project of the Master's degree in Fundamentals of Data Science, Facultat de Matemàtiques, Universitat de Barcelona. Academic year: 2022-2023. Tutors: Miquel Duran Frigola and Jordi Vitrià i Marca] Predictive modelling of the antimicrobial activity of molecules is a crucial step towards the discovery of anti-infective medicines. Unfortunately, there is a shortage of models covering endemic pathogens of the Global South, reflecting the existing bias in research towards diseases prevalent in wealthy countries. This project has developed a pipeline to systematically build drug discovery models, in particular antimicrobial activity prediction models for small molecule compounds. The data of assay results on a selected pathogen is extracted from a publicly available database, ChEMBL. This data is then cleaned and processed in order to build predictive models with various Automated Machine Learning (AutoML) techniques using the ZairaChem tool from the Ersilia Open Source Initiative. The pipeline has been applied to six pathogens of great relevance to global health, known as the ESKAPE pathogens, for which the data has been obtained and processed and baseline models created. We have built the full set of final models for one of these pathogens, Staphylococcus aureus. The pipeline can be used on any other pathogen for which ChEMBL has sufficient data. This pipeline will be used to deploy models in the Ersilia Model Hub, a repository of pre-trained ML models for drug discovery in global health. This will be an opportunity to compensate for the shortage of ML models adapted to the needs of the Global South.
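    As a rough sketch of the data-extraction step described here (pulling assay results for one pathogen from ChEMBL and binarizing activity before AutoML modelling); the target identifier, activity type, and cut-off are assumptions for illustration, and ZairaChem training itself is not shown.

```python
# Sketch of the ChEMBL extraction and labelling step. The target ID, activity type,
# units, and activity cut-off below are illustrative assumptions, not the thresholds
# used in the thesis; downstream ZairaChem/AutoML training is not reproduced here.
from chembl_webresource_client.new_client import new_client

activities = new_client.activity.filter(
    target_chembl_id="CHEMBL352",   # assumed identifier for a Staphylococcus aureus target; verify before use
    standard_type="MIC",
    standard_units="ug.mL-1",
).only(["canonical_smiles", "standard_value"])

dataset = []
for act in activities:
    smiles, value = act["canonical_smiles"], act["standard_value"]
    if smiles is None or value is None:
        continue
    # Assumed cut-off: MIC <= 10 ug/mL counts as "active" (label 1), else inactive (0).
    dataset.append((smiles, int(float(value) <= 10.0)))

print(len(dataset), "labelled compounds ready for an AutoML tool such as ZairaChem")
```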

    Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

    Get PDF
    Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems’ outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs. This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport), and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Moreover, it has been backed by the work of both COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”.
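    As a minimal illustration of one ensembling strategy in this spirit (majority voting over entity annotations produced by several systems); the annotation format and vote threshold are simplifying assumptions, not the exact configuration space explored in the paper.

```python
# Toy majority-vote combiner for entity annotations from several systems on the same
# sentence. The (start, end, label) span format and the threshold are assumptions.
from collections import Counter

def majority_vote(system_outputs, min_votes=2):
    """system_outputs: list of sets of (start, end, label) entity annotations."""
    votes = Counter(ann for output in system_outputs for ann in output)
    return {ann for ann, n in votes.items() if n >= min_votes}

run_a = {(0, 6, "Concept"), (10, 18, "Action")}
run_b = {(0, 6, "Concept"), (10, 18, "Predicate")}
run_c = {(0, 6, "Concept"), (10, 18, "Action"), (22, 30, "Concept")}

ensembled = majority_vote([run_a, run_b, run_c])
print(sorted(ensembled))  # keeps annotations that at least two systems agree on
```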