72 research outputs found

    Learning with configurable operators and RL-based heuristics

    In this paper, we push forward the idea of machine learning systems for which the operators can be modified and fine-tuned for each problem. This allows us to propose a learning paradigm where users can write (or adapt) their operators according to the problem, the data representation and the way the information should be navigated. To achieve this goal, data instances, background knowledge, rules, programs and operators are all written in the same functional language, Erlang. Since changing operators affects how the search space needs to be explored, heuristics are learnt as a result of a decision process based on reinforcement learning, where each action is defined as a choice of operator and rule. As a result, the architecture can be seen as a 'system for writing machine learning systems' or as a way to explore new operators.
    This work was supported by the MEC projects CONSOLIDER-INGENIO 26706 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, and the REFRAME project granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain. Also, F. Martínez-Plumed is supported by FPI-ME grant BES-2011-045099.
    Martínez Plumed, F.; Ferri Ramírez, C.; Hernández Orallo, J.; Ramírez Quintana, MJ. (2013). Learning with configurable operators and RL-based heuristics. In New Frontiers in Mining Complex Patterns. Springer Verlag (Germany). 7765:1-16. https://doi.org/10.1007/978-3-642-37382-4_1
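
The abstract above describes heuristics learnt by reinforcement learning, with each action being a choice of operator and rule. As a rough illustration only, the following sketch applies a standard tabular Q-learning update to such (operator, rule) actions. The state labels, operator names, reward and learning parameters are all hypothetical, and the paper's actual system is written in Erlang and may differ substantially.

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice among (operator, rule) pairs."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q-value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Hypothetical actions: an operator name paired with the rule it is applied to.
actions = [("generalise", "rule_1"), ("specialise", "rule_1")]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

updated = q_update(q, "s0", actions[0], reward=1.0,
                   next_state="s1", next_actions=actions)
greedy = choose_action(q, "s0", actions, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the choice is purely greedy, so `greedy` is the action whose value was just increased.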

    A computational analysis of general intelligence tests for evaluating cognitive development

    [EN] The progression in several cognitive tests for the same subjects at different ages provides valuable information about their cognitive development. One question that has caught recent interest is whether the same approach can be used to assess the cognitive development of artificial systems. In particular, can we assess whether the fluid or crystallised intelligence of an artificial cognitive system is changing during its cognitive development as a result of acquiring more concepts? In this paper, we address several IQ test problems (odd-one-out problems, Raven's Progressive Matrices and Thurstone's letter series) with a general learning system that is not specifically designed to solve intelligence tests. The goal is to better understand the role of the basic cognitive operational constructs (such as identity, difference, order, counting, logic, etc.) that are needed to solve these intelligence test problems, and to serve as a proof of concept for evaluation in other developmental problems. From here, we gain some insights into the characteristics and usefulness of these tests and how careful we need to be when applying human test problems to assess the abilities and cognitive development of robots and other artificial cognitive systems.
    This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2015-69175-C4-1-R and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana under grant PROMETEOII/2015/013.
    Martínez-Plumed, F.; Ferri Ramírez, C.; Hernández-Orallo, J.; Ramírez Quintana, MJ. (2017). A computational analysis of general intelligence tests for evaluating cognitive development. Cognitive Systems Research. 43:100-118. https://doi.org/10.1016/j.cogsys.2017.01.006

    Can language models automate data wrangling?

    [EN] The automation of data science and other data manipulation processes depends on the integration and formatting of 'messy' data. Data wrangling is an umbrella term for these tedious and time-consuming tasks. Tasks such as transforming dates, units or names expressed in different formats have been challenging for machine learning because (1) users expect to solve them with short cues or few examples, and (2) the problems depend heavily on domain knowledge. Interestingly, large language models today (1) can infer from very few examples or even a short clue in natural language, and (2) can integrate vast amounts of domain knowledge. It is then an important research question to analyse whether language models are a promising approach for data wrangling, especially as their capabilities continue growing. In this paper we apply different variants of the language model Generative Pre-trained Transformer (GPT) to five batteries covering a wide range of data wrangling problems. We compare the effect of prompts and few-shot regimes on their results and how they compare with specialised data wrangling systems and other tools. Our major finding is that they appear to be a powerful tool for a wide range of data wrangling tasks. We provide some guidelines about how they can be integrated into data processing pipelines, provided the users can take advantage of their flexibility and the diversity of tasks to be addressed. However, reliability is still an important issue to overcome.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was funded by the Future of Life Institute, FLI, under grant RFP2-152, the MIT-Spain - INDITEX Sustainability Seed Fund under project COST-OMIZE, the EU (FEDER) and Spanish MINECO under RTI2018-094403-B-C32 and PID2021-122830OB-C42, Generalitat Valenciana under PROMETEO/2019/098 and INNEST/2021/317, EU's Horizon 2020 research and innovation programme under grant agreement No. 952215 (TAILOR) and US DARPA HR00112120007 ReCOG-AI.
    Acknowledgements: We thank Lidia Contreras for her help with the Data Wrangling Dataset Repository. We thank the anonymous reviewers from ECMLPKDD Workshop on Automating Data Science (ADS2021) and the anonymous reviewers of this special issue for their comments.
    Jaimovitch-López, G.; Ferri Ramírez, C.; Hernández-Orallo, J.; Martínez-Plumed, F.; Ramírez Quintana, MJ. (2023). Can language models automate data wrangling?. Machine Learning. 112(6):2053-2082. https://doi.org/10.1007/s10994-022-06259-9
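
The abstract above refers to solving wrangling tasks from a short cue or a few examples. A minimal sketch of how such a few-shot prompt might be assembled is shown below. The `Input:`/`Output:` layout, the function name and the example dates are assumptions made for illustration, not the prompt format used in the paper, and no language model is actually called here.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a plain-text few-shot prompt from (input, output) pairs.

    The final 'Output:' line is left empty for the model to complete.
    """
    lines = []
    if instruction:
        lines.append(instruction)
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical date-reformatting task with a single worked example.
prompt = build_few_shot_prompt(
    examples=[("25-03-1983", "25/03/1983")],
    query="17-08-2014",
    instruction="Reformat the date.",
)
```

Passing an empty `examples` list yields a zero-shot prompt that relies on the instruction alone.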

    Environmental Perception of the Housewives in the Communities of the Alta Sierra Tarahumara, Chihuahua, Mexico

    In order to explore the perceptions of housewives who live in rural communities with large indigenous populations, a study was conducted in a community of the Sierra Madre of Chihuahua, northern Mexico, an area with a high level of marginalization. One hundred and twelve structured interviews were conducted based on a questionnaire that covered socio-economic profile, background and basic knowledge about the environment, environmental issues, impacts of economic activities, and priority issues for information and training. It was found that ethnicity, age, educational level and status as mothers are important variables that influence how the environment is perceived. The depletion of vital resources (water and oxygen) is perceived as the most important impact of the overexploitation of natural resources; the Mestizo women showed a greater knowledge of the environment than the Indigenous women.
    Keywords: environmental perception, housewives, rural communities, Sierra Tarahumara

    Screening of contaminants of emerging concern in surface water and wastewater effluents, assisted by the Persistency-Mobility-Toxicity Criteria

    Contaminants of emerging concern (CECs) are compounds of diverse origins that have not been deeply studied in the past but are now attracting growing environmental interest. The NOR-Water project aimed to identify the main CECs and their sources in the water environment of the Northern Portugal–Galicia (northwest Spain) transnational region. To achieve these goals, a suspect screening analytical methodology based on liquid chromatography coupled to high resolution mass spectrometry (LC-HRMS) was applied to 29 sampling sites in two campaigns. These sampling sites included river and sea water, as well as treated wastewater. The screening was driven by a library of over 3500 compounds, which included 604 compounds prioritized from different relevant lists on the basis of the persistency, mobility, and toxicity criteria. A total of 343 chemicals could be tentatively identified in the analyzed samples. This list of 343 identified chemicals was submitted to the classification workflow used for prioritization, which resulted in 153 chemicals tentatively classified as persistent, mobile, and toxic (PMT) and 23 as very persistent and very mobile (vPvM), pinpointing the relevance of these types of chemicals in the aqueous environment. Pharmaceuticals, such as the antidepressant venlafaxine or the antipsychotic sulpiride, and industrial chemicals, especially high production volume chemicals (HPVC) such as ε-caprolactam, were the groups of compounds detected at the highest frequencies.
    This research was funded by Xunta de Galicia (ED431C 2021/06) and the European Regional Development Fund through the Interreg V-A Spain-Portugal Programme (POCTEP) 2014-2020 (ref. 0725_NOR_WATER_1_P). R. M. acknowledges Banco Santander and Universidade de Santiago de Compostela for her outstanding researcher contract and N. A. acknowledges the Portuguese Foundation for Science and Technology (FCT) for his Ph.D. grant DFA/BD/6218/2020.

    The impact from survey depth and resolution on the morphological classification of galaxies

    We consistently analyse for the first time the impact of survey depth and spatial resolution on the most used morphological parameters for classifying galaxies through non-parametric methods: Abraham and Conselice-Bershady concentration indices, Gini, M20 moment of light, asymmetry, and smoothness. Three different non-local data sets are used: the Advanced Large Homogeneous Area Medium Band Redshift Astronomical (ALHAMBRA) survey and the Subaru/XMM-Newton Deep Survey (SXDS), as examples of deep ground-based surveys, and the Cosmic Evolution Survey (COSMOS), a deep space-based survey. We used a sample of 3000 local, visually classified galaxies, measuring their morphological parameters at their real redshifts (z ~ 0). Then we simulated them to match the redshift and magnitude distributions of galaxies in the non-local surveys. The comparisons of the two sets allow us to put constraints on the use of each parameter for morphological classification and evaluate the effectiveness of the commonly used morphological diagnostic diagrams. All analysed parameters suffer from biases related to spatial resolution and depth, the impact of the former being much stronger. When including asymmetry and smoothness in classification diagrams, the noise effects must be taken into account carefully, especially for ground-based surveys. M20 is significantly affected, changing both the shape and range of its distribution at all brightness levels. We suggest that diagnostic diagrams based on 2-3 parameters should be avoided when classifying galaxies in ground-based surveys, independently of their brightness; for COSMOS they should be avoided for galaxies fainter than F814 = 23.0. These results can be applied directly to surveys similar to ALHAMBRA, SXDS and COSMOS, and can also serve as an upper/lower limit for shallower/deeper ones.
    MP acknowledges financial support from the JAE-Doc programme of the Spanish National Research Council (CSIC), co-funded by the European Social Fund. This research was supported by the Junta de Andalucia through project TIC114, and the Spanish Ministry of Economy and Competitiveness (MINECO) through projects AYA2010-15169, AYA2013-42227-P, and AYA2013-43188-P.
    Peer reviewed
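
Of the non-parametric parameters listed above, the Gini coefficient is the simplest to write down. The abstract gives no formula, so the sketch below assumes one commonly used form for pixel fluxes, G = Σ (2i − n − 1)|X_i| / (mean(|X|) · n · (n − 1)) over fluxes sorted in increasing order. The pixel values are made up, and real measurements would also need a segmentation map and noise handling, which are omitted here.

```python
def gini(fluxes):
    """Gini coefficient of a set of pixel fluxes.

    Returns 0.0 when the light is spread evenly over all pixels and
    1.0 when all of the light sits in a single pixel.
    """
    values = sorted(abs(f) for f in fluxes)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("all fluxes are zero")
    # i runs from 1 to n over the fluxes in increasing order.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


uniform = gini([1.0, 1.0, 1.0, 1.0])       # evenly distributed light
concentrated = gini([0.0, 0.0, 0.0, 1.0])  # all light in one pixel
```

Because the statistic depends on which faint pixels are counted as part of the galaxy, it is plausible that depth and resolution shift its value, which is the kind of bias the study quantifies.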

    I. MUFFIT: A multi-filter fitting code for stellar population diagnostics

    Numerical methods and codes.
    [Aims]: We present MUFFIT, a new generic code optimized to retrieve the main stellar population parameters of galaxies in photometric multi-filter surveys, and check its reliability and feasibility with real galaxy data from the ALHAMBRA survey. [Methods]: Making use of an error-weighted χ²-test, we compare the multi-filter fluxes of galaxies with the synthetic photometry of mixtures of two single stellar populations at different redshifts and extinctions, to provide the most likely range of stellar population parameters (mainly ages and metallicities), extinctions, redshifts, and stellar masses. To improve the diagnostic reliability, MUFFIT identifies and removes from the analysis those bands that are significantly affected by emission lines. The final parameters and their uncertainties are derived by a Monte Carlo method, using the individual photometric uncertainties in each band. Finally, we discuss the accuracies, degeneracies, and reliability of MUFFIT using both simulated and real galaxies from ALHAMBRA, comparing with results from the literature. [Results]: MUFFIT is a precise and reliable code to derive stellar population parameters of galaxies in ALHAMBRA. Using the results from photometric-redshift codes as input, MUFFIT improves the photometric-redshift accuracy by ∼10-20%. MUFFIT also detects nebular emissions in galaxies, providing physical information about their strengths. The stellar masses derived from MUFFIT show excellent agreement with the COSMOS and SDSS values. In addition, the retrieved age-metallicity loci for a sample of z ≤ 0.22 early-type galaxies in ALHAMBRA at different stellar mass bins are in very good agreement with the ones from SDSS spectroscopic diagnostics. Moreover, a one-to-one comparison between the redshifts, ages, metallicities, and stellar masses derived spectroscopically for SDSS and by MUFFIT for ALHAMBRA reveals good qualitative agreement in all the parameters, hence reinforcing the strengths of multi-filter galaxy data and optimized analysis techniques, like MUFFIT, to conduct reliable stellar population studies.
    L.A.D.G. acknowledges support from the "Caja Rural de Teruel" for developing this research. A.J.C. is a Ramon y Cajal Fellow of the Spanish Ministry of Science and Innovation. This work has been supported by the "Programa Nacional de Astronomia y Astrofisica" of the Spanish Ministry of Economy and Competitiveness (MINECO) under grant AYA2012-30789, as well as by FEDER funds and the Government of Aragon, through the Research Group E103. L.A.D.G. also thanks the Mullard Space Science Laboratory (MSSL) and Royal Astronomical Society (RAS) for offering the opportunity to support and develop part of this research in collaboration with I.F. MINECO grants AYA2010-15081, AYA2010-15169, AYA2010-22111-C03-01, AYA2010-22111-C03-02, AYA2011-29517-C03-01, AYA2013-40611-P, AYA2013-42227-P, AYA2013-43188-P, AYA2013-48623-C2-1, AYA2013-48623-C2-2, and AYA2014-58861-C3-1 are also acknowledged, together with Generalitat Valenciana projects Prometeo 2009/064 and PROMETEOII/2014/060, and Junta de Andalucia grants TIC114, JA2828, and P10-FQM-6444. MP acknowledges financial support from the JAE-Doc programme of the Spanish National Research Council (CSIC), co-funded by the European Social Fund.
    Peer reviewed
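
The [Methods] paragraph above combines an error-weighted χ² comparison with a Monte Carlo treatment of the photometric uncertainties. The sketch below illustrates that general idea in a heavily simplified form: each template is a single flux vector with one free scaling (MUFFIT itself fits mixtures of two stellar populations over redshift and extinction), and the template grid, ages, fluxes and errors are all invented for the example. It is not MUFFIT's implementation.

```python
import random
import statistics


def chi2_fit(obs, err, model):
    """Return (chi2, scale) for the best error-weighted scaling of model onto obs.

    Assumes the model fluxes are not all zero.
    """
    num = sum(o * m / e ** 2 for o, m, e in zip(obs, model, err))
    den = sum(m * m / e ** 2 for m, e in zip(model, err))
    scale = num / den
    chi2 = sum(((o - scale * m) / e) ** 2 for o, m, e in zip(obs, model, err))
    return chi2, scale


def best_template(obs, err, templates):
    """Return the key of the template with the lowest chi2."""
    return min(templates, key=lambda k: chi2_fit(obs, err, templates[k])[0])


def monte_carlo_keys(obs, err, templates, n_draws=200, seed=0):
    """Refit after perturbing each band by its own Gaussian uncertainty."""
    rng = random.Random(seed)
    keys = []
    for _ in range(n_draws):
        draw = [rng.gauss(o, e) for o, e in zip(obs, err)]
        keys.append(best_template(draw, err, templates))
    return keys


# Hypothetical grid: age in Gyr -> synthetic fluxes in four bands.
templates = {1.0: [1.0, 2.0, 3.0, 4.0], 10.0: [4.0, 3.0, 2.0, 1.0]}
obs = [2.0, 4.0, 6.0, 8.0]   # made-up observed fluxes
err = [0.1, 0.1, 0.1, 0.1]   # made-up per-band uncertainties

best_age = best_template(obs, err, templates)
ages = monte_carlo_keys(obs, err, templates)
age_mean = statistics.mean(ages)
age_spread = statistics.pstdev(ages)
```

The spread of the refitted values over the draws serves as the uncertainty estimate. In this toy case the errors are small compared with the difference between the templates, so every draw selects the same age.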

    The ALHAMBRA survey: Bayesian photometric redshifts with 23 bands for 3 deg²

    A. Molino et al.
    The Advanced Large Homogeneous Area Medium-Band Redshift Astronomical (ALHAMBRA) survey has observed eight different regions of the sky, including sections of the Cosmic Evolution Survey (COSMOS), DEEP2, European Large-Area Infrared Space Observatory Survey (ELAIS), Great Observatories Origins Deep Survey North (GOODS-N), Sloan Digital Sky Survey (SDSS) and Groth fields using a new photometric system with 20 optical, contiguous ~300-Å filters plus the JHKs bands. The filter system is designed to optimize the effective photometric redshift depth of the survey, while having enough wavelength resolution for the identification of faint emission lines. The observations, carried out with the Calar Alto 3.5-m telescope using the wide-field optical camera Large Area Imager for Calar Alto (LAICA) and the near-infrared (NIR) instrument Omega-2000, represent a total of ~700 h of on-target science images. Here we present multicolour point-spread function (PSF) corrected photometry and photometric redshifts for ~438 000 galaxies, detected in synthetic F814W images. The catalogues are complete down to a magnitude I ~ 24.5 AB and cover an effective area of 2.79 deg². Photometric zero-points were calibrated using stellar transformation equations and refined internally, using a new technique based on the highly robust photometric redshifts measured for emission-line galaxies. We calculate Bayesian photometric redshifts with the Bayesian Photometric Redshift (BPZ) 2.0 code, obtaining a precision of δz/(1+zs) = 1 per cent for I < 22.5 and δz/(1+zs) = 1.4 per cent for 22.5 < I < 24.5. The global n(z) distribution shows a mean redshift 〈z〉 = 0.56 for I < 22.5 AB and 〈z〉 = 0.86 for I < 24.5 AB. Given its depth and small cosmic variance, ALHAMBRA is a unique data set for galaxy evolution studies.
    © 2014 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society.
    We acknowledge financial support from the Spanish MICINN under the Consolider-Ingenio 2010 Program grant CSD2006-00070: First Science with the GTC. Part of this work was supported by Junta de Andalucía, through grant TIC-114 and the Excellence Project P08-TIC-3531, and by the Spanish Ministry for Science and Innovation through grants AYA2006-1456, AYA2010-15169, AYA2010-22111-C03-02, AYA2010-22111-C03-01 and Generalitat Valenciana project Prometeo 2009/064.
    Peer reviewed
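
The precision figures quoted above are expressed as δz/(1+zs), the photometric redshift error normalised by one plus the spectroscopic redshift. The abstract does not say which scatter statistic is used, so the sketch below assumes the normalised median absolute deviation, a common robust choice in photometric-redshift work. The redshift values are invented for the example.

```python
import statistics


def normalized_residuals(z_phot, z_spec):
    """Return (z_phot - z_spec) / (1 + z_spec) for each galaxy."""
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]


def sigma_nmad(z_phot, z_spec):
    """1.48 * median(|dz - median(dz)| / (1 + z_spec)), with dz = z_phot - z_spec."""
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    centre = statistics.median(dz)
    scaled = [abs(d - centre) / (1.0 + zs) for d, zs in zip(dz, z_spec)]
    return 1.48 * statistics.median(scaled)


# Made-up sample: three galaxies at z_spec = 1 with small photo-z errors.
z_spec = [1.0, 1.0, 1.0]
z_phot = [1.0, 1.02, 0.98]

residuals = normalized_residuals(z_phot, z_spec)
precision = sigma_nmad(z_phot, z_spec)
```

A value of `precision` near 0.01 would correspond to the "1 per cent" level quoted in the abstract.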

    The ALHAMBRA survey: accurate merger fractions derived by PDF analysis of photometrically close pairs

    [Aims]: Our goal is to develop and test a novel methodology to compute accurate close-pair fractions with photometric redshifts. [Methods]: We improved the currently used methodologies to estimate the merger fraction f_m from photometric redshifts by (i) using the full probability distribution functions (PDFs) of the sources in redshift space; (ii) including the variation in the luminosity of the sources with z in both the sample selection and the luminosity ratio constraint; and (iii) splitting individual PDFs into red and blue spectral templates to reliably work with colour selections. We tested the performance of our new methodology with the PDFs provided by the ALHAMBRA photometric survey. [Results]: The merger fractions and rates from the ALHAMBRA survey agree very well with those from spectroscopic work for both the general population and red and blue galaxies. With the merger rate of bright (M_B ≤ −20 − 1.1z) galaxies evolving as (1 + z)^n, the power-law index n is higher for blue galaxies (n = 2.7 ± 0.5) than for red galaxies (n = 1.3 ± 0.4), confirming previous results. Integrating the merger rate over cosmic time, we find that the average number of mergers per galaxy since z = 1 is N_m^red = 0.57 ± 0.05 for red galaxies and N_m^blue = 0.26 ± 0.02 for blue galaxies. [Conclusions]: Our new methodology statistically exploits all the available information provided by photometric redshift codes and yields accurate measurements of the merger fraction by close pairs using photometric redshifts alone. Current and future photometric surveys will benefit from this new methodology.
    This work has been mainly funded by the FITE (Fondos de Inversiones de Teruel) and the projects AYA2012-30789, AYA2006-14056, and CSD2007-00060. We also acknowledge financial support from the Spanish Government grants AYA2010-15169, AYA2010-22111-C03-01, AYA2010-22111-C03-02, and AYA2013-48623-C2-2, from the Aragón Government through the Research Group E103, from the Junta de Andalucía through TIC-114 and the Excellence Project P08-TIC-03531, and from the Generalitat Valenciana through the projects Prometeo/2009/064 and PrometeoII/2014/060. A.J.C. is a Ramón y Cajal fellow of the Spanish government. M.P. acknowledges the financial support from the JAE-Doc program of the Spanish National Research Council (CSIC), co-funded by the European Social Fund.
    Peer reviewed