7 research outputs found

    Learning from networked examples

    Many machine learning algorithms assume that training examples are drawn independently. This assumption no longer holds when learning from a networked sample, because two or more training examples may share common objects and hence share the features of those objects. We show that the classic approach of ignoring this problem can harm the accuracy of statistics, and then consider alternatives. One of these is to use only independent examples, discarding other information; however, this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is formed by efficient sample weighting schemes, which lead to novel concentration inequalities.
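
The baseline mentioned in this abstract, keeping only examples that share no objects, can be illustrated with a minimal sketch. It assumes each training example is represented as a set of object identifiers; the function name `select_independent`, the toy data, and the greedy first-come order are illustrative assumptions and are not taken from the paper. The paper's own weighting schemes are not reproduced here.

```python
def select_independent(examples):
    """Greedily keep examples whose objects are disjoint from all kept so far.

    `examples` is a list of sets of object identifiers (a hypothetical
    representation). Returns the indices of the kept examples.
    """
    used_objects = set()
    chosen = []
    for index, objects in enumerate(examples):
        # Keep the example only if it shares no object with earlier picks.
        if used_objects.isdisjoint(objects):
            chosen.append(index)
            used_objects.update(objects)
    return chosen


# Toy networked sample: examples 0 and 1 share "b", examples 1 and 2 share "c".
examples = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"e", "f"}]
independent = select_independent(examples)
```

On this toy sample the greedy pass drops example 1, which shows the information loss the abstract calls suboptimal.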

    Detecting patterns in missing persons using unsupervised learning techniques (Detección de patrones de personas desaparecidas mediante técnicas de aprendizaje no supervisado)

    The disappearance of people is a major concern both nationally and globally; it can result from human trafficking and organ trafficking, among other causes. Among missing persons there is one group whose characteristics raise greater public alarm and therefore call for a faster, more efficient response: people in vulnerable situations, comprising boys, girls, adolescents, older adults, and people with physical, mental, or sensory disabilities. Unsupervised learning is a part of machine learning, which in turn belongs to the field of Artificial Intelligence; it seeks to gather or generate knowledge from the information contained in data without the need to label it. Unsupervised learning algorithms are routinely part of technological solutions that segment a data set or discover patterns in it. Such patterns have helped many fields develop strategies targeted by group, increasing the effectiveness of the processes that combat a given problem. The data collected on missing minors contain multiple attributes, such as age, gender, race, eye color, hair color, nose type, and mouth type. Among these fields there is only one label, whose value can be "missing" or "found"; this label is not sufficient to support supervised learning techniques, so unsupervised learning techniques were chosen as a viable alternative for analyzing the data. In addition, because this type of learning does not require labeled data, it reduces resource costs. For this reason, the research seeks to describe the patterns that can be detected in the data set using unsupervised learning techniques. To apply these techniques, it was first necessary to extract all the data hosted on the web page using web scraping, which yielded all the data in each minor's profile. Because the collected data set contained inconsistencies among its records, it was preprocessed with techniques from the KDD process to retain as many valid records as possible for the study. Finally, the data were analyzed over several candidate numbers of clusters determined by the elbow method, which were passed to the k-means algorithm so that validation metrics could determine the appropriate number of clusters for the data set.
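
The last step described in this abstract, running k-means for several cluster counts and inspecting the elbow, can be sketched as follows. This is a minimal standard-library illustration, not the thesis's implementation: the 2-D toy points, the function names `kmeans` and `inertia_curve`, and the fixed seed are all assumptions. The categorical attributes mentioned in the abstract would first need a numeric encoding, which is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # New centroid is the coordinate-wise mean of the cluster.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


def inertia_curve(points, k_values, seed=0):
    """Inertia for each candidate k; the 'elbow' is where the drop flattens."""
    return {k: kmeans(points, k, seed=seed)[1] for k in k_values}


# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
    (10.0, 10.0), (10.4, 9.7), (9.8, 10.5),
    (20.0, 0.0), (20.3, 0.4), (19.6, 0.1),
]
curve = inertia_curve(points, range(1, 6))
```

Plotting `curve` against k and looking for the point where the decrease levels off gives the elbow. A validation metric such as the silhouette score could then be compared across the candidate values, as the abstract suggests.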

    Learning with Partially Labeled and Interdependent Data

    This book develops two key machine learning principles: the semi-supervised paradigm and learning with interdependent data. It reveals new applications, primarily web related, that transgress the classical machine learning framework through learning with interdependent data. The book traces how the semi-supervised paradigm and the learning to rank paradigm emerged from new web applications, leading to a massive production of heterogeneous textual data. It explains how semi-supervised learning techniques are widely used, but only allow a limited analysis of the information content and thus do not meet the demands of many web-related tasks. Later chapters deal with the development of learning methods for ranking entities in a large collection with respect to a precise information need. In some cases, learning a ranking function can be reduced to learning a classification function over the pairs of examples. The book proves that this task can be efficiently tackled in a new framework: learning with interdependent data. Researchers and professionals in machine learning will find these new perspectives and solutions valuable. Learning with Partially Labeled and Interdependent Data is also useful for advanced-level students of computer science, particularly those focused on statistics and learning.
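
The reduction mentioned in this abstract, from learning a ranking function to classifying pairs of examples, can be illustrated with a small sketch. It assumes a linear scoring function trained with a perceptron on feature-difference vectors; that choice, the function names, and the toy data are illustrative assumptions and do not come from the book. Pairs built from the same item are not independent of one another, which is the kind of interdependence the book addresses.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_examples(items):
    """Turn (features, relevance) items into (difference, label) pairs.

    The label is +1 if the first item of the pair is more relevant, -1 if it
    is less relevant. Pairs with equal relevance carry no preference and are
    skipped.
    """
    pairs = []
    for (xa, ra), (xb, rb) in combinations(items, 2):
        if ra == rb:
            continue
        diff = tuple(a - b for a, b in zip(xa, xb))
        pairs.append((diff, 1 if ra > rb else -1))
    return pairs


def train_perceptron(pairs, epochs=100):
    """Learn weights w such that sign(w . diff) matches each pair's label."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        mistakes = 0
        for diff, label in pairs:
            if label * dot(w, diff) <= 0:
                # Standard perceptron update on a misclassified pair.
                w = [wi + label * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:
            break  # every pair is ordered correctly
    return w


def rank(feature_vectors, w):
    """Sort items by descending score w . x."""
    return sorted(feature_vectors, key=lambda x: dot(w, x), reverse=True)


# Hypothetical toy collection: (feature vector, graded relevance).
items = [((3.0, 1.0), 2), ((2.0, 0.5), 1), ((1.0, 2.0), 0), ((0.5, 0.0), 0)]
pairs = pairwise_examples(items)
weights = train_perceptron(pairs)
ranking = rank([features for features, _ in items], weights)
```

The classifier is trained only on differences between items, yet the learned weights score individual items, so sorting by score recovers a ranking.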