4,357 research outputs found

    A Robust Localization System for Inspection Robots in Sewer Networks †

    Get PDF
    Sewers represent a very important infrastructure of cities whose state should be monitored periodically. However, the length of such infrastructure prevents sensor networks from being applicable. In this paper, we present a mobile platform (SIAR) designed to inspect the sewer network. It is capable of sensing gas concentrations and detecting failures in the network such as cracks and holes in the floor and walls or zones were the water is not flowing. These alarms should be precisely geo-localized to allow the operators performing the required correcting measures. To this end, this paper presents a robust localization system for global pose estimation on sewers. It makes use of prior information of the sewer network, including its topology, the different cross sections traversed and the position of some elements such as manholes. The system is based on a Monte Carlo Localization system that fuses wheel and RGB-D odometry for the prediction stage. The update step takes into account the sewer network topology for discarding wrong hypotheses. Additionally, the localization is further refined with novel updating steps proposed in this paper which are activated whenever a discrete element in the sewer network is detected or the relative orientation of the robot over the sewer gallery could be estimated. Each part of the system has been validated with real data obtained from the sewers of Barcelona. The whole system is able to obtain median localization errors in the order of one meter in all cases. Finally, the paper also includes comparisons with state-of-the-art Simultaneous Localization and Mapping (SLAM) systems that demonstrate the convenience of the approach.Unión Europea ECHORD ++ 601116Ministerio de Ciencia, Innovación y Universidades de España RTI2018-100847-B-C2

    Personal Email Spam Filtering with Minimal User Interaction

    Get PDF
    This thesis investigates ways to reduce or eliminate the necessity of user input to learning-based personal email spam filters. Personal spam filters have been shown in previous studies to yield superior effectiveness, at the cost of requiring extensive user training which may be burdensome or impossible. This work describes new approaches to solve the problem of building a personal spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from different sources of data, as opposed to user’s messages. Our initial studies show that inter-user training yields substantially inferior results to intra-user training using the best known methods. Moreover, contrary to previous literature, it is found that transfer learning degrades the performance of spam filters when the source of training and test sets belong to two different users or different times. We also adapt and modify a graph-based semi-supervising learning algorithm to build a filter that can classify an entire inbox trained on twenty or fewer user judgments. Our experiments show that this approach compares well with previous techniques when trained on as few as two training examples. We also present the toolkit we developed to perform privacy-preserving user studies on spam filters. This toolkit allows researchers to evaluate any spam filter that conforms to a standard interface defined by TREC, on real users’ email boxes. Researchers have access only to the TREC-style result file, and not to any content of a user’s email stream. To eliminate the necessity of feedback from the user, we build a personal autonomous filter that learns exclusively on the result of a global spam filter. Our laboratory experiments show that learning filters with no user input can substantially improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the autonomous filter in a user study

    Large-Scale Automatic Reconstruction of Neuronal Processes from Electron Microscopy Images

    Full text link
    Automated sample preparation and electron microscopy enables acquisition of very large image data sets. These technical advances are of special importance to the field of neuroanatomy, as 3D reconstructions of neuronal processes at the nm scale can provide new insight into the fine grained structure of the brain. Segmentation of large-scale electron microscopy data is the main bottleneck in the analysis of these data sets. In this paper we present a pipeline that provides state-of-the art reconstruction performance while scaling to data sets in the GB-TB range. First, we train a random forest classifier on interactive sparse user annotations. The classifier output is combined with an anisotropic smoothing prior in a Conditional Random Field framework to generate multiple segmentation hypotheses per image. These segmentations are then combined into geometrically consistent 3D objects by segmentation fusion. We provide qualitative and quantitative evaluation of the automatic segmentation and demonstrate large-scale 3D reconstructions of neuronal processes from a 27,000\mathbf{27,000} μm3\mathbf{\mu m^3} volume of brain tissue over a cube of 30  μm\mathbf{30 \; \mu m} in each dimension corresponding to 1000 consecutive image sections. We also introduce Mojo, a proofreading tool including semi-automated correction of merge errors based on sparse user scribbles

    Applications Of Machine Learning In Biology And Medicine

    Get PDF
    Machine learning as a field is defined to be the set of computational algorithms that improve their performance by assimilating data. As such, the field as a whole has found applications in many diverse disciplines from robotics and communication in engineering to economics and finance, and also biology and medicine. It should not come as a surprise that many popular methods in use today have completely different origins. Despite this heterogeneity, different methods can be divided into standard tasks, such as supervised, unsupervised, semi-supervised and reinforcement learning. Although machine learning as a field can be formalized as methods trying to solve certain standard tasks, applying these tasks on datasets from different fields comes with certain caveats, and sometimes is fraught with challenges. In this thesis, we develop general procedures and novel solutions, dealing with practical problems that arise when modeling biological and medical data. Cost sensitive learning is an important area of research in machine learning which addresses the widespread and practical problem of dealing with different costs during the learning and deployment of classification algorithms. In many applications such as credit fraud detection, network intrusion and specifically medical diagnosis domains, prior class distributions are highly skewed, which makes the training examples very much unbalanced. Combining this with uneven misclassification costs renders standard machine learning approaches useless in learning an acceptable decision function. We experimentally show the benefits and shortcomings of various methods that convert cost blind learning algorithms to cost sensitive ones. Using the results and best practices found for cost sensitive learning, we design and develop a machine learning approach to ontology mapping. Next, we present a novel approach to deal with uncertainty in classification when costs are unknown or otherwise hard to assign. Support Vector Machines (SVM) are considered to be among the most successful approaches for classification. However prediction of instances near the decision boundary depends more on the specific parameter selection or noise in data, rather than a clear difference in features. In many applications such as medical diagnosis, these regions should be labeled as uncertain rather than assigned to any particular class. Furthermore, instances may belong to novel disease subtypes that are not from any previously known class. In such applications, declining to make a prediction could be beneficial when more powerful but expensive tests are available. We develop a novel approach for optimal selection of the threshold and show its successful application on three biological and medical datasets. The last part of this thesis provides novel solutions for handling high dimensional data. Although high-dimensional data is ubiquitously found in many disciplines, current life science research almost always involves high-dimensional genomics/proteomics data. The ``omics\u27\u27 data provide a wealth of information and have changed the research landscape in biology and medicine. However, these data are plagued with noise, redundancy and collinearity, which makes the discovery process very difficult and costly. Any method that can accurately detect irrelevant and noisy variables in omics data would be highly valuable. We present Robust Feature Selection (RFS), a randomized feature selection approach dedicated to low-sample high-dimensional data. RFS combines an embedded feature selection method with a randomization procedure for stability. Recent advances in sparse recovery and estimation methods have provided efficient and asymptotically consistent feature selection algorithms. However, these methods lack finite sample error control due to instability. Furthermore, the chances of correct recovery diminish with more collinearity among features. To overcome these difficulties, RFS uses a randomization procedure to provide an accurate and stable feature selection method. We thoroughly evaluate RFS by comparing it to a number of popular univariate and multivariate feature selection methods and show marked prediction accuracy improvement of a diagnostic signature, while preserving a good stability
    • …
    corecore