
    Analyze Large Multidimensional Datasets Using Algebraic Topology

    This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex, which is a polyhedron). Conceptually, association rule searching thus becomes a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique (new in computer science) that improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper also investigates the possibility of Hadoop integration and the challenges that come with the framework.
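
As a rough illustration of the table-to-complex idea in the abstract above, the following Python sketch treats each row of a relational table as a simplex whose vertices are (attribute, value) pairs, then counts how many rows contain each low-dimensional face. This is only one plausible reading of the abstract, not the paper's algorithm; the function names `row_to_simplex` and `face_support` and the toy table are hypothetical.

```python
from collections import Counter
from itertools import combinations


def row_to_simplex(row):
    """Map one table row (a dict) to a simplex: a sorted tuple of (attribute, value) vertices."""
    # Attribute names are unique within a row, so sorting never compares the values.
    return tuple(sorted(row.items()))


def face_support(rows, max_dim=1):
    """Count, for every face of dimension <= max_dim, how many rows contain it.

    A face of dimension d has d + 1 vertices, so max_dim=1 covers
    single vertices (items) and edges (item pairs).
    """
    counts = Counter()
    for row in rows:
        vertices = row_to_simplex(row)
        for size in range(1, max_dim + 2):
            for face in combinations(vertices, size):
                counts[face] += 1
    return counts


# Hypothetical toy table, used only to exercise the functions.
toy_rows = [
    {"color": "red", "size": "L"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "L"},
]
toy_support = face_support(toy_rows)
```

Under this reading, a face shared by many row simplices plays the role of a frequent itemset, so searching for association rules amounts to walking the faces with high support.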

    Computationally designed peptides for zika virus detection: An incremental construction approach

    Herein, and in contrast to current production of anti-Zika virus antibodies, we propose a semi-combinatorial virtual strategy to select short peptides as biomimetic antibodies/binding agents for the detection of intact Zika virus (ZIKV) particles. The virtual approach was based on generating different docking cycles of tetra-, penta-, hexa-, and heptapeptide libraries by maximizing the discrimination between the amino acid motif in the ZIKV and dengue virus (DENV) envelope protein glycosylation site. Eight peptides, two for each length (tetra-, penta-, hexa-, and heptapeptide), were then synthesized and tested against intact ZIKV particles using a direct enzyme-linked immunosorbent assay (ELISA). As a reference, we employed a well-established anti-ZIKV antibody, the antibody 4G2. Three peptide-based assays had good detection limits, with a dynamic range starting from 10^5 copies/mL of intact ZIKV particles; this was one order of magnitude lower than the other peptides or antibodies. These three peptides showed slight cross-reactivity against the three serotypes of DENV (DENV-1, -2, and -3) at a concentration of 10^6 copies/mL of intact virus particles, but the discrimination between DENV and ZIKV was lost when the coating concentration was increased to 10^7 copies/mL of the virus. The sensitivity of the peptides was tested in the presence of two biological matrices, serum and urine, diluted 1:10 and 1:1, respectively. The detection limits decreased by about one order of magnitude for ZIKV detection in serum or urine, although two of the three peptides tested still gave a distinct analytical signal starting from 10^6 copies/mL, the concentration of ZIKV in acute infection.

    Numerical Evaluation of Algorithmic Complexity for Short Strings: A Glance into the Innermost Structure of Randomness

    We describe an alternative method (to compression) that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of all $\sum_{n=1}^{8} 2^n$ bit strings up to 8 bits long, and for some between 9 and 16 bits long. This is done by an exhaustive execution of all deterministic 2-symbol Turing machines with up to 4 states for which the halting times are known thanks to the Busy Beaver problem, that is 11019960576 machines. An output frequency distribution is then computed, from which the algorithmic probability is calculated and the algorithmic complexity evaluated by way of the (Levin-Zvonkin-Chaitin) coding theorem.
    Comment: 29 pages, 5 figures. Version as accepted by the journal Applied Mathematics and Computation
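
The figures in the abstract above can be checked with a few lines of Python, and the coding-theorem step can be sketched on toy data. The sum of 2^n for n from 1 to 8 is 510. The machine count matches (4n + 2)^(2n) at n = 4, a counting formula that is an assumption here (it is not stated in the abstract) but reproduces the quoted number. The output frequencies below are invented for illustration and are not results from the paper; the function names are hypothetical.

```python
import math


def count_bit_strings(max_len):
    """Number of non-empty bit strings of length 1..max_len."""
    return sum(2 ** n for n in range(1, max_len + 1))


def count_machines(states):
    """Assumed count of 2-symbol machines with the given number of states.

    Assumption: each of the 2n (state, symbol) pairs is assigned one of
    4n + 2 possible instructions, giving (4n + 2) ** (2n) machines.
    """
    return (4 * states + 2) ** (2 * states)


def coding_theorem_complexity(output_counts):
    """Estimate complexity as K(s) ~ -log2(m(s)), where m(s) is the
    fraction of halting runs that produced the string s."""
    total = sum(output_counts.values())
    return {s: -math.log2(c / total) for s, c in output_counts.items()}


# Invented toy frequencies (NOT data from the paper).
toy_counts = {"0": 4, "1": 2, "01": 1, "10": 1}
toy_complexity = coding_theorem_complexity(toy_counts)
```

Strings produced by more machines receive a higher algorithmic probability and therefore a lower estimated complexity, which is the relationship the coding theorem expresses.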

    Serial-batch scheduling – the special case of laser-cutting machines

    The dissertation deals with a problem in the field of short-term production planning, namely the scheduling of laser-cutting machines. The object of decision is the grouping of production orders (batching) and the sequencing of these order groups on one or more machines (scheduling). This problem is known in the literature as the "batch scheduling problem" and belongs to the class of combinatorial optimization problems because of the interdependencies between the batching and the scheduling decisions. The concepts and methods used are mainly from production planning, operations research, and machine learning.

    Overlapping Community Detection in Networks: the State of the Art and Comparative Study

    This paper reviews the state of the art in overlapping community detection algorithms, quality measures, and benchmarks. A thorough comparison of different algorithms (a total of fourteen) is provided. In addition to community-level evaluation, we propose a framework for evaluating algorithms' ability to detect overlapping nodes, which helps to assess over-detection and under-detection. After considering community-level detection performance measured by Normalized Mutual Information and the Omega index, and node-level detection performance measured by F-score, we reached the following conclusions. For low overlapping density networks, SLPA, OSLOM, Game and COPRA offer better performance than the other tested algorithms. For networks with high overlapping density and high overlapping diversity, both SLPA and Game provide relatively stable performance. However, test results also suggest that detection in such networks is not yet fully resolved. A common feature observed by various algorithms in real-world networks is the relatively small fraction of overlapping nodes (typically less than 30%), each of which belongs to only 2 or 3 communities.
    Comment: This paper (final version) is accepted in 2012. ACM Computing Surveys, vol. 45, no. 4, 2013 (In press)
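
The node-level evaluation mentioned in the abstract above can be sketched as a standard F-score over the set of overlapping nodes, that is, nodes assigned to two or more communities. The Python below is a minimal sketch under that assumption and is not the survey's own evaluation code; the function names and the toy communities are hypothetical.

```python
def overlapping_nodes(communities):
    """Return the nodes that belong to at least two communities."""
    membership = {}
    for members in communities:
        for node in set(members):
            membership[node] = membership.get(node, 0) + 1
    return {node for node, count in membership.items() if count >= 2}


def overlap_f_score(true_communities, found_communities):
    """F-score for detecting overlapping nodes.

    Low precision signals over-detection (too many nodes flagged as
    overlapping); low recall signals under-detection.
    """
    truth = overlapping_nodes(true_communities)
    found = overlapping_nodes(found_communities)
    true_positives = len(truth & found)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: ground truth overlaps are {3, 5}, detected are {3, 4}.
toy_truth = [{1, 2, 3}, {3, 4, 5}, {5, 6}]
toy_found = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
toy_score = overlap_f_score(toy_truth, toy_found)
```

Reporting precision and recall separately, rather than only their harmonic mean, makes it easier to see whether an algorithm tends toward over-detection or under-detection.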

    Element-centric clustering comparison unifies overlaps and hierarchy

    Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science