68 research outputs found

    Comparing the hierarchy of author given tags and repository given tags in a large document archive

    Folksonomies - large databases arising from collaborative tagging of items by independent users - are becoming an increasingly important way of categorizing information. In these systems users can tag items with free words, resulting in a tripartite item-tag-user network. Although there are no prescribed relations between tags, the way users think about the different categories presumably has some built-in hierarchy, in which more specific concepts are descendants of more general categories. Several applications would benefit from knowledge of this hierarchy. Here we apply a recent method to check the differences and similarities of hierarchies resulting from tags given by independent individuals and from tags given by a centrally managed repository system. The results from our method showed substantial differences between the lower parts of the hierarchies and, in contrast, a relatively high similarity at the top of the hierarchies. Comment: 10 pages

    Extracting tag hierarchies

    Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of the given entity. Over the years several methods have been proposed for extracting a hierarchy between the tags for systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Besides the computer-generated input we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems, indicates that tag hierarchy extraction is a very promising direction for further research with great potential for practical applications. Comment: 25 pages with 21 pages of supporting information, 25 figures
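
    As a rough illustration of what "hierarchy extraction based on tag occurrence statistics" can mean, the sketch below links each tag to the more frequent tag it is most similar to, using Jaccard similarity of co-occurrence. This is a generic heuristic chosen for illustration and is not claimed to be one of the algorithms proposed in the paper; the function name `extract_tag_hierarchy` and the toy data are assumptions.

```python
from collections import Counter
from itertools import combinations


def extract_tag_hierarchy(tagged_items):
    """Return {tag: parent_or_None} from an iterable of tag collections.

    Illustrative heuristic only (not the paper's method): the parent of a
    tag is the strictly more frequent tag with the highest Jaccard
    similarity of co-occurrence; tags without such a candidate are roots.
    """
    freq = Counter()  # number of items carrying each tag
    cooc = Counter()  # number of items carrying both tags of a sorted pair
    for tags in tagged_items:
        unique = sorted(set(tags))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1

    parents = {}
    for tag in freq:
        best, best_score = None, 0.0
        for other in sorted(freq):  # sorted for deterministic tie-breaking
            if freq[other] <= freq[tag]:
                continue  # only strictly more frequent tags may be parents
            both = cooc[tuple(sorted((tag, other)))]
            score = both / (freq[tag] + freq[other] - both)  # Jaccard
            if score > best_score:
                best, best_score = other, score
        parents[tag] = best
    return parents


# Hypothetical toy data: each set is the tag set of one item.
items = [
    {"animal", "dog"},
    {"animal", "dog", "poodle"},
    {"animal", "cat"},
    {"animal", "dog", "beagle"},
    {"animal"},
]
hierarchy = extract_tag_hierarchy(items)
```

    On this toy input, "animal" comes out as the root, "dog" and "cat" attach to it, and "poodle" and "beagle" attach to "dog". Real folksonomies are far noisier, which is why quality measures and benchmarks of the kind described in the abstract are needed to compare methods.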

    Detecting and classifying lesions in mammograms with Deep Learning

    In the last two decades Computer Aided Diagnostics (CAD) systems were developed to help radiologists analyze screening mammograms. The reported benefits of current CAD technologies appear to be contradictory, and the systems need to be improved before they can be considered useful. Since 2012 deep convolutional neural networks (CNNs) have been a tremendous success in image recognition, reaching human performance. These methods have greatly surpassed the traditional approaches, which are similar to currently used CAD solutions. Deep CNNs have the potential to revolutionize medical image analysis. We propose a CAD system based on one of the most successful object detection frameworks, Faster R-CNN. The system detects and classifies malignant or benign lesions on a mammogram without any human intervention. The proposed method sets the state-of-the-art classification performance on the public INbreast database, AUC = 0.95. The approach described here achieved 2nd place in the Digital Mammography DREAM Challenge with AUC = 0.85. When used as a detector, the system reaches high sensitivity with very few false positive marks per image on the INbreast dataset. Source code, the trained model and an OsiriX plugin are available online at https://github.com/riblidezso/frcnn_cad

    Komplex hálózatok szerkezete és dinamikája = Structure and dynamics of complex networks

    The network approach is presently the most efficient tool to study complex systems. We broadened the framework of theoretical description by generalizing concepts to the case of weighted networks, analyzing community detection algorithms in detail, constructing a new detection method, and analyzing the limitations of the procedures. Using stock market data as an example, we studied ways of efficiently denoising the correlation matrix. Using communication data we proved for the first time on a societal scale the Granovetter hypothesis ("The strength of weak ties") on the social network, and constructed a working model based on it. One of the most important dynamic phenomena on networks is spreading. We investigated how the topology and its relation to the link weights affect such phenomena, and what the mechanism of catastrophic cascades is. We proved that the inhomogeneous, bursty character of human behavior substantially influences the speed at which information spreads. We can conclude from our investigations that, although very different networks may show surprisingly similar properties, they obey very different optimization principles from the point of view of their functioning. Finally, we showed that dynamics in complex networks, and in complex systems in general, shows fluctuation scaling; we analyzed its possible origins and the phenomena that go beyond simple scaling.

    Ontologies and tag-statistics

    Due to the increasing popularity of collaborative tagging systems, research on tagged networks, hypergraphs, ontologies, folksonomies and other related concepts is becoming an important and timely interdisciplinary topic with relevance for practical applications. In most collaborative tagging systems the tagging by the users is completely "flat", while in some cases users are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organisation of the tags is given, and one of the interesting challenges of this area is to provide an algorithm generating the ontology of the tags from the available data. In contrast, there are also other types of tagged networks available for research, where the tags are already organised into a directed acyclic graph (DAG), encapsulating the "is a sub-category of" type of hierarchy between them. In this paper we study how this DAG affects the statistical distribution of tags on the nodes marked by the tags in various real networks. We analyse the relation between the tag frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and a protein-protein interaction network. We also study the tag co-occurrence statistics by introducing a 2d tag-distance distribution preserving both the difference in the levels and the absolute distance in the DAG for the co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e., their rank or significance as characterised by, e.g., the length of the branches starting from them) is much more important than their global distance from the root. Furthermore, we also introduce a simple tagging model based on random walks on the DAG, capable of reproducing the main statistical features of tag co-occurrence. Comment: Submitted to New Journal of Physics
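
    The two coordinates of such a 2d tag distance can be sketched for a single pair of tags as follows. The abstract does not give the exact definitions, so this sketch assumes that a tag's level is its shortest directed distance from the root and that the absolute distance is the shortest path length when edge directions are ignored; the function names and the toy DAG are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def tag_distance_2d(children, root, tag_a, tag_b):
    """Return (level difference, undirected distance) for two tags.

    Assumed definitions (not taken from the paper): level is the shortest
    directed distance from the root; distance ignores edge direction.
    """
    levels = bfs_distances(children, root)

    # Build the undirected version of the DAG for the path-length part.
    undirected = {}
    for parent, kids in children.items():
        for kid in kids:
            undirected.setdefault(parent, set()).add(kid)
            undirected.setdefault(kid, set()).add(parent)

    separation = bfs_distances(undirected, tag_a)[tag_b]
    return abs(levels[tag_a] - levels[tag_b]), separation


# Hypothetical toy DAG: parent tag -> list of sub-category tags.
children = {
    "root": ["science", "art"],
    "science": ["physics", "biology"],
    "art": ["music"],
    "physics": ["optics"],
}
pair_far = tag_distance_2d(children, "root", "optics", "music")
pair_near = tag_distance_2d(children, "root", "physics", "biology")
```

    Collecting these pairs over all co-occurring tags of a network and histogramming them would give an empirical 2d distribution in the spirit of the one described in the abstract.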

    Komplex Hálózatok Moduláris Szerkezete = Modular Structure of Complex Networks

    We developed a method enabling the tracking of communities in time-evolving networks. We studied the statistical properties of community evolution in large social networks, and revealed interesting non-trivial relations between the size, stationarity and survival probability of communities. We extended the clique percolation method to handle directed and weighted networks, and analyzed numerous real networks with these new algorithms. The behavior of the directed communities classified the examined systems into two major groups, whereas the studies of the weighted networks revealed interesting link weight correlations. With the help of the clique percolation method we located functional units in the network of microRNAs and the mRNAs they regulate, and developed bioinformatics tools for signal transduction networks that help predict drug target proteins. Relating to the field of network hierarchy, we studied the statistical features of tagged networks where the tags were hierarchically organized. According to our results, the examined networks showed an interesting self-similarity when restricted to the tag-induced sub-graphs. Also in connection with the study of hierarchy, we developed a random graph generator based on a self-similar, hierarchical multifractal link probability measure. We have shown that this method is capable of generating random networks with very diverse properties.

    Hierarchical networks of scientific journals

    Academic journals are the repositories of mankind’s gradually accumulating knowledge of the surrounding world. Just as knowledge is organized into classes ranging from major disciplines, subjects and fields to increasingly specific topics, journals can also be categorized into groups using various metrics. In addition, they can be ranked according to their overall influence. However, according to recent studies, the impact, prestige and novelty of journals cannot be characterized by a single parameter such as, for example, the impact factor. To increase understanding of journal impact, the knowledge gap we set out to explore in our study is the evaluation of journal relevance using complex multi-dimensional measures. Thus, for the first time, our objective is to organize journals into multiple hierarchies based on citation data. The two approaches we use are designed to address this problem from different perspectives. We use a measure related to the notion of m-reaching centrality and find a network that shows a journal’s level of influence in terms of the direction and efficiency with which information spreads through the network. We find we can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. In this case, in a self-organized way, the journals become branches according to the major scientific fields, where the local structure of the branches reflects the hierarchy within the given field, with usually the most prominent journal (according to other measures) in the field chosen by the algorithm as the local root, and more specialized journals positioned deeper in the branch. This can make navigation within different scientific fields and subfields very simple, and equivalent to navigating in the different branches of the nested hierarchy. We expect this to be particularly helpful, for example, when choosing the most appropriate journal for a given manuscript. According to our results, the two alternative hierarchies show a somewhat different, but also consistent, picture of the intricate relations between scientific journals, and, as such, they also provide a new perspective on how scientific knowledge is organized into networks.
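
    For readers unfamiliar with the notion, the plain m-reach centrality of a node can be sketched as the number of other nodes reachable from it in at most m directed steps. The abstract only says that the study uses a measure related to this notion, so the code below shows the basic quantity under that common definition and not the authors' exact measure; the graph, the edge direction and the function name are hypothetical.

```python
from collections import deque


def m_reach_centrality(successors, node, m):
    """Count nodes reachable from `node` in at most `m` directed steps.

    Basic m-reach centrality under a common definition; the study itself
    uses a related measure whose details are not given in the abstract.
    """
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == m:
            continue  # do not expand beyond m steps
        for nxt in successors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen) - 1  # exclude the starting node itself


# Hypothetical toy network: an edge X -> Y means information flows from
# journal X to journal Y.
successors = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
reach_a = m_reach_centrality(successors, "A", 2)
reach_e = m_reach_centrality(successors, "E", 2)
```

    In this toy network "A" reaches three journals within two steps while "E" reaches none, so a ranking by this quantity would place "A" near the top of the hierarchy and "E" at the bottom.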