211 research outputs found
Efficiently Clustering Very Large Attributed Graphs
Attributed graphs model real networks by enriching their nodes with
attributes accounting for properties. Several techniques have been proposed for
partitioning these graphs into clusters that are homogeneous with respect to
both semantic attributes and the structure of the graph. However, the time and
space complexities of state-of-the-art algorithms limit their scalability to
medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a
fast and scalable algorithm for partitioning large attributed graphs. The
approach is robust, being compatible both with categorical and with
quantitative attributes, and it is tailorable, allowing the user to weight the
semantic and topological components. Further, the approach does not require the
user to guess in advance the number of clusters. SToC relies on well-known
approximation techniques such as bottom-k sketches, traditional graph-theoretic
concepts, and a new perspective on the composition of heterogeneous distance
measures. Experimental results demonstrate its ability to efficiently compute
high-quality partitions of large-scale attributed graphs.
Comment: This work has been published in ASONAM 2017. This version includes an
appendix with validation of our attribute model and distance function,
omitted in the conference version for lack of space. Please refer to the
published versio
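The abstract above names bottom-k sketches as one of the approximation techniques SToC relies on. As a rough illustration of the general technique only (not the authors' implementation, whose details are not given here), the sketch below estimates the Jaccard similarity of two sets from their k smallest hash values. The function names `bottom_k` and `jaccard_estimate` and the choice of SHA-1 as the hash are assumptions made for this example.

```python
import hashlib
from typing import Hashable, Iterable, List


def _hash64(item: Hashable) -> int:
    """Deterministic 64-bit hash of an item (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k(items: Iterable[Hashable], k: int) -> List[int]:
    """Return the k smallest distinct hash values of the items, in ascending order."""
    return sorted({_hash64(x) for x in items})[:k]


def jaccard_estimate(sketch_a: List[int], sketch_b: List[int], k: int) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the k smallest hashes of the union of both sketches and count
    how many of them appear in both sketches.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:k]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


if __name__ == "__main__":
    a = bottom_k(range(0, 1000), k=64)
    b = bottom_k(range(500, 1500), k=64)
    # The true Jaccard similarity of these two ranges is 500 / 1500.
    print(jaccard_estimate(a, b, k=64))
```

When a set has fewer than k elements, its sketch holds every hash, so the estimate coincides with the exact Jaccard similarity (barring hash collisions); for larger sets the sketch size k trades accuracy for memory.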
On defining rules for cancer data fabrication
Funding: This research is partially funded by the Data Lab, and the EU H2020 project Serums: Securing Medical Data in Smart Patient-Centric Healthcare Systems (grant 826278).
Data is essential for machine learning projects, and data accuracy is crucial for being able to trust the results obtained from the associated machine learning models. Previously, we have developed machine learning models for predicting the treatment outcome for breast cancer patients that have undergone chemotherapy, and developed a monitoring system for their treatment timeline showing interactively the options and associated predictions. Available cancer datasets, such as the one used earlier, are often too small to obtain significant results, and make it difficult to explore ways to improve the predictive capability of the models further. In this paper, we explore an alternative to enhance our datasets through synthetic data generation. From our original dataset, we extract rules to generate fabricated data that capture the different characteristics inherent in the dataset. Additional rules can be used to capture general medical knowledge. We show how to formulate rules for our cancer treatment data, and use the IBM solver to obtain a corresponding synthetic dataset. We discuss challenges for future work. Postprin
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
Comment: 14 pages, 5 figures, journal pape
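The edge-ego-network defined in the abstract above (the induced graph on the two end nodes of an edge, their neighboring nodes, and the edges linking these nodes) can be sketched as follows. This is a minimal illustration assuming an undirected graph stored as a dictionary of neighbor sets; the function name `edge_ego_network` is hypothetical, and the sketch covers only the definition, not the outlier scoring proposed in the article.

```python
from typing import Dict, Hashable, Set

# Undirected graph as an adjacency map: node -> set of neighboring nodes.
Graph = Dict[Hashable, Set[Hashable]]


def edge_ego_network(adj: Graph, u: Hashable, v: Hashable) -> Graph:
    """Return the subgraph induced by u, v, and all of their neighbors."""
    if v not in adj.get(u, set()):
        raise ValueError(f"({u!r}, {v!r}) is not an edge of the graph")
    # Node set: both end nodes plus every neighbor of either end node.
    nodes = {u, v} | adj[u] | adj[v]
    # Induced subgraph: keep only edges whose two ends are both in the node set.
    return {n: adj[n] & nodes for n in nodes}


if __name__ == "__main__":
    graph: Graph = {
        1: {2, 3},
        2: {1, 4},
        3: {1, 4},
        4: {2, 3, 5},
        5: {4},
    }
    # Node 5 is two hops from both end nodes of edge (1, 2), so it is excluded.
    print(edge_ego_network(graph, 1, 2))
```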
Ecological indicators to capture the effects of fishing on biodiversity and conservation status of marine ecosystems
IndiSeas (“Indicators for the Seas”) is a collaborative international working group that was established in 2005 to evaluate the status of exploited marine ecosystems using a suite of indicators in a comparative framework. An initial shortlist of seven ecological indicators was selected to quantify the effects of fishing on the broader ecosystem using several criteria (i.e., ecological meaning, sensitivity to fishing, data availability, management objectives and public awareness). The suite comprised: (i) the inverse coefficient of variation of total biomass of surveyed species, (ii) mean fish length in the surveyed community, (iii) mean maximum life span of surveyed fish species, (iv) proportion of predatory fish in the surveyed community, (v) proportion of under- and moderately exploited stocks, (vi) total biomass of surveyed species, and (vii) mean trophic level of the landed catch. In line with the Nagoya Strategic Plan of the Convention on Biological Diversity (2011–2020), we extended this suite to emphasize the broader biodiversity and conservation risks in exploited marine ecosystems. We selected a subset of indicators from a list of empirically based candidate biodiversity indicators initially established based on ecological significance to complement the original IndiSeas indicators. The additional selected indicators were: (viii) mean intrinsic vulnerability index of the fish landed catch, (ix) proportion of non-declining exploited species in the surveyed community, (x) catch-based marine trophic index, and (xi) mean trophic level of the surveyed community. Despite the lack of data in some ecosystems, we also selected (xii) mean trophic level of the modelled community, and (xiii) proportion of discards in the fishery as extra indicators.
These additional indicators were examined, along with the initial set of IndiSeas ecological indicators, to evaluate whether adding new biodiversity indicators provided useful additional information to refine our understanding of the status evaluation of 29 exploited marine ecosystems. We used state and trend analyses, and we performed correlation, redundancy and multivariate tests. Existing developments in ecosystem-based fisheries management have largely focused on exploited species. Our study, using mostly fisheries-independent survey-based indicators, highlights that biodiversity and conservation-based indicators are complementary to ecological indicators of fishing pressure. Thus, they should be used to provide additional information to evaluate the overall impact of fishing on exploited marine ecosystems
Effectiveness of septoplasty versus non-surgical management for nasal obstruction due to a deviated nasal septum in adults: study protocol for a randomized controlled trial
The iPlant Collaborative: Cyberinfrastructure for Plant Biology
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services
Antibody to intermediate filaments of the cytoskeleton in patients with Behçet's disease
PubMedID: 3780056
Antibodies to 10-nm intermediate filaments (anti-IF) were determined in the sera of 30 patients with Behçet's disease (BD), in addition to C-reactive protein and C9, and an attempt has been made to determine whether the presence of anti-IF indicates disease activity. The vimentin type of anti-IF was found to be positive in 14 out of 30 patients with BD (47%), whereas it was positive in 35% of the patients with rheumatoid arthritis (20 cases), 16% of the patients with systemic lupus erythematosus (19 cases) and in only 9% of the normal controls. The anti-IF were predominantly IgG class and the titers in BD were significantly higher than those in normal controls. Out of the 14 patients with anti-IF, 10 showed significantly increased levels of serum C9 and 8 showed increased levels of CRP activity. Only one patient showed increased C9, but was negative for anti-IF and CRP. The presence of anti-IF in the patients' sera was found to be a more sensitive indicator, though not specific, for the clinical assessment of disease activity. © 1986