High-quality Web information provisioning and quality-based data pricing
Today, information can be considered a production factor. This is attributed to the technological innovations the Internet and the Web have brought about. Now, a plethora of information is available, making it hard to find the most relevant information. Subsequently, the issue of finding and purchasing high-quality data arises. Addressing these challenges, this work first examines how high-quality information provisioning can be achieved with an approach called WiPo that exploits the idea of curation, i.e., the selection, organisation, and provisioning of information with human involvement. The second part of this work investigates the issue that there is little understanding of what the value of data is and how it can be priced, despite the fact that data is already being traded on data marketplaces. To overcome this, a pricing approach based on the Multiple-Choice Knapsack Problem is proposed that allows for utility maximisation for customers and profit maximisation for vendors.
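The optimisation idea behind the pricing approach can be illustrated with a small sketch. This is a generic dynamic-programming solution to the Multiple-Choice Knapsack Problem (pick exactly one option per class, maximise utility under a budget), not the thesis's actual pricing model; the classes, prices, and utilities below are invented for illustration.

```python
def mckp(classes, budget):
    """Multiple-Choice Knapsack via dynamic programming.

    classes: list of lists of (price, utility) options; exactly one option
    must be chosen from each class (e.g. one quality tier per data product).
    Returns the maximum total utility achievable within the budget, or
    None if no feasible selection exists.
    """
    NEG = float("-inf")
    best = [NEG] * (budget + 1)
    best[0] = 0.0
    for options in classes:
        nxt = [NEG] * (budget + 1)
        for spent in range(budget + 1):
            if best[spent] == NEG:
                continue  # no way to reach this spend level so far
            for price, utility in options:
                if spent + price <= budget:
                    cand = best[spent] + utility
                    if cand > nxt[spent + price]:
                        nxt[spent + price] = cand
        best = nxt
    top = max(best)
    return None if top == NEG else top


# Hypothetical example: two data products, each offered in two quality tiers.
classes = [[(2, 3.0), (3, 5.0)], [(1, 1.0), (4, 6.0)]]
```

A customer with budget 5 would here be steered to the high-quality tier of the first product plus the cheap tier of the second, which is exactly the kind of utility-maximising selection the pricing scheme targets.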
The eNanoMapper database for nanomaterial safety information
Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs.
Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms.
Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the representational state transfer (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
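As a rough illustration of how such a REST API might be consumed by a client, the sketch below builds a substance-search URL and extracts names from a JSON response. The base URL, endpoint name, query parameters, and response shape are assumptions made for illustration, not a definitive description of the eNanoMapper API.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of a public eNanoMapper instance (illustrative only).
BASE = "https://data.enanomapper.net"

def substance_search_url(query, page=0, pagesize=10):
    """Build a hypothetical substance-search URL.

    The endpoint path and parameter names are illustrative assumptions.
    """
    params = urlencode({"search": query, "page": page, "pagesize": pagesize})
    return f"{BASE}/substance?{params}"

def extract_names(payload):
    """Pull substance names out of a JSON payload assumed to be shaped
    like {"substance": [{"name": ...}, ...]}."""
    return [s.get("name") for s in json.loads(payload).get("substance", [])]
```

A visualisation front end, for example, could call such helpers to populate a substance picker before fetching the assay data for modelling.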
The first IEEE workshop on the Future of Research Curation and Research Reproducibility
This report describes perspectives from the Workshop on the Future of Research Curation and Research Reproducibility that was collaboratively sponsored by the U.S. National Science Foundation (NSF) and IEEE (Institute of Electrical and Electronics Engineers) in November 2016. The workshop brought together stakeholders including researchers, funders, and notably, leading science, technology, engineering, and mathematics (STEM) publishers. The overarching objective was a deep dive into new kinds of research products and how the costs of creation and curation of these products can be sustainably borne by the agencies, publishers, and researcher communities that were represented by workshop participants. National Science Foundation Award #164101
Capturing mobile security policies precisely
The security policies of mobile devices that describe how we should use
these devices are often informally specified. Users have preferences for some
apps over others. Some users may avoid apps which can access large amounts
of their personal data, whilst others may not care. A user is unlikely to write
down these policies or describe them using a formal policy language. This is
unfortunate because, without a formal description of these policies, we cannot
reason about them precisely. We cannot help users pick the apps they want if we
cannot describe their policies.
Companies have mobile security policies that define how an employee should
use smartphones and tablets brought from home at work. A company
might describe the policy in a natural-language document for employees to
read and agree to. It might also use software installed on employees'
devices to enforce the company rules. Without a link between the specification
of the policy in the natural-language document and the implementation of the
policy with the tool, understanding how the two are related can be hard.
This thesis looks at developing an authorisation logic, called AppPAL, to
capture the informal security policies of the mobile ecosystem, which we define
as the interactions surrounding the use of mobile devices in a particular setting.
This includes the policies of the users, the devices, the app stores, and the
environments the users bring the devices into. Whilst earlier work has looked
at checking and enforcing policies with low-level controls, this work aims
to capture these informal policies' intents and the trust relationships within
them, separating the policy specification from its enforcement. This allows us to
analyse the informal policies precisely and reason about how they are used.
We show how AppPAL instantiates SecPAL, a policy language designed
for access control in distributed environments. We describe AppPAL's
implementation as an authorisation logic for mobile ecosystems. We show how
we can check AppPAL policies for common errors. Using AppPAL, we show
that policies describing users' privacy preferences do not seem to match the
apps users install. We explore the differences between app stores and how to
create new ones based on policy. We look at five BYOD policies and discover
previously unexamined idioms within them. This suggests there are aspects of
BYOD policies not managed by current BYOD tools.
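The flavour of a SecPAL-style authorisation logic can be sketched with a toy forward-chaining evaluator over "speaker says" triples. This is a drastic simplification (ground facts only, no variables or delegation semantics) and is not AppPAL's actual syntax or inference rules; the store, user, and app names are invented.

```python
def derive(facts, rules):
    """Naive forward chaining. Facts and rule conditions are
    (speaker, predicate, subject) triples; a rule is (head, [conditions])
    and fires when all its conditions are already known."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, conds in rules:
            if head not in known and all(c in known for c in conds):
                known.add(head)
                changed = True
    return known


# Hypothetical policy: the user permits installing an app if the user
# trusts a store and that store has vetted the app.
facts = {
    ("store", "vetted", "app1"),
    ("user", "trusts", "store"),
}
rules = [
    (("user", "installable", "app1"),
     [("user", "trusts", "store"), ("store", "vetted", "app1")]),
]
```

Separating the rule (the policy's intent) from the facts (who is trusted, what was vetted) mirrors the thesis's point about decoupling policy specification from enforcement.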
Quantifying prey availability using the foraging plasticity of a marine predator, the little penguin
Detecting changes in marine food webs is challenging, but top predators can provide information on lower trophic levels. However, many commonly measured predator responses can be decoupled from prey availability by plasticity in predator foraging effort. This can be overcome by directly measuring foraging effort and success and integrating these into a measure of foraging efficiency analogous to the catch per unit effort (CPUE) index employed by fisheries. We extended existing CPUE methods so that they would be applicable to the study of generalist foragers, which introduce another layer of complexity through dietary plasticity. Using this method, we inferred species-specific patterns in prey availability and estimated taxon-specific biomass consumption. We recorded foraging trip duration and body mass change of breeding little penguins Eudyptula minor and combined these with diet composition identified via non-invasive faecal DNA metabarcoding to derive CPUE indices for individual prey taxa. We captured weekly patterns of availability of key fish prey in the penguins' diet and identified a major prey shift from sardine Sardinops sagax to red cod Pseudophycis bachus between years. In each year, predation on a dominant fish species (~150 g/day) was replaced by greater diversity of fish in the diet as the breeding season progressed. We estimated that the colony extracted ~1,300 tonnes of biomass from their coastal ecosystem over two breeding seasons, including 219 tonnes of the commercially important sardine and 215 tonnes of red cod. This enhanced pCPUE is applicable to most central-place foragers and offers a valuable alternative to existing metrics. Informed prey-species biomass estimates extracted by apex and meso predators will be a useful input for mass-balance ecosystem models and for informing ecosystem-based management. A free Plain Language Summary can be found within the Supporting Information of this article.
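A minimal sketch of how a prey-specific CPUE index could be assembled from the measured quantities, assuming foraging success is body-mass gain per trip, effort is trip duration, and the catch is apportioned to prey taxa by diet-composition fractions from metabarcoding. The exact form of the paper's index may differ; the numbers below are invented.

```python
def cpue(mass_gain_g, trip_hours, diet_fractions):
    """Prey-specific catch-per-unit-effort (simplified, assumed form):
    body-mass gain (g) per hour at sea, split across prey taxa in
    proportion to their share of the diet.

    diet_fractions: {taxon: fraction of diet}, fractions summing to 1.
    """
    total = mass_gain_g / trip_hours  # overall foraging efficiency, g/h
    return {taxon: total * frac for taxon, frac in diet_fractions.items()}


# Hypothetical trip: 120 g gained over 12 h, diet 75% sardine / 25% red cod.
indices = cpue(120.0, 12.0, {"sardine": 0.75, "red_cod": 0.25})
```

Tracking such taxon-level indices week by week is what allows a prey shift, like the sardine-to-red-cod change described above, to show up directly in the data.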
Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks
The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size, as they collect data from varied sources - web applications, mobile phones, sensors and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks
Anomaly-based Filtering of Application-Layer DDoS Against DNS Authoritatives
Authoritative DNS infrastructures are at the core of the Internet ecosystem.
But how resilient are typical authoritative DNS name servers against application-layer Denial-of-Service attacks?
In this paper, with the help of a large country-code TLD operator, we assess the expected attack load and DoS countermeasures.
We find that standard botnets or even single-homed attackers can overload the computational resources of authoritative name servers, even if redundancy such as anycast is in place.
To prevent the resulting devastating DNS outages, we assess how effective upstream filters can be as a last resort.
We propose an anomaly detection defense that allows both well-behaving high-volume DNS resolvers and low-volume clients to continue name lookups, while blocking most of the attack traffic.
Upstream ISPs or IXPs can deploy our scheme and drop attack traffic to reasonable query loads at or below 100k queries per second, at a false positive rate of 1.2% to 5.7% (median 2.4%).
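A toy version of such an upstream filter might look like the following: a source is passed if it is low-volume or if its query rate stays near its historical baseline, and flagged otherwise. The thresholds, the baseline mechanism, and the source names are illustrative assumptions, not the paper's actual detector.

```python
def filter_sources(query_counts, baseline, tolerance=3.0, low_volume=100):
    """Toy per-source anomaly filter (illustrative only).

    query_counts: {source: queries observed in the current window}
    baseline:     {source: typical queries per window from history}
    A source passes if it is below the low-volume threshold, or if it has
    a known baseline and stays within `tolerance` times that baseline.
    Unknown high-volume sources are blocked.
    """
    passed, blocked = set(), set()
    for src, count in query_counts.items():
        base = baseline.get(src, 0)
        if count <= low_volume or (base and count <= tolerance * base):
            passed.add(src)
        else:
            blocked.add(src)
    return passed, blocked
```

This captures the key property claimed above: a legitimate high-volume resolver with an established history keeps resolving, an ordinary low-volume client keeps resolving, and a previously unseen source flooding queries is dropped.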
Water and Nutrition: Harmonizing actions for the United Nations Decade of Action on Nutrition and the United Nations Water Action Decade
Progress for both SDG 2 and SDG 6 has been unsatisfactory, with several indicators worsening over time,
including an increase in the number of undernourished, overweight and obese people, as well as rapid increases
in the number of people at risk of severe water shortages. This lack of progress is exacerbated by climate
change and growing regional and global inequities in food and water security, including access to good quality
diets, leading to increased violation of the human rights to water and food.
Reversing these trends will require a much greater effort on the part of water, food security, and nutrition
communities, including stronger performances by the United Nations Decade of Action on Nutrition and the
United Nations International Decade for Action on Water for Sustainable Development. To date, increased
collaboration by these two landmark initiatives is lacking, as neither work program has systematically
explored linkages or possibilities for joint interventions.
Collaboration is especially imperative given the fundamental challenges that characterize the promotion of
one priority over another. Without coordination across the water, food security, and nutrition communities,
actions toward achieving SDG 2 on zero hunger may contribute to further degradation of the world's water
resources and as such, further derail achievement of the UN Decade of Action on Water and SDG 6 on water
and sanitation. Conversely, actions to enhance SDG 6 may well reduce progress on the UN Decade of Action
on Nutrition and SDG 2.
This paper reviews these challenges as part of a broader analysis of the complex web of pathways that link
water, food security and nutrition outcomes. Climate change and the growing demand for water resources are
also considered, given their central role in shaping future water and nutrition security. The main conclusions
are presented as three recommendations focused on potential avenues to deal with the complexity of the
water-nutrition nexus, and to optimize outcomes.
Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is rapidly growing, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments.
First, we will study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation.
Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. We observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as Bayesian Principal Component Analysis (BPCA).
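The baseline k-NN scheme mentioned above can be sketched as follows. This is a generic, simplified implementation (with no curated external biological information): each missing entry is filled with the mean of that column over the k rows nearest under a distance computed only on the columns both rows have observed.

```python
def knn_impute(rows, k=2):
    """Simplified k-NN missing-value imputation.

    rows: list of numeric lists, with None marking missing entries.
    Each None is replaced by the mean of that column over the k nearest
    complete-in-that-column rows.
    """
    def dist(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return (sum(shared) / len(shared)) ** 0.5 if shared else float("inf")

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is None:
                neighbours = sorted(
                    (dist(row, other), other[j])
                    for idx, other in enumerate(rows)
                    if idx != i and other[j] is not None
                )[:k]
                donors = [v for _, v in neighbours]
                if donors:
                    filled[i][j] = sum(donors) / len(donors)
    return filled
```

Methods like BPCA replace this local averaging with a global latent-variable model of the expression matrix, which is why they can do better on data sets where expression patterns are more structured.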
Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments such as using the gene microarray techniques. Such networks are typically very large and highly connected, thus there is a need for fast algorithms for producing visually pleasing layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.