Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate into anaphora resolution systems.
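The corpus-mining approach can be illustrated with a small sketch. Nothing here is the authors' actual system: the pattern template, the scoring rule, and the `get_hit_count` callback (which would wrap a Web search or BNC frequency lookup) are hypothetical stand-ins.

```python
# Sketch of antecedent selection via shallow lexico-semantic patterns.
# Hypothetical throughout: the pattern template, the scoring rule, and the
# get_hit_count callback stand in for querying the Web or the BNC.

def build_pattern(candidate: str, anaphor_head: str) -> str:
    """Instantiate a shallow pattern such as 'X and other Ys'."""
    return f"{candidate} and other {anaphor_head}"

def select_antecedent(candidates, anaphor_head, get_hit_count):
    """Pick the candidate whose pattern is most frequent in the corpus."""
    scored = [(get_hit_count(build_pattern(c, anaphor_head)), c)
              for c in candidates]
    best_count, best = max(scored)
    return best if best_count > 0 else None

# Toy frequency table standing in for Web hit counts, e.g. for resolving
# "other currencies" in context.
toy_counts = {"the dollar and other currencies": 120}
antecedent = select_antecedent(
    ["the dollar", "the economy"], "currencies",
    lambda query: toy_counts.get(query, 0))
# antecedent == "the dollar"
```

The appeal of the Web here is exactly the one the abstract names: pattern instantiations too rare for the BNC still return non-zero counts at Web scale.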
What Can We Learn Privately?
Learning problems form an important category of computational tasks that
generalizes many of the computations researchers apply to large real-life data
sets. We ask: what concept classes can be learned privately, namely, by an
algorithm whose output does not depend too heavily on any one input or specific
training example? More precisely, we investigate learning algorithms that
satisfy differential privacy, a notion that provides strong confidentiality
guarantees in contexts where aggregate information is released about a database
containing sensitive information about individuals. We demonstrate that,
ignoring computational constraints, it is possible to privately agnostically
learn any concept class using a sample size approximately logarithmic in the
cardinality of the concept class. Therefore, almost anything learnable is
learnable privately: specifically, if a concept class is learnable by a
(non-private) algorithm with polynomial sample complexity and output size, then
it can be learned privately using a polynomial number of samples. We also
present a computationally efficient private PAC learner for the class of parity
functions. Local (or randomized response) algorithms are a practical class of
private algorithms that have received extensive investigation. We provide a
precise characterization of local private learning algorithms. We show that a
concept class is learnable by a local algorithm if and only if it is learnable
in the statistical query (SQ) model. Finally, we present a separation between
the power of interactive and noninteractive local learning algorithms.
Comment: 35 pages, 2 figures.
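As a concrete illustration of the "local (or randomized response)" model characterized above, here is a minimal sketch of classic binary randomized response. It is not the paper's parity learner, just the standard mechanism that the local model generalizes.

```python
import math
import random

# Minimal sketch of binary randomized response, the prototypical "local"
# private algorithm: each user perturbs their own bit before it leaves
# their hands, and the aggregator debiases the noisy mean. Illustrative
# only; this is not the paper's construction.

def randomize(bit: int, eps: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    p_true = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_true else 1 - bit

def estimate_mean(reports, eps: float) -> float:
    """Invert E[report] = (1 - p) + mu * (2p - 1) to recover mu."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
truth = [1] * 600 + [0] * 400                      # true mean is 0.6
reports = [randomize(b, 1.0, rng) for b in truth]
est = estimate_mean(reports, 1.0)                  # close to 0.6
```

Each report individually satisfies eps-differential privacy, yet the population mean remains recoverable; the paper's characterization says precisely which learning tasks survive this restriction, namely those learnable in the SQ model.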
Under pressure: costs of living, financial hardship and emergency relief in Victoria
'Under pressure: Costs of living, financial hardship and emergency relief in Victoria' presents the findings of a research project conducted between 2007 and 2008 on demand for emergency relief in Victoria. The research project was a partnership between the Victorian Council of Social Service (VCOSS), RMIT University and the emergency relief peak body ER Victoria. Emergency relief can be defined as the provision of critical support to individuals and families experiencing a financial emergency or crisis. Emergency relief assistance can include a food voucher or parcel, household goods, clothing and financial assistance for utilities or food. Emergency relief is currently provided by over 700 non-government organisations in Victoria.
Differential Privacy and the Fat-Shattering Dimension of Linear Queries
In this paper, we consider the task of answering linear queries under the
constraint of differential privacy. This is a general and well-studied class of
queries that captures other commonly studied classes, including predicate
queries and histogram queries. We show that the accuracy to which a set of
linear queries can be answered is closely related to its fat-shattering
dimension, a property that characterizes the learnability of real-valued
functions in the agnostic-learning setting.
Comment: Appears in APPROX 2010.
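A worked illustration of the basic primitive behind answering such queries privately is the Laplace mechanism. This is a generic sketch, not the paper's mechanism: the toy counting query and the inverse-CDF noise sampler are illustrative only.

```python
import math
import random

# Generic sketch of the Laplace mechanism for a single linear query.
# A linear query assigns each row a weight in [0, 1] and sums the weights,
# so changing one row moves the answer by at most 1; adding Laplace noise
# of scale 1/eps then satisfies eps-differential privacy.

def linear_query(q, rows) -> float:
    """q maps a row to a weight in [0, 1]; the query sums over rows."""
    return sum(q(r) for r in rows)

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_answer(q, rows, eps: float, rng: random.Random) -> float:
    return linear_query(q, rows) + laplace_noise(1.0 / eps, rng)

rng = random.Random(1)
rows = [1] * 70 + [0] * 30                 # toy database of bits
answer = private_answer(lambda r: r, rows, 0.5, rng)   # true answer is 70
```

Predicate queries and histogram queries are the special cases where the per-row weights are 0/1, which is why the linear class subsumes them as the abstract notes.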
HI 21-centimetre emission from an ensemble of galaxies at an average redshift of one
The baryonic processes in galaxy evolution include gas infall onto galaxies
to form neutral atomic hydrogen (HI), the conversion of HI to the molecular
state (H₂), and, finally, the conversion of H₂ to stars. Understanding
galaxy evolution thus requires understanding the evolution of both the stars,
and the neutral atomic and molecular gas, the primary fuel for star-formation,
in galaxies. For the stars, the cosmic star-formation rate density is known to
peak in the redshift range , and to decline by an order of
magnitude over the next billion years; the causes of this decline
are not known. For the gas, the weakness of the hyperfine HI 21cm transition,
the main tracer of the HI content of galaxies, has meant that it has not
hitherto been possible to measure the atomic gas mass of galaxies at redshifts
higher than ; this is a critical lacuna in our understanding of
galaxy evolution. Here, we report a measurement of the average HI mass of
star-forming galaxies at a redshift , by stacking their individual
HI 21 cm emission signals. We obtain an average HI mass similar to the average
stellar mass of the sample. We also estimate the average star-formation rate of
the same galaxies from the 1.4 GHz radio continuum, and find that the HI mass
can fuel the observed star-formation rates for only billion years
in the absence of fresh gas infall. This suggests that gas accretion onto
galaxies at may have been insufficient to sustain high star-formation
rates in star-forming galaxies. This is likely to be the cause of the decline
in the cosmic star-formation rate density at redshifts below 1.
Comment: 43 pages, 8 figures. Published in Nature:
https://www.nature.com/articles/s41586-020-2794-7. In the original version of
the paper, the upper x-axis "Lookback time (Gyr)" of Fig. 3 had
incorrectly-placed tick marks. This is now fixed in the updated version. We
thank Sambit Roychowdhury for pointing out the error.
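The stacking idea can be sketched in a few lines: averaging many rest-frame-aligned spectra suppresses the per-channel noise by roughly the square root of the number of galaxies, so a line too weak to see in any single spectrum emerges in the mean. All numbers below are synthetic stand-ins; the published analysis involves calibration, weighting, and continuum subtraction not shown here.

```python
import random

# Toy stacking demo: many noisy spectra, each containing the same weak
# emission line in channels 28-35 after shifting to the rest frame, are
# averaged channel by channel. Synthetic data throughout.

def stack_spectra(spectra):
    """Channel-by-channel mean of rest-frame-aligned spectra."""
    n, nchan = len(spectra), len(spectra[0])
    return [sum(s[ch] for s in spectra) / n for ch in range(nchan)]

rng = random.Random(42)
nchan, ngal = 64, 400
signal = [0.05 if 28 <= ch < 36 else 0.0 for ch in range(nchan)]
spectra = [[signal[ch] + rng.gauss(0.0, 0.5) for ch in range(nchan)]
           for _ in range(ngal)]
stacked = stack_spectra(spectra)

# Per-channel noise drops from 0.5 to about 0.5 / sqrt(400) = 0.025, so the
# 0.05 line, invisible in any single spectrum, stands out in the stack.
on_line = sum(stacked[28:36]) / 8.0
off_line = (sum(stacked) - sum(stacked[28:36])) / 56.0
```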
Probing star formation in galaxies at via a Giant Metrewave Radio Telescope stacking analysis
We have used the Giant Metrewave Radio Telescope (GMRT) to carry out deep 610
MHz continuum imaging of four sub-fields of the DEEP2 Galaxy Redshift Survey.
We stacked the radio emission in the GMRT images from a near-complete (absolute
blue magnitude ) sample of 3698 blue star-forming galaxies
with redshifts to detect (at
significance) the median rest-frame 1.4 GHz radio continuum emission of the
sample galaxies. The stacked emission is unresolved, with a rest-frame 1.4 GHz
luminosity of W Hz⁻¹. We used the local relation between total star formation rate (SFR)
and 1.4 GHz luminosity to infer a median total SFR of yr for blue star-forming galaxies with at
. We detect the main-sequence relation between
SFR and stellar mass, , obtaining ; the
power-law index shows no change over . We find that the
nebular line emission suffers less extinction than the stellar continuum,
contrary to the situation in the local Universe; the ratio of nebular
extinction to stellar extinction increases with decreasing redshift. We obtain
an upper limit of 0.87 Gyr to the atomic gas depletion time of a sub-sample of
the DEEP2 galaxies at ; neutral atomic gas thus appears to be a
transient phase in high-z star-forming galaxies.
Comment: 16 pages, 7 figures; accepted for publication in ApJ. Minor changes
to match version accepted for publication.
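The luminosity-to-SFR step relies on a local radio calibration. As a sketch, the widely used Yun et al. (2001) coefficient of 5.9 × 10⁻²² M☉ yr⁻¹ per W Hz⁻¹ is assumed below; the paper's adopted calibration may differ in detail.

```python
# Sketch of the conversion from rest-frame 1.4 GHz luminosity to a
# star-formation rate via a local radio-SFR calibration. The coefficient
# 5.9e-22 (Msun/yr per W/Hz) is the commonly used Yun et al. (2001) value,
# assumed here for illustration only.

def sfr_from_radio(l_14ghz_w_per_hz: float) -> float:
    """SFR in solar masses per year from rest-frame 1.4 GHz luminosity."""
    return 5.9e-22 * l_14ghz_w_per_hz

sfr = sfr_from_radio(4.0e22)   # e.g. L_1.4 = 4e22 W/Hz gives ~24 Msun/yr
```

The radio continuum traces recent star formation without dust-extinction corrections, which is what makes it a clean complement to the nebular and stellar estimates discussed in the abstract.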
Targeted delivery of anti-inflammatory therapy to rheumatoid tissue by fusion proteins containing an IL-4-linked synovial targeting peptide
We provide first-time evidence that the synovial endothelium-targeting peptide (SyETP) CKSTHDRLC successfully delivers conjugated IL-4 to human rheumatoid synovium transplanted into SCID mice. SyETP, previously isolated by in vivo phage display and shown to preferentially localize to synovial xenografts, was linked by recombinant technology to hIL-4 via an MMP-cleavable sequence. Both IL-4 and the MMP-cleavable sequence were shown to be functional. IL-4-SyETP augmented production of IL-1ra by synoviocytes stimulated with IL-1β in a dose-dependent manner. In vivo imaging confirmed increased retention of SyETP-linked IL-4 in synovial grafts, which was enhanced by increasing the number of copies (one to three) in the constructs. Strikingly, SyETP delivered bioactive IL-4 in vivo, as demonstrated by increased pSTAT6 in synovial grafts. Thus, this study provides proof of concept for peptide-based tissue-specific targeted immunotherapy in rheumatoid arthritis. This technology is potentially applicable to other biological therapies, providing enhanced potency at inflammatory sites and reducing systemic toxicity.
Formalizing Data Deletion in the Context of the Right to be Forgotten
The right of an individual to request the deletion of their personal data by
an entity that might be storing it -- referred to as the right to be forgotten
-- has been explicitly recognized, legislated, and exercised in several
jurisdictions across the world, including the European Union, Argentina, and
California. However, much of the discussion surrounding this right offers only
an intuitive notion of what it means for it to be fulfilled -- of what it means
for such personal data to be deleted.
In this work, we provide a formal definitional framework for the right to be
forgotten using tools and paradigms from cryptography. In particular, we
provide a precise definition of what could be (or should be) expected from an
entity that collects individuals' data when a request is made of it to delete
some of this data. Our framework captures several, though not all, relevant
aspects of typical systems involved in data processing. While it cannot be
viewed as expressing the statements of current laws (especially since these are
rather vague in this respect), our work offers technically precise definitions
that represent possibilities for what the law could reasonably expect, and
alternatives for what future versions of the law could explicitly require.
Finally, with the goal of demonstrating the applicability of our framework
and definitions, we consider various natural and simple scenarios where the
right to be forgotten comes up. For each of these scenarios, we highlight the
pitfalls that arise even in genuine attempts at implementing systems offering
deletion guarantees, and also describe technological solutions that provably
satisfy our definitions. These solutions bring together techniques built by
various communities.
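One family of technological solutions of the kind alluded to above can be sketched as deletion by key destruction: each record is stored only encrypted under a per-record key, and honoring a deletion request means destroying the key, after which the remaining ciphertext reveals nothing. The class below is a toy with a one-time-pad stand-in for a real cipher; it is not drawn from the paper's formal framework.

```python
import os

# Toy "deletion by key destruction" store. The XOR one-time pad stands in
# for a real cipher; a production system would use authenticated encryption
# and would also have to account for backups, logs, and derived data.

class ErasableStore:
    def __init__(self):
        self._keys = {}    # per-record keys (destroyed on deletion)
        self._blobs = {}   # ciphertexts (may persist harmlessly)

    def put(self, user: str, data: bytes) -> None:
        key = os.urandom(len(data))          # fresh pad per record
        self._keys[user] = key
        self._blobs[user] = bytes(a ^ b for a, b in zip(data, key))

    def get(self, user: str) -> bytes:
        key, blob = self._keys[user], self._blobs[user]
        return bytes(a ^ b for a, b in zip(blob, key))

    def forget(self, user: str) -> None:
        """Deletion request: destroy the key; the blob alone is useless."""
        del self._keys[user]

store = ErasableStore()
store.put("alice", b"medical record")
recovered = store.get("alice")
store.forget("alice")
key_destroyed = "alice" not in store._keys
```

The gap between this toy and a real guarantee, such as data that has flowed into backups or model training, is exactly the kind of pitfall a formal definition has to confront.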
Intelligent OS X malware threat detection with code inspection
With the increasing market share of the Mac OS X operating system, there is a corresponding increase in the number of malicious programs (malware) designed to exploit vulnerabilities on Mac OS X platforms. However, existing manual and heuristic OS X malware detection techniques are not capable of coping with such a high rate of malware. While machine learning techniques offer promising results in the automated detection of Windows and Android malware, there have been limited efforts in extending them to OS X malware detection. In this paper, we propose a supervised machine learning model. The model applies a kernel-based Support Vector Machine (SVM) and a novel weighting measure based on application library calls to detect OS X malware. For training and evaluating the model, a dataset combining 152 malware and 450 benign samples was created. Using common supervised machine learning algorithms on this dataset, we obtain over 91% detection accuracy with a 3.9% false-alarm rate. We also utilize the Synthetic Minority Over-sampling Technique (SMOTE) to create three synthetic datasets with different distributions, based on a refined version of the collected dataset, to investigate the impact of different sample sizes on malware detection accuracy. Using the SMOTE datasets, we achieve over 96% detection accuracy with a false-alarm rate of less than 4%. All malware classification experiments are validated using cross-validation. Our results show that increasing the sample size in the synthetic datasets has a direct positive effect on detection accuracy, while it increases the false-alarm rate compared to the original dataset.
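The library-call feature idea can be sketched in a few lines. The weighting below (per-call class-frequency difference) and the toy call sets are hypothetical stand-ins, not the paper's actual weighting measure or SVM pipeline.

```python
# Toy sketch of library-call features for malware detection: each
# application is a set of library calls, and each call is weighted by how
# unevenly it appears across the malware and benign training sets. A plain
# weighted sum replaces the kernel SVM for illustration; both the weighting
# and the data are hypothetical.

def call_weights(malware, benign):
    """Weight each call by its malware-vs-benign frequency imbalance."""
    calls = set().union(*malware, *benign)
    weights = {}
    for c in calls:
        m = sum(c in app for app in malware) / len(malware)
        b = sum(c in app for app in benign) / len(benign)
        weights[c] = m - b     # >0: malware-leaning, <0: benign-leaning
    return weights

def score(app, weights):
    """Positive scores lean malware, negative scores lean benign."""
    return sum(weights.get(c, 0.0) for c in app)

malware = [{"ptrace", "execvp", "socket"}, {"ptrace", "socket"}]
benign = [{"printf", "socket"}, {"printf", "malloc"}]
w = call_weights(malware, benign)
s_bad = score({"ptrace", "socket"}, w)    # malware-like sample
s_good = score({"printf", "malloc"}, w)   # benign-like sample
```

In the paper's setting these weighted features would feed the kernel SVM, and SMOTE would be applied to the minority (malware) class before training to address the 152-versus-450 imbalance.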