798 research outputs found
Oblivion: Mitigating Privacy Leaks by Controlling the Discoverability of Online Information
Search engines are the prevalently used tools to collect information about
individuals on the Internet. Search results typically comprise a variety of
sources that contain personal information -- either intentionally released by
the person herself, or unintentionally leaked or published by third parties,
often with detrimental effects on the individual's privacy. To grant
individuals the ability to regain control over their disseminated personal
information, the European Court of Justice recently ruled that EU citizens have
a right to be forgotten in the sense that indexing systems, must offer them
technical means to request removal of links from search results that point to
sources violating their data protection rights. As of now, these technical
means consist of a web form that requires a user to manually identify all
relevant links upfront and to insert them into the web form, followed by a
manual evaluation by employees of the indexing system to assess if the request
is eligible and lawful.
We propose a universal framework Oblivion to support the automation of the
right to be forgotten in a scalable, provable and privacy-preserving manner.
First, Oblivion enables a user to automatically find and tag her disseminated
personal information using natural language processing and image recognition
techniques and file a request in a privacy-preserving manner. Second, Oblivion
provides indexing systems with an automated and provable eligibility mechanism,
asserting that the author of a request is indeed affected by an online
resource. The automated ligibility proof ensures censorship-resistance so that
only legitimately affected individuals can request the removal of corresponding
links from search results. We have conducted comprehensive evaluations, showing
that Oblivion is capable of handling 278 removal requests per second, and is
hence suitable for large-scale deployment
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training
Word embeddings are widely used in Nat-ural Language Processing, mainly due totheir success in capturing semantic infor-mation from massive corpora. However,their creation process does not allow thedifferent meanings of a word to be auto-matically separated, as it conflates theminto a single vector. We address this issueby proposing a new model which learnsword and sense embeddings jointly. Ourmodel exploits large corpora and knowl-edge from semantic networks in order toproduce a unified vector space of wordand sense embeddings. We evaluate themain features of our approach both qual-itatively and quantitatively in a variety oftasks, highlighting the advantages of theproposed method in comparison to state-of-the-art word- and sense-based models
Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP
Extracting valuable insights from vast amounts of unstructured digital text presents significant challenges across diverse domains. This research addresses this challenge by proposing a novel pipeline-based system that generates domain-agnostic and task-agnostic text representations. The proposed approach leverages labeled property graphs (LPG) to encode contextual information, facilitating the integration of diverse linguistic elements into a unified representation. The proposed system enables efficient graph-based querying and manipulation by addressing the crucial aspect of comprehensive context modeling and fine-grained semantics. The effectiveness of the proposed system is demonstrated through the implementation of NLP components that operate on LPG-based representations. Additionally, the proposed approach introduces specialized patterns and algorithms to enhance specific NLP tasks, including nominal mention detection, named entity disambiguation, event enrichments, event participant detection, and temporal link detection. The evaluation of the proposed approach, using the MEANTIME corpus comprising manually annotated documents, provides encouraging results and valuable insights into the system\u27s strengths. The proposed pipeline-based framework serves as a solid foundation for future research, aiming to refine and optimize LPG-based graph structures to generate comprehensive and semantically rich text representations, addressing the challenges associated with efficient information extraction and analysis in NLP
- …