431,170 research outputs found

    Improving Automatic Content Type Identification from a Data Set

    Data file layout inference refers to building the structure and determining the metadata of a text file. The text files dealt with in this research are personal information records that have a consistent structure. Traditionally, if the layout structure of a text file is unknown, a human user must identify the metadata manually, which is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology that attempts to solve the layout inference problem by using databases of known metadata. This paper builds upon the information and documentation of the content-based oracles, and improves the databases of the oracles through experimentation.
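
The abstract above gives no implementation details, so the following is only a minimal sketch of the general idea of a content-based oracle, under stated assumptions: each content type has a small reference database of known values, and a column receives the type whose database recognises the largest share of its values. The names `KNOWN_VALUES`, `infer_column_types`, and `min_coverage` are hypothetical and do not come from the paper.

```python
# Hypothetical reference databases (assumption); a real oracle would use far larger ones.
KNOWN_VALUES = {
    "first_name": {"alice", "bob", "carol", "david"},
    "us_state": {"ohio", "texas", "utah", "iowa"},
}


def infer_column_types(rows, min_coverage=0.6):
    """Label each column with the content type whose database recognises most of its values."""
    labels = []
    for column in zip(*rows):  # transpose the rows into columns
        best_type, best_cov = "unknown", 0.0
        for content_type, known in KNOWN_VALUES.items():
            # Share of the column's values found in this reference database.
            coverage = sum(v.strip().lower() in known for v in column) / len(column)
            if coverage > best_cov:
                best_type, best_cov = content_type, coverage
        # Columns that no database covers well enough stay unlabelled.
        labels.append(best_type if best_cov >= min_coverage else "unknown")
    return labels


rows = [
    ["Alice", "Ohio", "42"],
    ["Bob", "Texas", "17"],
    ["Carol", "Utah", "8"],
]
print(infer_column_types(rows))
```

In this sketch, improving the oracle amounts to extending or cleaning the reference sets, which is one plausible reading of "improves the databases of the oracles".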

    An architecture of a user-centred digital library for the academic community

    An architecture of a user-centred digital library, designed to lead users of an academic community to the required information resources based on their tasks, is proposed. Information resources include full-text articles, databases, theses and dissertations, e-journals, e-books, multimedia databases, and so on. Other information resources such as university course calendars, university statutes, course registration, thesis and dissertation guidelines, style guides, and so on, are also needed by users. A prototype has been designed and developed using the School of Computer Engineering at Nanyang Technological University (NTU) as an example of such an environment, to provide access to these information resources, which are spread across different servers and different home pages. This prototype provides links to various information resources according to users' needs, as well as a personal workspace in which users can store their publications, frequently used or favorite hyperlinks, and references or notes. Various stages of the prototype design and development are described, and future work along this line is highlighted.

    Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

    The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of "selective revelation" and their confidentiality implications. Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Guilt By Genetic Association: The Fourth Amendment and the Search of Private Genetic Databases by Law Enforcement

    Over the course of 2018, a number of suspects in unsolved crimes have been identified through the use of GEDMatch, a public online genetic database. Law enforcement’s use of GEDMatch to identify suspects in cold cases likely does not constitute a search under the Fourth Amendment because the genetic information hosted on the website is publicly available. Transparency reports from direct-to-consumer (DTC) genetic testing providers like 23andMe and Ancestry suggest that federal and state officials may now be requesting access to private genetic databases as well. Whether law enforcement’s use of private DTC genetic databases to search for familial relatives of a suspect’s genetic profile constitutes a search within the meaning of the Fourth Amendment is far less clear. A strict application of the third-party doctrine suggests that individuals have no expectation of privacy in genetic information that they voluntarily disclose to third parties, including DTC providers. This Note, however, contends that the U.S. Supreme Court’s recent decision in Carpenter v. United States overwhelmingly supports the proposition that genetic information disclosed to third-party DTC providers is subject to Fourth Amendment protection. Approximately fifteen million individuals in the United States have already submitted their genetic information to DTC providers. The genetic information held by these providers can reveal a host of highly intimate details about consumers’ medical conditions, behavioral traits, genetic health risks, ethnic background, and familial relationships. Allowing law enforcement warrantless access to investigate third-party DTC genetic databases circumvents their consumers’ reasonable expectations of privacy by exposing this sensitive genetic information to law enforcement without any meaningful oversight. 
Furthermore, individuals likely have a reasonable expectation that they retain ownership over their uniquely personal genetic information despite their disclosure of that information to a third-party provider. This Note therefore asserts that the third-party doctrine does not permit law enforcement to conduct warrantless searches for suspects on private DTC genetics databases under the Fourth Amendment.

    Use of anti-terrorism digital ecosystem in the fight against terrorism

    In this paper, we propose an Anti-terrorist Digital Ecosystem (ATDES) that enables efficient terrorist identification and protection against terrorist attacks. An Anti-terrorist Digital Environment (ATDE) is designed as being populated by interconnected Anti-terrorist Digital Components (ATDC). ATDC are combined to support collaboration, cooperation and sharing of available information between various regions, countries and even continents. An ATDC may be any useful idea that can be digitalized, transported within the ecosystem and processed by humans or by computers. The key ATDC include ID databases that contain personal records, machine-readable personal records, and screening components that read personal records and match them with the available information from the ID databases. The available information is put into one big virtual database, which enables matching of personal records. If the available information is to be shared between various ID information resources, standardization of data needs to take place. Ontologies can be used for this purpose. Instantiation of the Ontology concepts results in ID Ontologies that act as personal records. Because Ontology files are machine readable, it is possible to match personal records with the available ID records from the networked ID databases and to act on the results. The significance of this research lies in the unification of the advances of the Ontology technology and Ecosystem paradigm for the purpose of creating a more secure environment in which to fight against terrorism.

    Huber, Marper and Others: Throwing new light on the shadows of suspicion. INEX Policy Brief No. 8, June 2010

    The proliferation of large-scale databases containing personal information, and the multiple uses to which they can be put, can be highly problematic from the perspective of fundamental rights and freedoms. This paper discusses two landmark decisions that illustrate some of the risks linked to these developments and point to a better framing of such practices: the Heinz Huber v. Germany judgement, from the European Court of Justice, and the S. and Marper v. United Kingdom ruling, from the European Court of Human Rights. The paper synthesises the lessons to be learnt from such decisions. Additionally, it questions the impact of the logic of pure prevention that is being combined with other rationales in the design and management of databases. This Policy Brief is published in the context of the INEX project, which looks at converging and conflicting ethical values in the internal/external security continuum in Europe, and is funded by the Security Programme of DG Enterprise of the European Commission’s Seventh Framework Research Programme. For more information visit: www.inexproject.e

    Record-Linkage from a Technical Point of View

    Record linkage is used for preparing sampling frames, deduplicating lists and combining information on the same object from two different databases. If the same objects in two different databases share error-free unique identifiers such as personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed. Keywords: Record-Linkage, Data-mining, Privacy preserving protocols
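
The two cases described in this abstract can be illustrated with a minimal sketch. It assumes small in-memory lists of records and uses plain string similarity from Python's standard `difflib` module as a stand-in for the error-tolerant case; the paper itself does not prescribe this method, and the function names, field names, sample records, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def exact_link(survey, register, key="pid"):
    """Case 1: error-free unique identifiers, so linkage is a plain merge on the key."""
    index = {rec[key]: rec for rec in register}
    return [(rec, index[rec[key]]) for rec in survey if rec[key] in index]


def fuzzy_link(survey, register, field="name", threshold=0.85):
    """Case 2: identifiers with errors; pair each record with its most similar candidate."""
    links = []
    for rec in survey:
        best, best_score = None, 0.0
        for cand in register:
            score = SequenceMatcher(None, rec[field].lower(), cand[field].lower()).ratio()
            if score > best_score:
                best, best_score = cand, score
        # Keep the pair only if the similarity clears the (assumed) threshold.
        if best is not None and best_score >= threshold:
            links.append((rec, best))
    return links


# Hypothetical sample data: a small survey and a slightly larger register.
survey = [
    {"pid": "1001", "name": "Jon Smith"},
    {"pid": "1003", "name": "Maria Gonzales"},
]
register = [
    {"pid": "1001", "name": "John Smith"},
    {"pid": "1002", "name": "Maria Gonzalez"},
    {"pid": "1004", "name": "Peter Miller"},
]

print(len(exact_link(survey, register)), "exact link(s)")
print(len(fuzzy_link(survey, register)), "fuzzy link(s)")
```

The fuzzy variant compares every survey record with every register record, which is workable here but would not scale to the file sizes the abstract mentions (thousands against millions); practical tools restrict the comparisons to candidate blocks first.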