Search CORE


Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data

Author: Bapna Ravi
Goes Paulo
Gopal Ram
Marsden James R.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/05/2006

Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the areas of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing real-world agents participating in market and nonmarket activity on the Internet, often over extended periods of time, can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation: we cannot randomly assign participants to treatments or determine event orderings. It does, however, offer potentially large data sets with repeated observations of individual choices and actions. In addition, automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints.
Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce. Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Caltech Authors

Building a document genre corpus: a profile of the KRYS I corpus

Author: Berninger Vera
Kim Yunhyong
Ross Seamus
Publication date: 01/01/2008

This paper describes the KRYS I corpus, consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts classified using a nontopical genre palette. This is partly because genre as a concept is rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having practical value in this direction. By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on information-gathering and information-seeking behaviour and the role of genre in these activities, and to indicate a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.

Building a Document Genre Corpus: a Profile of the KRYS I Corpus

Author: Berninger Vera
Kim Yunhyong
Ross Seamus
Publication date: 18/10/2008

This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts classified using a non-topical genre palette. This is partly because genre as a concept is rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having practical value in this direction. By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on information-gathering and information-seeking behaviour and the role of genre in these activities, and to indicate a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.

Crossref

Enlighten

Justice Data Base Directory

Author: Alaska Justice Statistical Analysis Unit
Moras Antonia
Publication venue: Alaska Justice Statistical Analysis Unit, Justice Center, University of Alaska Anchorage
Publication date: 01/09/1992

The Justice Data Base Directory was originally published in 1988 with an introduction, 8 chapters describing Alaska justice agencies and their data holdings, and an index. It was published in loose-leaf notebook format for easy updating. Four updates were published in 1989–1992, each consisting of additional chapters, a revised table of contents and index, and updates to existing pages to reflect changes such as agency addresses. Five chapters were added in 1989, five in 1990, four in 1991 and five in 1992, for a total of 27 agencies covered by the Justice Data Base Directory in its final form. For archival purposes, this record includes all five versions of the directory. The 1992 edition is the most complete.

The Justice Data Base Directory, first published in 1988 with new chapters added annually through 1992, presents information about the primary databases maintained by Alaska justice agencies and the procedures to be followed for access to the data. Its availability should substantially reduce the work required to identify sources of data for research and policy development in law, law enforcement, courts and corrections.
The 1992 update to the directory adds five chapters, for a total of 27 Alaska agencies whose justice-related data holdings are described: Alaska Court System; Alaska Judicial Council; Alaska Commission on Judicial Conduct; Alaska Department of Law; Alaska Department of Public Safety (DPS) and three agencies under DPS: Alaska Police Standards Council, Council on Domestic Violence and Sexual Assault (CDSA), and Violent Crimes Compensation Board; Alaska Department of Corrections (DOC) and Parole Board; four agencies of the Alaska Department of Health and Social Services, namely Bureau of Vital Statistics (Division of Public Health), Epidemiology Section (Division of Public Health), Division of Family and Youth Services, and Office of Alcoholism and Drug Abuse; Alaska Public Defender Agency; Office of Public Advocacy (OPA); Alaska Bar Association; Alaska Justice Statistical Analysis Unit; Alaska Office of Equal Employment Opportunity (Office of the Governor); Alaska Office of the Ombudsman; Alaska Legal Services Corporation; Alaska Public Offices Commission; Alaska State Commission for Human Rights; Alcoholic Beverage Control (ABC) Board; Legislative Research Agency; Legislative Affairs Agency; State Archives and Records Management Services (Alaska Department of Education). Fully indexed.

Funded in part by a grant from the Bureau of Justice Statistics.

Contents:
1. Introduction
2. Alaska Court System
3. Alaska Department of Law
4. Alaska Department of Public Safety
5. Alaska Department of Corrections
6. Division of Family and Youth Services, Alaska Department of Health and Social Services
7. Alaska Bar Association
8. Alaska Judicial Council
9. Alaska Justice Statistical Analysis Unit
10. Bureau of Vital Statistics, Division of Public Health, Alaska Department of Health and Social Services
11. Alaska Office of Equal Employment Opportunity, Office of the Governor
12. Office of Alcoholism and Drug Abuse, Alaska Department of Health and Social Services
13. Council on Domestic Violence and Sexual Assault, Alaska Department of Public Safety
14. Epidemiology Section, Division of Public Health, Alaska Department of Health and Social Services
15. Violent Crimes Compensation Board, Alaska Department of Public Safety
16. Alaska Police Standards Council, Alaska Department of Public Safety
17. Alcoholic Beverage Control Board
18. Alaska Office of the Ombudsman
19. State Archives and Records Management Services, Alaska Department of Education
20. Legislative Research Agency
21. Legislative Affairs Agency
22. Alaska State Commission for Human Rights
23. Parole Board, Alaska Department of Corrections
24. Alaska Public Offices Commission
25. Alaska Commission on Judicial Conduct
26. Alaska Legal Services Corporation
27. Office of Public Advocacy
28. Alaska Public Defender Agency
29. Index

ScholarWorks@UA

A Case Study in Matching Service Descriptions to Implementations in an Existing System

Author: D'Souza Deepak
Gupta Hari S.
Komondoor Raghavan
Rama Girish M.
Publication date: 01/01/2010

A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach is to first identify and describe the desired services (i.e., create a model) and then locate the portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, and several observations that throw light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations. Comment: 20 pages, 19 pdf figures

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

VIENA2: A Driving Anticipation Dataset

Author: A Pentland
FS Saleh
G Ros
HS Koppula
JFP Kooij
L Wang
SR Richter
X Li
X Wang
Publication date: 29/10/2018

Action anticipation is critical in scenarios where one needs to react before the action is finalized. This is the case, for instance, in automated driving, where a car needs to, for example, avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks using diverse, task-specific sensors, there is no single dataset or framework that addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA2, covering 5 generic driving scenarios with a total of 25 distinct action classes. It contains more than 15K full-HD, 5-second videos acquired in various driving conditions, weather conditions, times of day and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, and benchmark state-of-the-art action anticipation techniques, including a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios. Comment: Accepted in ACCV 201
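The dataset figures in the VIENA2 abstract can be cross-checked with simple arithmetic. The sketch below treats the rounded totals ("more than 15K" videos, "more than 2.25M" frames) as exact. Two points are assumptions rather than statements from the abstract: that "samples" means videos, and the frame rate, which is only inferred from the totals.

```python
# Cross-check of the VIENA2 figures quoted in the abstract.
# Assumption: the rounded totals ("more than 15K", "more than 2.25M") are treated as exact.
NUM_VIDEOS = 15_000        # "more than 15K" videos
CLIP_SECONDS = 5           # each video is 5 s long
NUM_CLASSES = 25           # 25 distinct action classes
TOTAL_FRAMES = 2_250_000   # "more than 2.25M" frames

# Assumption: "samples per action class" counts videos.
videos_per_class = NUM_VIDEOS // NUM_CLASSES      # should match the stated 600
frames_per_video = TOTAL_FRAMES // NUM_VIDEOS     # frames in one 5 s clip
implied_fps = frames_per_video / CLIP_SECONDS     # inferred only; the abstract gives no frame rate

print(videos_per_class, frames_per_video, implied_fps)
```

With these inputs the script prints `600 150 30.0`. The 600 videos per class agrees with the abstract. The 150 frames per 5-second clip would imply roughly 30 fps, but that rate is an inference, not a documented property of the dataset.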

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Why (and How) Networks Should Run Themselves

Author: Feamster Nick
Rexford Jennifer
Publication date: 31/10/2017

The proliferation of networked devices, systems, and applications that we depend on every day makes managing networks more important than ever. The increasing security, availability, and performance demands of these applications suggest that these increasingly difficult network management problems be solved in real time, across a complex web of interacting protocols and systems. Alas, just as the importance of network management has increased, the network has grown so complex that it is seemingly unmanageable. In this new era, network management requires a fundamentally new approach. Instead of optimizations based on closed-form analysis of individual protocols, network operators need data-driven, machine-learning-based models of end-to-end and application performance based on high-level policy goals and a holistic view of the underlying components. Instead of anomaly detection algorithms that operate on offline analysis of network traces, operators need classification and detection algorithms that can make real-time, closed-loop decisions. Networks should learn to drive themselves. This paper explores this concept, discussing how we might attain this ambitious goal by more closely coupling measurement with real-time control and by relying on learning for inference and prediction about a networked application or system, as opposed to closed-form analysis of individual protocols.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Accurator: Nichesourcing for Cultural Heritage

Author: Aroyo Lora
De Boer Victor
Dijkshoorn Chris
Schreiber Guus
Publication date: 01/01/2017

With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks, and more specific annotations are necessary. However, eliciting such annotations is a challenge, since it often requires domain-specific knowledge. While crowdsourcing can be used successfully to elicit simple annotations, identifying people with the required expertise may prove troublesome for tasks requiring more complex or domain-specific knowledge. Nichesourcing addresses this problem by tapping into the expert knowledge available in niche communities. This paper presents Accurator, a methodology for conducting nichesourcing campaigns for cultural heritage institutions by addressing communities, organizing events and tailoring a web-based annotation tool to a domain of choice. The contribution of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation tool for experts and 3) validation of the methodology and tool in three case studies. The three domains of the case studies are birds on art, bible prints and fashion images. We compare the quality and quantity of obtained annotations in the three case studies, showing that the nichesourcing methodology in combination with the image annotation tool can be used to collect high-quality annotations in a variety of domains and annotation tasks. A user evaluation indicates that the tool is suitable and usable for domain-specific annotation tasks.

arXiv.org e-Print Archive

VU Research Portal