Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data
Widespread e-commerce activity on the Internet has led to new opportunities
to collect vast amounts of micro-level market and nonmarket data. In this paper
we share our experiences in collecting, validating, storing and analyzing large
Internet-based data sets in the areas of online auctions, music file sharing and
online retailer pricing. We demonstrate how such data can advance knowledge by
facilitating sharper and more extensive tests of existing theories and by
offering observational underpinnings for the development of new theories. Just
as experimental economics pushed the frontiers of economic thought by enabling
the testing of numerous theories of economic behavior in the environment of a
controlled laboratory, we believe that observing, often over extended periods
of time, real-world agents participating in market and nonmarket activity on
the Internet can lead us to develop and test a variety of new theories.
Internet data gathering is not controlled experimentation. We cannot randomly
assign participants to treatments or determine event orderings. Internet data
gathering does offer potentially large data sets with repeated observation of
individual choices and actions. In addition, automated data collection holds
promise for greatly reduced cost per observation. Our methods rely on
technological advances in automated data collection agents. Significant
challenges remain in developing appropriate sampling techniques, integrating
data from heterogeneous sources in a variety of formats, constructing
generalizable processes and understanding legal constraints. Despite these
challenges, the early evidence from those who have harvested and analyzed large
amounts of e-commerce data points toward a significant leap in our ability to
understand the functioning of electronic commerce.
Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Building a Document Genre Corpus: a Profile of the KRYS I Corpus
This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having practical value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, as well as on a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.
Justice Data Base Directory
The Justice Data Base Directory was originally published in 1988 with an introduction, 8 chapters describing Alaska justice agencies and their data holdings, and an index. It was published in looseleaf notebook format for easy updating. Four updates were published in 1989–1992, each update consisting of additional chapters, revised table of contents and index, and updates to existing pages to reflect changes such as agency addresses. Five chapters were added in 1989; five in 1990; four in 1991; and five in 1992, for a total of 27 agencies covered by the Justice Data Base Directory in its final form.
For archival purposes, this record includes all five versions of the directory. The 1992 edition is the most complete.
The Justice Data Base Directory, first published in 1988 with new chapters added annually through 1992, presents information about the primary databases maintained by Alaska justice agencies and the procedures to be followed for access to the data. Its availability should substantially reduce the work required to identify the sources of data for research and policy development in law, law enforcement, courts, and corrections. The 1992 update to the directory adds five chapters, for a total of 27 Alaska agencies whose justice-related data holdings are described: Alaska Court System; Alaska Judicial Council; Alaska Commission on Judicial Conduct; Alaska Department of Law; Alaska Department of Public Safety (DPS) and three agencies under DPS: Alaska Police Standards Council, Council on Domestic Violence and Sexual Assault (CDSA), and Violent Crimes Compensation Board; Alaska Department of Corrections (DOC) and Parole Board; four agencies of the Alaska Department of Health and Social Services, namely the Bureau of Vital Statistics (Division of Public Health), Epidemiology Section (Division of Public Health), Division of Family and Youth Services, and Office of Alcoholism and Drug Abuse; Alaska Public Defender Agency; Office of Public Advocacy (OPA); Alaska Bar Association; Alaska Justice Statistical Analysis Unit; Alaska Office of Equal Employment Opportunity (Office of the Governor); Alaska Office of the Ombudsman; Alaska Legal Services Corporation; Alaska Public Offices Commission; Alaska State Commission for Human Rights; Alcoholic Beverage Control (ABC) Board; Legislative Research Agency; Legislative Affairs Agency; State Archives and Records Management Services (Alaska Department of Education). Fully indexed.
Funded in part by a grant from the Bureau of Justice Statistics.
1. Introduction /
2. Alaska Court System /
3. Alaska Department of Law /
4. Alaska Department of Public Safety /
5. Alaska Department of Corrections /
6. Division of Family and Youth Services, Alaska Department of Health and Social Services /
7. Alaska Bar Association /
8. Alaska Judicial Council /
9. Alaska Justice Statistical Analysis Unit /
10. Bureau of Vital Statistics, Division of Public Health, Alaska Department of Health and Social Services /
11. Alaska Office of Equal Employment Opportunity, Office of the Governor /
12. Office of Alcoholism and Drug Abuse, Alaska Department of Health and Social Services /
13. Council on Domestic Violence and Sexual Assault, Alaska Department of Public Safety /
14. Epidemiology Section, Division of Public Health, Alaska Department of Health and Social Services /
15. Violent Crimes Compensation Board, Alaska Department of Public Safety /
16. Alaska Police Standards Council, Alaska Department of Public Safety /
17. Alcoholic Beverage Control Board /
18. Alaska Office of the Ombudsman /
19. State Archives and Records Management Services, Alaska Department of Education /
20. Legislative Research Agency /
21. Legislative Affairs Agency /
22. Alaska State Commission for Human Rights /
23. Parole Board, Alaska Department of Corrections /
24. Alaska Public Offices Commission /
25. Alaska Commission on Judicial Conduct /
26. Alaska Legal Services Corporation /
27. Office of Public Advocacy /
28. Alaska Public Defender Agency /
29. Index
A Case Study in Matching Service Descriptions to Implementations in an Existing System
A number of companies are trying to migrate large monolithic software systems
to Service Oriented Architectures. A common approach is to first
identify and describe desired services (i.e., create a model), and then to
locate portions of code within the existing system that implement the described
services. In this paper we describe a detailed case study we undertook to match
a model to an open-source business application. We describe the systematic
methodology we used, the results of the exercise, as well as several
observations that throw light on the nature of this problem. We also suggest
and validate heuristics that are likely to be useful in partially automating
the process of matching service descriptions to implementations.
Comment: 20 pages, 19 pdf figures
VIENA2: A Driving Anticipation Dataset
Action anticipation is critical in scenarios where one needs to react before
the action is finalized. This is, for instance, the case in automated driving,
where a car needs to, e.g., avoid hitting pedestrians and respect traffic
lights. While solutions have been proposed to tackle subsets of the driving
anticipation tasks, by making use of diverse, task-specific sensors, there is
no single dataset or framework that addresses them all in a consistent manner.
In this paper, we therefore introduce a new, large-scale dataset, called
VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct
action classes. It contains more than 15K full HD, 5s long videos acquired in
various driving conditions, weather, times of day and environments, complemented
with a common and realistic set of sensor measurements. This amounts to more
than 2.25M frames, each annotated with an action label, corresponding to 600
samples per action class. We discuss our data acquisition strategy and the
statistics of our dataset, and benchmark state-of-the-art action anticipation
techniques, including a new multi-modal LSTM architecture with an effective
loss function for action anticipation in driving scenarios.
Comment: Accepted in ACCV 201
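The dataset figures quoted in the VIENA2 abstract can be cross-checked with a little arithmetic. The sketch below treats the stated numbers as exact round values, although the abstract gives two of them only as "more than". The frame rate is not stated in the abstract; it is derived here from the other figures and should be read as an implied value, not a documented one. All variable names are illustrative.

```python
# Figures taken from the abstract, treated as exact round values (an assumption:
# the abstract says "more than 15K" videos and "more than 2.25M" frames).
NUM_VIDEOS = 15_000
CLIP_SECONDS = 5
TOTAL_FRAMES = 2_250_000
NUM_CLASSES = 25

# Frames in each 5 s clip, if frames are spread evenly over the videos.
frames_per_video = TOTAL_FRAMES // NUM_VIDEOS

# Frame rate implied by the figures above (not stated in the abstract).
implied_fps = frames_per_video / CLIP_SECONDS

# Videos per action class, if the classes are balanced.
videos_per_class = NUM_VIDEOS // NUM_CLASSES

print(frames_per_video, implied_fps, videos_per_class)
```

Under these assumptions the balanced-class figure matches the 600 samples per action class given in the abstract.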
Why (and How) Networks Should Run Themselves
The proliferation of networked devices, systems, and applications that we
depend on every day makes managing networks more important than ever. The
increasing security, availability, and performance demands of these
applications suggest that these increasingly difficult network management
problems be solved in real time, across a complex web of interacting protocols
and systems. Alas, just as the importance of network management has increased,
the network has grown so complex that it is seemingly unmanageable. In this new
era, network management requires a fundamentally new approach. Instead of
optimizations based on closed-form analysis of individual protocols, network
operators need data-driven, machine-learning-based models of end-to-end and
application performance based on high-level policy goals and a holistic view of
the underlying components. Instead of anomaly detection algorithms that operate
on offline analysis of network traces, operators need classification and
detection algorithms that can make real-time, closed-loop decisions. Networks
should learn to drive themselves. This paper explores this concept, discussing
how we might attain this ambitious goal by more closely coupling measurement
with real-time control and by relying on learning for inference and prediction
about a networked application or system, as opposed to closed-form analysis of
individual protocols.
Accurator: Nichesourcing for Cultural Heritage
With more and more cultural heritage data being published online, their
usefulness in this open context depends on the quality and diversity of
descriptive metadata for collection objects. In many cases, existing metadata
is not adequate for a variety of retrieval and research tasks and more specific
annotations are necessary. However, eliciting such annotations is a challenge
since it often requires domain-specific knowledge. While crowdsourcing can be
successfully used for eliciting simple annotations, identifying people with the
required expertise might prove troublesome for tasks requiring more complex or
domain-specific knowledge. Nichesourcing addresses this problem by tapping
into the expert knowledge available in niche communities. This paper presents
Accurator, a methodology for conducting nichesourcing campaigns for cultural
heritage institutions, by addressing communities, organizing events and
tailoring a web-based annotation tool to a domain of choice. The contribution
of this paper is threefold: 1) a nichesourcing methodology, 2) an annotation
tool for experts and 3) validation of the methodology and tool in three case
studies. The three domains of the case studies are birds on art, Bible prints
and fashion images. We compare the quality and quantity of obtained annotations
in the three case studies, showing that the nichesourcing methodology in
combination with the image annotation tool can be used to collect high quality
annotations in a variety of domains and annotation tasks. A user evaluation
indicates that the tool is suitable and usable for domain-specific annotation tasks.