Enabling Structured Navigation of Longform Spoken Dialog with Automatic Summarization
Longform spoken dialog is a rich source of information that is present in all facets of everyday life, taking the form of podcasts, debates, and interviews; these media cover important topics ranging from healthcare and diversity to current events, economics, and politics. Individuals need to digest informative content to decide how to vote, how to stay safe from COVID-19, and how to increase diversity in the workplace.
Unfortunately, compared to text, spoken dialog can be challenging to consume, as listening is slower than reading and audio is difficult to skim or navigate. Although an individual may be interested in a given topic, they may be unwilling to commit the time required to consume longform auditory media, given the uncertainty as to whether such content will live up to their expectations. Clearly, there is a need to provide access to the information in spoken dialog in a manner that lets individuals quickly and intuitively reach areas of interest without investing large amounts of time.
From Human-Computer Interaction, we apply the idea of information foraging, which theorizes how people browse and navigate to satisfy an information need, to the longform spoken dialog domain. Information foraging theory states that people do not browse linearly. Rather, people “forage” for information similarly to how animals sniff around for food, scanning from area to area and constantly deciding whether to keep investigating their current area or to move on to greener pastures. This is an instance of the classic breadth vs. depth dilemma. People rely on perceived structure and information cues to make these decisions. Unfortunately, speech, whether spoken or transcribed, is unstructured and lacks information cues, making it difficult for users to browse and navigate.
We create a longform spoken dialog browsing system that uses automatic summarization and speech modeling to structure longform dialog and present information in a manner that is both intuitive and flexible enough to accommodate different user browsing needs. Leveraging summarization models to automatically and hierarchically structure spoken dialog, the system distills information into increasingly salient and abstract summaries, producing a tiered representation that interested users can progressively explore. Additionally, we address technical challenges in speech modeling that are specific to spoken dialog and not present in written text, such as disfluencies, improper punctuation, lack of annotated speech data, and inherent lack of structure. Since summarization is a lossy compression of information, the system provides users with information cues to signal how much additional information is available on a topic.
This thesis makes the following contributions:
1. We applied the HCI concept of information foraging to longform speech, enabling people to browse and navigate information in podcasts, interviews, panels, and meetings.
2. We created a system that structures longform dialog into hierarchical summaries which help users to 1) skim (browse) audio and 2) navigate and drill down into interesting sections to read full details.
3. We created a human-annotated hierarchical dataset to quantitatively evaluate our system’s hierarchical text generation performance.
4. Lastly, we developed a suite of dialog-oriented processing optimizations to improve the user experience of summaries: enhanced readability and fluency of short summaries through better topic chunking and pronoun imputation, and reliable indication of semantic coverage within short summaries to help direct navigation towards interesting information.
We discuss future research on extending the browsing and navigation system to more challenging domains such as lectures, which contain many external references, or workplace conversations, which contain uncontextualized background information and are far less structured than podcasts and interviews.
Design Research For Personal Information Management Systems To Support Undergraduate Students
This dissertation investigated the personal information management (PIM) behaviors and practices of undergraduate college students during a four-month academic semester. Qualitative data on the day-to-day PIM practices of 15 students enrolled in an honors biology class were collected through in-depth observations and interviews. Four students experimented with MyLifeBits, a next-generation PIM system developed at Microsoft Research. A participatory design session involving six students explored and identified new directions for PIM design. Analysis of the field data revealed that students engage regularly in project management activities, and their work is often highly collaborative. Students were observed to have difficulty with core PIM activities, such as managing tasks and reminders, and both PIM and technical skills were found to vary widely among students. Students were observed to manage a diverse array of information formats, applications, and media, which are rarely integrated. Gaps in understanding and awareness among students and instructors were also noted. MyLifeBits was found to be intuitive and effective for visual browsing and refinding, although specific elements of the MyLifeBits user interface could likely be improved to support efficient task completion. The MyLifeBits system includes annotation, collection building, and other features that may support new approaches for making order and stimulating reflection. Observations of student usage suggested further design modifications to improve these features and their supporting user interfaces.
Implications for future research and design include: incorporating social awareness and communication into PIM systems to help reduce gaps in understanding and facilitate reflection; integrating collaboration technologies into PIM systems to support students' highly collaborative work practices; providing tools to stimulate reflection (e.g., personal analytics) and create reflective artifacts (e.g., journals, multimedia scrapbooks); shifting the focus of design to the outcomes that PIM supports (such as "getting my assignment done on time, and in the way the teacher expects") rather than the PIM process itself; and developing ways to scaffold students' learning of PIM skills, such as metadata creation, project analysis and management, collaboration, and reflection.