.: 3D Multimodal Interaction in Augmented and Virtual Reality :.

This project has been developed in cooperation with Phil Cohen and the Center for Human-Computer Communication at Oregon Health & Science University.

Columbia Team Members:

Project Description:

This project explores multimodal interaction in immersive environments, focusing on the problem of target disambiguation when selecting an object in 3D. We have created an interactive 3D environment as our test bed and have used it in a variety of augmented reality (AR) and virtual reality (VR) scenarios.

In 3D immersive environments, the user often faces selection problems such as imprecise pointing at a distance, selection of occluded or hidden objects, and recognition errors (e.g., speech recognition errors). Our goal is to reduce selection errors by considering multiple input sources and compensating for errors in some sources through the results of others. For example, if one object (e.g., a "chair") is occluded by another object (e.g., a "desk"), simple ray-based selection will fail to select the "chair," since it cannot be seen. However, if the user can specify that the object of interest is a "chair" and that it is "behind the desk," then speech can help disambiguate the pointing gesture and yield the correct result.
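
To make the example concrete, here is a minimal sketch, in Python, of how a spoken object type and spatial relation could filter the candidates returned by a pointing ray. The object model and names are illustrative assumptions, not the project's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str      # e.g., "chair", "desk"
    depth: float   # distance from the viewer along the pointing ray

def disambiguate(ray_hits, spoken_type=None, behind=None):
    """Filter ray-cast candidates using spoken constraints.

    ray_hits: objects intersected by the pointing ray, nearest first.
    spoken_type: object type mentioned in speech (e.g., "chair").
    behind: reference object the target was said to be behind (e.g., "desk").
    """
    candidates = ray_hits
    if spoken_type is not None:
        candidates = [o for o in candidates if o.name == spoken_type]
    if behind is not None:
        ref = next((o for o in ray_hits if o.name == behind), None)
        if ref is not None:
            candidates = [o for o in candidates if o.depth > ref.depth]
    return candidates[0] if candidates else None

# Plain ray selection would return the nearest hit (the desk); the spoken
# constraints recover the occluded chair behind it.
desk = SceneObject("desk", depth=2.0)
chair = SceneObject("chair", depth=3.5)
print(disambiguate([desk, chair], spoken_type="chair", behind="desk").name)  # chair
```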

In a multimodal environment, each input source (e.g., spoken language) has its associated uncertainties. In our system, these are represented as an n-best list for each input source, together with probabilities expressing the certainty of each hypothesis. In addition to spoken language, the sources in our system include 3D gestures and a set of visibility and spatiality perceptors that use our SenseShapes approach.
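
As an illustration of this representation (the class, identifiers, and probabilities below are hypothetical, not values from the system), each source can be viewed as producing an ordered n-best list of hypotheses with associated confidences:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    interpretation: str   # recognizer output, e.g., a parsed phrase or a candidate object ID
    probability: float    # the recognizer's confidence in this hypothesis

# One n-best list per input source, ordered from most to least likely.
speech_nbest = [
    Hypothesis("paint that chair green", 0.62),
    Hypothesis("paint that stair green", 0.38),
]
gesture_nbest = [Hypothesis("object_17", 0.55), Hypothesis("object_04", 0.45)]
senseshapes_nbest = [Hypothesis("object_04", 0.70), Hypothesis("object_17", 0.30)]
```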

Our multimodal system fuses symbolic and statistical information from these sources and employs mutual disambiguation across these modalities to improve the decision-making process. Thus, the top choice of each individual recognizer will not always be the one selected; instead, the choice that provides the best fit across all available inputs is more likely to be selected. User studies conducted with the system demonstrate that such mutual disambiguation corrections account for over 45% of the successful 3D multimodal interpretations.
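
The following sketch illustrates the idea of mutual disambiguation under a naive independence assumption; the hypotheses, probabilities, and scene knowledge are made up for the example and do not reflect the system's actual fusion algorithm:

```python
from itertools import product

# Hypothetical n-best lists: (hypothesis, probability) per modality.
speech = [("chair", 0.6), ("stair", 0.4)]             # object type heard
gesture = [("object_A", 0.55), ("object_B", 0.45)]    # pointing-ray candidates
senseshapes = [("object_B", 0.7), ("object_A", 0.3)]  # visibility/spatial ranking

# Which type each scene object has (assumed scene knowledge).
object_type = {"object_A": "stair", "object_B": "chair"}

def fuse(speech, gesture, senseshapes):
    """Pick the (type, object) pair with the best joint fit across modalities."""
    best, best_score = None, 0.0
    for (word, ps), (g_obj, pg), (s_obj, pv) in product(speech, gesture, senseshapes):
        # Hypotheses must refer to the same object and agree on its type.
        if g_obj != s_obj or object_type[g_obj] != word:
            continue
        score = ps * pg * pv  # naive independence assumption
        if score > best_score:
            best, best_score = (word, g_obj), score
    return best, best_score

print(fuse(speech, gesture, senseshapes))
# -> (('chair', 'object_B'), ~0.189)
```

Note how the selected object is not the gesture recognizer's top choice: the speech and SenseShapes hypotheses outweigh it, which is the kind of correction the mutual disambiguation figure above refers to.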

Screenshots:

"Paint that chair green."

Publications:

Kaiser, E., Olwal, A., McGee, D., Benko, H., Corradini, A., Li, X., Feiner, S., Cohen, P. Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality. In Proceedings of the Fifth International Conference on Multimodal Interfaces (ICMI 2003), Vancouver, BC, Canada, November 5–7, 2003, pp. 12–19. Download PDF [copyright]

Downloads:

Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality Video (DivX) - avi (20MB) (download DivX)

Related Links:

Acknowledgments: