
HUMANOBS Project Videos


The videos below show the AERA/S1 system learning to conduct a (simplified) TV interview via goal-level learning, through observation and imitation. The agent S1 infers the goals of the two human agents by observing their behavior, creates models intended to predict how they achieve those goals, and then tests these models via (internal) simulation: given newly observed behavior, it uses the models to predict ahead of time what the agents will do next, with the hypothesized goal achievement as the termination event for each model. To learn complex tasks from observation using such goal-directed models, the system must construct hierarchies of these models.
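As a rough illustration of this prediction-and-testing loop, consider the minimal Python sketch below. It is illustrative only: AERA's models are Replicode programs, and the names GoalModel, predict and test are assumptions of this sketch, not AERA's actual interfaces.

  from dataclasses import dataclass

  @dataclass
  class GoalModel:
      goal: str            # hypothesized goal of the observed agent
      steps: list          # predicted (agent, action) sequence toward the goal
      confidence: float = 0.5

      def predict(self):
          # Predict the remaining behavior, with goal achievement as the
          # termination event of the model.
          return self.steps + [("goal-achieved", self.goal)]

      def test(self, observed):
          # Internal simulation: compare the prediction against newly
          # observed behavior and update the model's confidence.
          predicted = self.predict()
          hits = sum(1 for p, o in zip(predicted, observed) if p == o)
          self.confidence = hits / len(predicted)
          return self.confidence

  m = GoalModel(goal="prompt-communication",
                steps=[("interviewer", "ask-question"),
                       ("interviewee", "answer")])
  m.test([("interviewer", "ask-question"),
          ("interviewee", "answer"),
          ("goal-achieved", "prompt-communication")])   # -> 1.0

In a hierarchy, a step of a higher-level model can itself be the goal of a lower-level model, which is how complex tasks decompose.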

The behavior of the two persons is tracked in realtime with specialized sensors and used to re-create, in realtime, the interaction between the two people in a virtual environment. The two people interact as they would in a video conference call, but instead of a video image of the other person, each sees the other's avatar on their screen. While somewhat simplified compared to a real human-human interview, the dialogue contains all the key organizing principles and observable behaviors present in real-world interaction.

In this first evaluation, the full system, including the AERA-based mind of S1, was run on a 6-core desktop machine. Given 20 hours of observation-based learning of human-human dialogue, S1 can take the role of either interviewer or interviewee and continue the dialogue exactly as before. Apart from the fact that S1 has a synthesized voice, evaluations of S1's performance after these 20 hours show virtually no difference between its performance and that of the humans. These videos were produced immediately after the system had observed the 20 hours of human-human interaction, without any extra computation time, and the interaction shown in the videos proceeded in realtime, with S1 interacting live with a human. S1 can immediately assume either role, interviewer or interviewee, with the other role taken by a human. No prior technology exists that can perform the task demonstrated by our AI here.




Human-Human Interaction

This is the input to the AERA agent S1 when learning psycho-social dialogue skills

Human-human interaction, as observed by the S1 AERA agent. Two humans, 
Kris and Eric, interact in a virtual environment. Their behavior is 
tracked in realtime by sensors, and they speak to each other via 
microphones. S1 observes their gestures and speech, via off-the-shelf 
speech recognition software and prosody tracking. After observing for 
sufficiently long, S1 can take over either avatar and carry on with 
the interview in precisely the same fashion (see videos MH.no_interrupt.mp4, 
HM.no_interrupt.mp4, and HM.interrupt.mp4; in the "interrupt" scenario S1 
has learned to use interruption as a method to keep the interview 
from going over a pre-defined time limit).

Watch it on YouTube

What S1 is Given at the Outset

This is a complete list of what is in the seed (initial code) given to the system as it starts to observe the human-human interaction:

  1. words* (but no grammar)
  2. actions: grab, release, point-at, look-at (defined as event types constrained by geometric relationships)
  3. stopping the interview clock ends the session
  4. objects: glass-bottle, plastic-bottle, cardboard-box, wooden-cube, newspaper
  5. objects have properties (e.g. made-of)
  6. interviewee-role
  7. interviewer-role
  8. Model for interviewer
    1. top-level goal of interviewer: prompt interviewee to communicate
    2. in interruption case: an imposed interview duration time limit
  9. Models for interviewee
    1. top-level goal of interviewee: to communicate
    2. never communicate unless prompted
    3. communicate about properties of objects being asked about, for as long as there still are properties available
    4. don't communicate about properties that have already been mentioned

This information is encapsulated in the system as Replicode programs. S1 then observes about 20 hours of the type of interaction shown in the human-human video above. Observation is done by monitoring the event streams produced by the two avatars and the world they interact in, that is, timed records of changes in word output (via speech recognition and prosody tracking) and geometric changes in the orientation and position of named objects.
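For a concrete picture of what such an event stream might look like, here is an illustrative Python sketch; the field names and example values are assumptions made for this example, not the actual Replicode fact format:

  from dataclasses import dataclass

  @dataclass
  class Event:
      t: float        # timestamp in seconds
      source: str     # "Kris", "Eric", or a named object
      kind: str       # "word", "prosody", "position", "orientation"
      value: object   # e.g. a recognized word or a new 3D position

  stream = [
      Event(12.40, "Kris", "word", "what"),
      Event(12.55, "Kris", "word", "is"),
      Event(12.70, "Kris", "word", "this"),
      Event(12.85, "Kris", "position", ("right-hand", (0.4, 1.1, 0.3))),
      Event(14.10, "Eric", "word", "glass-bottle"),
  ]

S1's job is to find models that predict later events in such a stream from earlier ones.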

*Due to frequent errors of commission from the speech recognizer (that is, the recognizer outputting words that were not actually uttered by the users), the set of accepted words served as a filter to weed these out. This filtering was done outside of the S1 agent, so these words were in fact not used by the AERA-based S1 agent in any way as part of the seed.
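A minimal sketch of such a filter, assuming a hypothetical accepted-word set (the actual vocabulary used in the experiments is not listed here):

  # Words not in the accepted set are dropped before they ever reach S1.
  ACCEPTED_WORDS = {"what", "is", "this", "made", "of", "glass-bottle"}

  def filter_recognizer_output(words):
      # Weed out errors of commission from the speech recognizer.
      return [w for w in words if w in ACCEPTED_WORDS]

  print(filter_recognizer_output(["what", "uhm", "is", "this"]))
  # -> ['what', 'is', 'this']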


Human-S1 Interaction

Here S1 has learned a number of basic psycho-social dialogue skills from observation

After having observed two humans interact in a simulated TV interview 
for some time, the AERA agent S1 takes the role of interviewer, 
continuing the interview in precisely the same fashion as before, asking
questions of the human interviewee (see videos HH.no_interrupt.mp4 and 
HH.interrupt.mp4 for the human-human interaction that S1 observed; 
see MH.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the 
skills that S1 has acquired by observation). In the "interrupt" scenario 
(MH.interrupt.mp4) S1 has learned to use interruption as a method to 
keep the interview from going over the allowed time limit.

Watch it on YouTube

What S1 Learns by Observation and Imitation

After 20 hours of watching two humans in a simulated TV interview like the one above, S1 has learned the following via goal-level imitation:

  • GENERAL INTERVIEW PRINCIPLES
    1. word order in sentences (with no a-priori grammar)
    2. disambiguation via co-verbal deictic references
    3. role of interviewer and interviewee
    4. interview involves serialization of joint actions (a series of Qs and As by each participant)
  • MULTIMODAL COORDINATION & JOINT ACTION
    1. take turns speaking
    2. co-verbal deictic reference
      1. manipulation as deictic reference
      2. looking as deictic reference
      3. pointing as deictic reference
  • INTERVIEWER
    1. to ask a series of questions, not repeating questions about objects already addressed
    2. “thank you” stops the interview clock
    3. interruption condition: saying “hold on, let's go to the next question” keeps the interview within time limits
  • INTERVIEWEE
    1. what to answer based on what is asked
    2. an object property is not spoken of if it is not asked for
    3. a silence from the interviewer means “go on”
    4. a nod from the interviewer means “go on”
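
A few of the learned interviewee regularities above can be pictured as a simple decision rule. The Python sketch below is illustrative only; the cue and action names are invented for this example, and S1's learned models are Replicode programs, not hand-written rules like this:

  def interviewee_next_action(cue, asked_property, mentioned):
      # Silence and nods from the interviewer both mean "go on".
      if cue in ("silence", "nod"):
          return "continue-current-answer"
      # Speak only when prompted, only about the asked-for property,
      # and never about a property already mentioned.
      if cue == "question" and asked_property and asked_property not in mentioned:
          mentioned.add(asked_property)
          return "describe-" + asked_property
      return "wait"

  mentioned = set()
  print(interviewee_next_action("question", "made-of", mentioned))  # describe-made-of
  print(interviewee_next_action("nod", None, mentioned))            # continue-current-answer
  print(interviewee_next_action("question", "made-of", mentioned))  # wait (already mentioned)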


S1-Human Interaction

Here S1 demonstrates psycho-social dialogue skills learned by observation, taking the role of interviewee

After having observed two humans interact in a simulated TV interview 
for some time, the AERA agent S1 takes the role of interviewee, continuing 
the interview in precisely the same fashion as before, answering the 
questions of the human interviewer (see videos HH.no_interrupt.mp4 and 
HH.interrupt.mp4 for the human-human interaction that S1 observed; 
see HM.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the 
skills that S1 has acquired by observation). In the "interrupt" scenario 
S1 has learned to use interruption as a method to keep the interview 
from going over a pre-defined time limit.

Watch it on YouTube



Human-Human Interaction with Interruption Example

Two humans interacting via a virtual world in realtime; this interaction
provides the AERA agent S1 with an example of how to use interruption to
move the interview forward to meet deadlines

Watch on YouTube

This video clip shows two humans interacting, with the interviewer using 
interruption to keep the interview below a pre-defined time limit of 4 minutes.
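
The decision S1 later imitates can be sketched as a simple deadline check; the time estimates and names below are assumptions made for illustration:

  TIME_LIMIT = 4 * 60   # the pre-defined limit: 4 minutes, in seconds

  def should_interrupt(elapsed, est_answer_remaining, est_wrapup=15.0):
      # Interrupt if letting the current answer finish would push the
      # interview past the time limit.
      return elapsed + est_answer_remaining + est_wrapup > TIME_LIMIT

  if should_interrupt(elapsed=220.0, est_answer_remaining=30.0):
      print("hold on, let's go to the next question")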



Human-S1 Interaction with Interruption

In this video S1 has learned to use interruption to move the dialogue
along so that the interview can finish on time

Having observed a human interviewer use interruption to move the interview 
forward, S1 takes the role of interviewer and demonstrates the acquisition 
of this skill by interrupting the human interviewee to meet the pre-defined 
time limit for the interview. (See video HH.interrupt.mp4 for what S1 
observed to learn this technique; see HH.no_interrupt.mp4 for the general 
human-human interaction that S1 learned interview skills from; see 
HM.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the skills 
that S1 has acquired automatically by observation.)

Watch on YouTube


