The videos below show the AERA/S1 system learning to conduct a (simplified) TV interview via goal-level learning, through observation and imitation. The agent S1 infers the goals of the two agents by observing their behavior, creates models intended to predict how they achieve those goals, and then tests these models via (internal) simulation: that is, given some new observed behavior, it uses the models to predict ahead of time what the agents will do next, with the hypothesized goal achievement as the termination event for each model. To learn complex tasks from observation using such goal-directed models, the system must construct hierarchies of such models.
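The predict-and-terminate loop described above can be sketched in simplified form. This is an illustrative Python sketch only, not the actual implementation (AERA is written in Replicode); the `Model` class, its fields, and the string-matching trigger logic are all assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Illustrative stand-in for a goal-directed model: a triggering
    observation, the actions predicted to follow, and a hypothesized
    goal whose achievement terminates the simulation."""
    trigger: str             # observed behavior that activates the model
    predicted_actions: list  # actions expected to follow, in order
    goal: str                # hypothesized goal; reaching it ends the prediction


def simulate(models, observed):
    """Internal simulation: match new observed behavior against each model
    and predict, ahead of time, the continuation up to goal achievement."""
    predictions = []
    for m in models:
        if m.trigger == observed:
            # The predicted continuation, ending with the goal-achievement event.
            predictions.append(m.predicted_actions + [m.goal])
    return predictions


# Toy example: a model hypothesizing that a question is asked to elicit an answer.
interview_model = Model(
    trigger="interviewer_asks_question",
    predicted_actions=["interviewee_answers"],
    goal="answer_received",
)

print(simulate([interview_model], "interviewer_asks_question"))
# -> [['interviewee_answers', 'answer_received']]
```

A real goal-directed learner would of course compose such models hierarchically and revise them when predictions fail; this sketch only shows the core idea of testing a model by predicting forward to its termination event.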
The behavior of the two people is tracked in realtime with specialized sensors and used to re-create, in realtime, their interaction in a virtual environment. The two people interact as they would in a video conference call, but instead of a video image of the other person, each sees the other's avatar on their screen. While somewhat simplified compared to a real human-human interview, the dialogue contains all the key organizing principles and observable behaviors present in real-world interaction.
In this first evaluation, the full system, including the AERA-based mind of S1, was run on a 6-core desktop machine. Given 20 hours of observation-based learning of human-human dialogue, S1 can take the role of either interviewer or interviewee, with the other role assumed by a human, and continue the dialogue in the same manner as before. Apart from the fact that S1 has a synthesized voice, evaluations of S1's performance after these 20 hours show virtually no difference between its performance and that of the humans. These videos were produced immediately after the system had observed the 20 hours of human-human interaction, without any extra computation time, and the interaction shown proceeded in realtime, with S1 interacting live with a human. No prior technology exists that can perform the task demonstrated here by our AI.
This is the input to the AERA agent S1 when learning psycho-social dialogue skills
Human-human interaction, as observed by the S1 AERA agent. Two humans, Kris and Eric, interact in a virtual environment. Their behavior is tracked in realtime by sensors, and they speak to each other via microphones. S1 observes their gestures and speech via off-the-shelf speech recognition software and prosody tracking. After observing for sufficiently long, S1 can take over either avatar and carry on the interview in precisely the same fashion (see videos MH.no_interrupt.mp4, HM.no_interrupt.mp4, and HM.interrupt.mp4; in the "interrupt" scenario S1 has learned to use interruption as a method to keep the interview from going over a pre-defined time limit).
This is a complete list of what is in the seed (initial code) given to the system as it starts to observe the human-human interaction:
This information is encapsulated in the system as Replicode programs. S1 then observes about 20 hours of the type of interaction shown in the human-human video above. Observation is done by monitoring the event streams produced by the two avatars and the world they interact in, that is, timed productions of word output (via speech recognition and prosody tracking) and of geometric changes in the orientation and position of named objects.
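The shape of such an event stream can be illustrated with a small sketch. The event types, field names, and values below are assumptions for illustration only; the actual system represents these streams in Replicode, not as Python objects:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WordEvent:
    """Timed word production from speech recognition and prosody tracking.
    The pitch field is an assumed example of a prosody feature."""
    time_ms: int
    speaker: str
    word: str
    pitch_hz: float


@dataclass
class GeometryEvent:
    """Timed change in the orientation/position of a named object,
    e.g. an avatar's head as tracked by the sensors."""
    time_ms: int
    object_name: str
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float]


# The observer's input is a single time-ordered stream merging both kinds of events:
stream = sorted(
    [
        WordEvent(1200, "Kris", "welcome", 180.0),
        GeometryEvent(1150, "Eric.head", (0.1, 1.6, 0.4), (0.0, 12.0, 0.0)),
        WordEvent(1420, "Kris", "Eric", 175.0),
    ],
    key=lambda e: e.time_ms,
)
print([type(e).__name__ for e in stream])
# -> ['GeometryEvent', 'WordEvent', 'WordEvent']
```

The key point is that speech and geometry arrive as one interleaved, timestamped sequence, from which the learner must segment and correlate behaviors on its own.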
*Due to frequent errors of commission from the speech recognizer (that is, the recognizer outputting words that were not actually uttered by the users), the set of accepted words served as a filter to weed these out. This filtering was done outside of the S1 agent, so these words were in fact not used in any way by the AERA-based S1 agent as part of the seed.
Here S1 has learned a number of basic psycho-social dialogue skills from observation
After having observed two humans interact in a simulated TV interview for some time, the AERA agent S1 takes the role of interviewer, continuing the interview in precisely the same fashion as before, asking questions of the human interviewee (see videos HH.no_interrupt.mp4 and HH.interrupt.mp4 for the human-human interaction that S1 observed; see MH.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the skills that S1 has acquired by observation). In the "interrupt" scenario (MH.interrupt.mp4) S1 has learned to use interruption as a method to keep the interview from going over the allowed time limit.
After 20 hours of watching two humans in a simulated TV interview like the one above, S1 has learned the following via goal-level imitation:
This was the input to the AERA agent S1 when learning psycho-social dialogue skills
After having observed two humans interact in a simulated TV interview for some time, the AERA agent S1 takes the role of interviewee, continuing the interview in precisely the same fashion as before, answering the questions of the human interviewer (see videos HH.no_interrupt.mp4 and HH.interrupt.mp4 for the human-human interaction that S1 observed; see HM.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the skills that S1 has acquired by observation). In the "interrupt" scenario S1 has learned to use interruption as a method to keep the interview from going over a pre-defined time limit.
Two humans interacting via a virtual world in realtime; this interaction
provides the AERA agent S1 with an example of how to use interruption to
move the interview forward to meet deadlines
This video clip shows two humans interacting, with the interviewer using interruption to keep the interview below a pre-defined time limit of 4 minutes.
In this video S1 has learned to use interruption to move the dialogue
along so that the interview can finish on time
Having observed a human interviewer use interruption to move the interview forward, S1 takes the role of interviewer and demonstrates the acquisition of this skill by interrupting the human interviewee to meet pre-defined time limits for the interview. (See video HH.interrupt.mp4 for what S1 observed to learn this technique; see HH.no_interrupt.mp4 for the general human-human interaction that S1 learned interview skills from; see HM.no_interrupt.mp4 and HM.interrupt.mp4 for other examples of the skills that S1 has acquired automatically by observation.)