We address skill development as a fundamental architectural epiphenomenon: learning happens through unique architectural constructs specified by developers, coupled with the architecture's ability to automatically reconfigure itself to accommodate new skills through observation. While we initially target socio-communicative skills, we intend to develop our architecture and imitation-learning process in a generic way, so that the principles developed can be applied to other, equally complex tasks. Within the framework of this project the learning will be supervised; our long-term goal, however, is to provide humanoid agents and robots with full autonomy for learning such multimodal skills in dynamic social situations.
As an appropriate and challenging demonstration, we will have the resulting system control a virtual humanoid television host, capable of conducting interviews with users and hosting a 30-minute TV program. By interacting with humans via their advanced multimodal avatars, the TV host will acquire increasingly complex socio-communicative skills. By the end of the project, the host will have approximated the socio-communicative skills of an average human television show host.
We have three main objectives, all of which derive directly from the above. They are to build an auto-reconfiguring architecture, behavior observation mechanisms, and behavior generation and coordination mechanisms.
The ability to learn complex new skills on top of old ones imposes new requirements on the underlying architecture: it must be able to reorganize its internal agency while remaining robust and resilient. This constitutes our first scientific objective:
A. To build a framework for developing an auto-reconfiguring architecture.
As the architecture grows in complexity in this way, on its path towards human-like capabilities, its sheer size begins to significantly challenge the engineering of the system; scale must therefore factor into our overall engineering approach. We target large-scale, modular architectures able to reassemble their components in light of new operational conditions, that is, to compute new assembly schemes and system configurations, given (1) the specification of the available modules and (2) the specification of the new skills and behaviors.
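To make the idea of computing an assembly scheme from the two specifications concrete, the following is a minimal, hedged sketch. It assumes a toy representation in which modules declare the capabilities they provide and require, and a skill specification is simply a set of required capabilities; all names (Module, assemble, the capability sets) are illustrative assumptions, not the project's actual design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    """A hypothetical module specification: what it provides and depends on."""
    name: str
    provides: frozenset
    requires: frozenset = frozenset()

def assemble(modules, skill_spec):
    """Greedily select modules whose combined capabilities cover the skill
    specification, pulling in module dependencies as new requirements.
    Returns the selected module names, or None if coverage is impossible."""
    needed = set(skill_spec)
    chosen, provided = [], set()
    while needed - provided:
        missing = needed - provided
        # pick the module covering the most still-missing capabilities
        best = max(modules, key=lambda m: len(m.provides & missing), default=None)
        if best is None or not (best.provides & missing):
            return None  # no available module helps: skill not assemblable
        chosen.append(best.name)
        provided |= best.provides
        needed |= best.requires  # dependencies become new requirements
    return chosen

# Hypothetical module catalog for a socio-communicative skill
catalog = [
    Module("gaze-tracker", frozenset({"gaze"})),
    Module("speech-out", frozenset({"speech"}), frozenset({"prosody"})),
    Module("prosody-gen", frozenset({"prosody"})),
]
print(assemble(catalog, {"gaze", "speech"}))
```

Note how the dependency of "speech-out" on prosody is resolved automatically: requesting speech pulls in the prosody generator, which is the kind of transitive reconfiguration a real assembly computation would need to perform at much larger scale.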
Achieving this objective will result in the implementation of a highly reconfigurable architecture, and this builds on three sub-objectives:
Many human skills, including socio-communicative ones, are highly complex, involving many degrees of freedom. Imitation learning is potentially an extremely efficient way to acquire such skills, since manually providing a target specification for behaviors in all circumstances would be extremely tedious, if not impossible. If the system is to imitate, such target specifications must therefore eventually be computed automatically.
In our approach, imitation learning is a two-step process: (1) observing a behavior and extracting a specification for it, and (2) implementing the behavior according to this specification and trying it out, or storing it for future use. Accordingly, our second objective is:
B. To build a system that can auto-generate specifications for skills and behaviors based on their observation. This is broken down into three sub-objectives:
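The two-step process described above can be sketched in miniature. This is a hedged illustration only: it assumes observations arrive as (cue, response) pairs and reduces "specification extraction" to tallying the most frequent response per cue; the names (extract_spec, SkillLibrary) are hypothetical.

```python
from collections import Counter, defaultdict

def extract_spec(observations):
    """Step 1: observe a behavior and extract a specification, here a
    mapping from each cue to the most frequently observed response."""
    tally = defaultdict(Counter)
    for cue, response in observations:
        tally[cue][response] += 1
    return {cue: counts.most_common(1)[0][0] for cue, counts in tally.items()}

class SkillLibrary:
    """Step 2: implement the specification as a callable behavior and
    store it for future use."""
    def __init__(self):
        self.skills = {}
    def implement(self, name, spec):
        # a trivial "implementation": look the response up in the spec
        self.skills[name] = lambda cue: spec.get(cue)
    def try_out(self, name, cue):
        return self.skills[name](cue)

obs = [("greeting", "smile"), ("greeting", "smile"), ("greeting", "nod"),
       ("question", "answer")]
lib = SkillLibrary()
lib.implement("interview", extract_spec(obs))
print(lib.try_out("interview", "greeting"))
```

A real system would of course extract far richer, multimodal specifications than a cue-response table, but the separation between observation-derived specification and stored, executable implementation is the point being illustrated.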
Behavior Generation and Coordination Mechanisms
Lastly, generating implementations for the observed skills is the second half of imitation learning, and this is our third objective:
C. To build behavior generation and coordination mechanisms for the reproduction and reuse of observed skills. From the specifications of target skills and behaviors we will develop processes that can build, integrate, and test new system configurations as implementations of the desired skills. The testing is done by actually trying the configurations out and evaluating the results. This leads to three sub-objectives:
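The build-integrate-test cycle named in this objective can be sketched as a generate-and-test loop: candidate configurations are assembled from available components, each is tried out against the observed target behavior, and the best scorer is kept. Everything here is an illustrative assumption, with configurations reduced to pipelines of toy functions and a target behavior given as input-output cases.

```python
import itertools

def evaluate(candidate, test_cases):
    """'Trying it out': the fraction of observed cases the candidate matches."""
    return sum(candidate(x) == y for x, y in test_cases) / len(test_cases)

def generate_and_test(components, test_cases, size=2):
    """Enumerate candidate configurations (here: ordered pipelines of
    component functions), test each, and return the best with its score."""
    best, best_score = None, -1.0
    for combo in itertools.permutations(components, size):
        def candidate(x, combo=combo):  # default arg pins the current combo
            for f in combo:
                x = f(x)
            return x
        score = evaluate(candidate, test_cases)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score

# Target behavior observed as input-output pairs: y = 2x + 1
components = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 1]
cases = [(0, 1), (1, 3), (2, 5)]
combo, score = generate_and_test(components, cases)
print(score)  # the pipeline "double, then add one" matches every case
```

The exhaustive enumeration here stands in for whatever search or planning process a full-scale system would use; the essential loop of building a configuration, integrating components, trying the result out, and evaluating it is the same.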