Robot assistants will play a pivotal role in addressing pressing issues in industrial labor, personalized healthcare, and household chores. For intelligent systems to be widely accepted, they must be capable of perceiving and interpreting human movements while interacting with users as naturally as a human would. Together with research scientist Dr. Jouh Yeong Chew from the Honda Research Institute (HRI) in Japan, assistant professor Dr. Xucong Zhang from TU Delft aims to create an intelligent system that fosters natural and seamless interaction between human users and robots, promoting the use of assistant robots in industrial and daily-life settings. HRI is known for its pioneering work, including the creation of the humanoid robot ASIMO, and pursues the ambitious goal of practical deployment by the 2030s to benefit society.

Intelligent systems for robot assistants are expected to play a significant role in industrial labor and personalized healthcare. This requires robots to perceive and interact with users as a human would. Our goal is to create an AI system that can comprehend and model user behavior from multi-modal input.

Since we humans express ourselves through multiple modalities, such as body gestures, hand gestures, eye gaze, and facial expressions, we envision an AI system that uses all of these input signals to estimate a person's attention and intention, as well as inner states such as cognitive load, stress, and health condition. Based on these estimates, the robot can react appropriately through the same modalities. The project comprises human behavior detection, multi-modal modeling of human behavior, and application to real-world robots. Initially, we plan to develop methods and gather training data to accurately detect each human behavior signal individually. We will then fuse these signals in a holistic approach to jointly estimate human behavior and states, which will enable us to build a model that understands the inner states of one or more humans and emulates human behavior and empathy during interaction; a minimal sketch of such a fusion model appears below. Finally, we will deploy the proposed AI system on a real robot to evaluate its practicality.
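To make the fusion step concrete, below is a minimal, hypothetical sketch of how per-modality signals could be fused to jointly estimate a user's inner states. All module names, input dimensions, and the attention-based fusion strategy are our illustrative assumptions, not the project's actual design:

```python
# Illustrative sketch of multi-modal fusion for human-state estimation.
# All modules, dimensions, and the attention-based fusion are hypothetical
# assumptions for illustration, not the project's actual architecture.
import torch
import torch.nn as nn


class MultiModalStateEstimator(nn.Module):
    def __init__(self, feat_dim=256, num_states=4):
        super().__init__()
        # One lightweight encoder per behavior signal (assumed input sizes).
        self.gaze_enc = nn.Linear(3, feat_dim)       # 3D gaze direction
        self.pose_enc = nn.Linear(17 * 2, feat_dim)  # 17 body keypoints (x, y)
        self.hand_enc = nn.Linear(21 * 2, feat_dim)  # 21 hand keypoints (x, y)
        self.face_enc = nn.Linear(64, feat_dim)      # facial-expression embedding
        # Self-attention lets each modality be weighted jointly with the others.
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Joint head over inner states, e.g. attention / intention / stress.
        self.head = nn.Linear(feat_dim, num_states)

    def forward(self, gaze, pose, hand, face):
        # Stack per-modality features as a length-4 token sequence.
        tokens = torch.stack(
            [self.gaze_enc(gaze), self.pose_enc(pose),
             self.hand_enc(hand), self.face_enc(face)], dim=1)
        fused = self.fusion(tokens).mean(dim=1)  # pool over modalities
        return self.head(fused)                  # logits for the user's states


model = MultiModalStateEstimator()
batch = 8
logits = model(torch.randn(batch, 3), torch.randn(batch, 34),
               torch.randn(batch, 42), torch.randn(batch, 64))
print(logits.shape)  # torch.Size([8, 4])
```

One appeal of a joint, attention-based design over per-signal classifiers is that ambiguity in one modality (say, an occluded hand) can be compensated by the others when estimating the same inner state.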


We believe that the proposed system will greatly expand the use cases of intelligent robots in real-world applications. It can improve the efficiency of robots on the production line, enabling them to understand human commands easily and accurately and to adjust flexibly to complex labor tasks. A personal assistant robot also has the potential to greatly improve the quality of life of individuals with physiological, essential, and protective needs, particularly those with early dementia or chronic disease. Embedding such a system in different embodiments, such as humanoid robots, will enable them to predict psychological issues like stress and depression and to detect potentially dangerous situations such as falls and injuries. The proposed system is versatile: it can also be embedded in tabletop robots to deliver talking therapies as treatment for psychological problems like stress and depression, which are increasingly common in the modern age. Such treatment can be realized by state-of-the-art human understanding technology combined with embedded conversational assistants.


Since October 2023, we have been working on eye gaze estimation in real-world settings to detect the attention of human users as the first behavior signal. We have already built a pipeline based on a diffusion model for gaze target detection, with promising results, and plan to improve its performance using temporal and environmental information. A simplified sketch of such a diffusion-based approach is shown below.
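The sketch below is a highly simplified, hypothetical illustration of conditional diffusion for gaze target detection, where a 2D gaze point is iteratively denoised conditioned on image features; the denoiser architecture, noise schedule, step count, and conditioning features are all assumptions for illustration and do not describe our actual pipeline:

```python
# Minimal DDPM-style sketch of diffusion-based gaze target detection:
# a 2D gaze point is iteratively denoised, conditioned on fused scene and
# head-crop features. Architecture, schedule, and feature extraction are
# illustrative assumptions; the real pipeline is more involved.
import torch
import torch.nn as nn

T = 100  # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)


class GazeDenoiser(nn.Module):
    """Predicts the noise added to a 2D gaze target, given conditioning."""

    def __init__(self, cond_dim=512, hidden=256):
        super().__init__()
        self.time_emb = nn.Embedding(T, hidden)
        self.net = nn.Sequential(
            nn.Linear(2 + cond_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2),  # predicted noise on (x, y)
        )

    def forward(self, noisy_xy, t, cond):
        # cond: scene + head-crop features from any image backbone (assumed).
        return self.net(torch.cat([noisy_xy, cond, self.time_emb(t)], dim=-1))


@torch.no_grad()
def sample_gaze_target(model, cond):
    """Reverse diffusion: start from noise, denoise to a point in [-1, 1]^2."""
    xy = torch.randn(cond.size(0), 2)
    for t in reversed(range(T)):
        ts = torch.full((cond.size(0),), t, dtype=torch.long)
        eps = model(xy, ts, cond)
        a, ab = alphas[t], alpha_bar[t]
        mean = (xy - (1 - a) / (1 - ab).sqrt() * eps) / a.sqrt()
        noise = torch.randn_like(xy) if t > 0 else torch.zeros_like(xy)
        xy = mean + betas[t].sqrt() * noise
    return xy


model = GazeDenoiser()
cond = torch.randn(4, 512)                    # placeholder conditioning features
print(sample_gaze_target(model, cond).shape)  # torch.Size([4, 2])
```

Conditioning the denoiser on richer features, such as a temporal window of frames or an encoding of the surrounding environment, is one natural way to pursue the planned improvements mentioned above.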