CC: Grounding Human-Centred AI on Embodied Multimodal Interaction