Front matter
What is Multimodality?
Letitia Parcalabescu, Nils Trost and Anette Frank
pp. 1‑10 |
Are Gestures Worth a Thousand Words? An Analysis of Interviews in the Political Domain
Daniela Trotta and Sara Tonelli
pp. 11‑20 |
Requesting clarifications with speech and gestures
Jonathan Ginzburg and Andy Luecking
pp. 21‑31 |
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu, Albert Gatt, Anette Frank and Iacer Calixto
pp. 32‑44 |
How Vision Affects Language: Comparing Masked Self-Attention in Uni-Modal and Multi-Modal Transformer
Nikolai Ilinykh and Simon Dobnik
pp. 45‑55 |
EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References
Selene Baez Santamaria, Thomas Baier, Taewoon Kim, Lea Krause, Jaap Kruijt and Piek Vossen
pp. 56‑77 |
Annotating anaphoric phenomena in situated dialogue
Sharid Loáiciga, Simon Dobnik and David Schlangen
pp. 78‑88 |
Incremental Unit Networks for Multimodal, Fine-grained Information State Representation
Casey Kennington and David Schlangen
pp. 89‑94 |
Teaching Arm and Head Gestures to a Humanoid Robot through Interactive Demonstration and Spoken Instruction
Michael Brady and Han Du
pp. 95‑101 |
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki, Hitomi Yanaka, Koji Mineshima and Daisuke Bekki
pp. 102‑107 |