At present, video captioning methods based on encoder-decoder (codec) frameworks often rely too heavily on a single visual modality, which makes it difficult for the model to understand the video ...
In this letter, we propose a simple but effective visual-text contrastive learning solution that utilizes text information for MGR. In addition, instead of using handcrafted prompts for visual-text ...