Duy Tan University Library, Da Nang, Vietnam

Newspaper and Journal Article Index Database

EVJVQA challenge: multilingual visual question answering

Authors: Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T.D. Vo, Khanh Quoc Tran, Kiet Van Nguyen

Pages: 237-258

Issue: Vol. 39, No. 3

Document type: Domestic journal

Holding location: 03 Quang Trung

Classification code: 005

Language: English

Keywords: Computer science, visual question answering, vision-language understanding, multimodal learning, information fusion, transformer model

Subject: Computer science

Abstract:

In this article, we present the organization of the EVJVQA challenge, an overview of the methods employed by shared-task participants, and the results. The highest scores on the private test set are 0.4392 in F1-score and 0.4009 in BLEU. The multilingual QA systems proposed by the top two teams use ViT as the pre-trained vision model and mT5, a powerful pre-trained language model based on the transformer architecture, as the language model. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models and systems for visual question answering.
