Open Access

Qwen2.5-VL Technical Report

Publication type: Posted Content
Publication date: 2025-02-19
ISSN: 2331-8422
Subjects: Computer Vision and Pattern Recognition; Computation and Language
Abstract
We introduce Qwen2.5-VL, the latest flagship model of the Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the world through enhanced visual recognition, precise object localization, robust document parsing, and long-video comprehension. A standout feature of Qwen2.5-VL is its ability to accurately localize objects using bounding boxes or points. It provides robust structured data extraction from invoices, forms, and tables, as well as detailed analysis of charts, diagrams, and layouts. To handle complex inputs, Qwen2.5-VL introduces dynamic resolution processing and absolute time encoding, enabling it to process images of varying sizes and videos of extended durations (up to hours) with second-level event localization. This allows the model to natively perceive spatial scales and temporal dynamics without relying on traditional normalization techniques. By training a native dynamic-resolution Vision Transformer (ViT) from scratch and incorporating Window Attention, we reduce computational overhead while maintaining native resolution. As a result, Qwen2.5-VL excels not only in static image and document understanding but also as an interactive visual agent capable of reasoning, tool usage, and task execution in real-world scenarios such as operating computers and mobile devices. Qwen2.5-VL is available in three sizes, addressing diverse use cases from edge AI to high-performance computing. The flagship Qwen2.5-VL-72B model matches state-of-the-art models like GPT-4o and Claude 3.5 Sonnet, particularly excelling in document and diagram understanding. Additionally, Qwen2.5-VL maintains robust linguistic performance, preserving the core language competencies of the Qwen2.5 LLM.

Top 30 (citing publications)

Journals

medRxiv: 3 publications (10.34%)
Journal of Materials Science: 1 publication (3.45%)
eLife: 1 publication (3.45%)
Frontiers in Artificial Intelligence: 1 publication (3.45%)
ACS Applied Materials & Interfaces: 1 publication (3.45%)
Scientific Data: 1 publication (3.45%)
Discover Computing: 1 publication (3.45%)
Chemical Science: 1 publication (3.45%)
Scientific Reports: 1 publication (3.45%)
bioRxiv: 1 publication (3.45%)
Frontiers in Medicine: 1 publication (3.45%)
Journal of Biomedical Informatics: 1 publication (3.45%)
Applied Computing and Geosciences: 1 publication (3.45%)

Publishers

Association for Computing Machinery (ACM): 9 publications (31.03%)
Institute of Electrical and Electronics Engineers (IEEE): 5 publications (17.24%)
openRxiv: 4 publications (13.79%)
Springer Nature: 4 publications (13.79%)
Frontiers Media S.A.: 2 publications (6.9%)
Elsevier: 2 publications (6.9%)
eLife Sciences Publications: 1 publication (3.45%)
American Chemical Society (ACS): 1 publication (3.45%)
Royal Society of Chemistry (RSC): 1 publication (3.45%)
  • Publications without a DOI are not counted.
  • Publication statistics are updated weekly.

Metrics: 98
Cite
GOST
Bai S. et al. Qwen2.5-VL Technical Report // ArXiv. 2025.
GOST with all authors (up to 50)
Bai S., Chen K., Liu X., Wang J., Ge W., Song S., Dang K., Wang P., Wang S., Tang J., Zhong H., Zhu Y., Yang M., Li Z., Wan J., Wang P., Ding W., Fu Z., Xu Y., Ye J., Zhang X., Xie T., Cheng Z., Zhang H., Yang Z., Xu H., Lin J. Qwen2.5-VL Technical Report // ArXiv. 2025.
RIS
TY  - GEN
DO  - 10.48550/ARXIV.2502.13923
UR  - https://doi.org/10.48550/ARXIV.2502.13923
TI  - Qwen2.5-VL Technical Report
T2  - ArXiv
AU  - Bai, Shuai
AU  - Chen, Keqin
AU  - Liu, Xuejing
AU  - Wang, Jialin
AU  - Ge, Wenbin
AU  - Song, Sibo
AU  - Dang, Kai
AU  - Wang, Peng
AU  - Wang, Shijie
AU  - Tang, Jun
AU  - Zhong, Humen
AU  - Zhu, Yuanzhi
AU  - Yang, Mingkun
AU  - Li, Zhaohai
AU  - Wan, Jianqiang
AU  - Wang, Pengfei
AU  - Ding, Wei
AU  - Fu, Zheren
AU  - Xu, Yiheng
AU  - Ye, Jiabo
AU  - Zhang, Xi
AU  - Xie, Tianbao
AU  - Cheng, Zesen
AU  - Zhang, Hang
AU  - Yang, Zhibo
AU  - Xu, Haiyang
AU  - Lin, Junyang
PY  - 2025
DA  - 2025/02/19
PB  - Cornell University Press
SN  - 2331-8422
ER  - 
BibTeX (up to 50 authors)
@article{2025_Bai,
author = {Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Mingkun Yang and Zhaohai Li and Jianqiang Wan and Pengfei Wang and Wei Ding and Zheren Fu and Yiheng Xu and Jiabo Ye and Xi Zhang and Tianbao Xie and Zesen Cheng and Hang Zhang and Zhibo Yang and Haiyang Xu and Junyang Lin},
title = {Qwen2.5-VL Technical Report},
journal = {ArXiv},
year = {2025},
publisher = {Cornell University Press},
month = feb,
url = {https://doi.org/10.48550/ARXIV.2502.13923},
doi = {10.48550/ARXIV.2502.13923}
}