Open Access
Qwen2.5-VL Technical Report
Publication type: Posted Content
Publication date: 2025-02-19
Computer Vision and Pattern Recognition
Computation and Language
Abstract
We introduce Qwen2.5-VL, the latest flagship model of the Qwen vision-language
series, which demonstrates significant advancements in both foundational
capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap
forward in understanding and interacting with the world through enhanced visual
recognition, precise object localization, robust document parsing, and
long-video comprehension. A standout feature of Qwen2.5-VL is its ability to
accurately localize objects using bounding boxes or points. It provides robust
structured data extraction from invoices, forms, and tables, as well as
detailed analysis of charts, diagrams, and layouts. To handle complex inputs,
Qwen2.5-VL introduces dynamic resolution processing and absolute time encoding,
enabling it to process images of varying sizes and videos of extended durations
(up to hours) with second-level event localization. This allows the model to
natively perceive spatial scales and temporal dynamics without relying on
traditional normalization techniques. By training a native dynamic-resolution
Vision Transformer (ViT) from scratch and incorporating Window Attention, we
reduce computational overhead while maintaining native resolution. As a result,
Qwen2.5-VL excels not only in static image and document understanding but also
as an interactive visual agent capable of reasoning, tool usage, and task
execution in real-world scenarios such as operating computers and mobile
devices. Qwen2.5-VL is available in three sizes, addressing diverse use cases
from edge AI to high-performance computing. The flagship Qwen2.5-VL-72B model
matches state-of-the-art models like GPT-4o and Claude 3.5 Sonnet, particularly
excelling in document and diagram understanding. Additionally, Qwen2.5-VL
maintains robust linguistic performance, preserving the core language
competencies of the Qwen2.5 LLM.
Top 30
Journals
medRxiv: 3 publications, 10.34%
Journal of Materials Science: 1 publication, 3.45%
eLife: 1 publication, 3.45%
Frontiers in Artificial Intelligence: 1 publication, 3.45%
ACS Applied Materials & Interfaces: 1 publication, 3.45%
Scientific Data: 1 publication, 3.45%
Discover Computing: 1 publication, 3.45%
Chemical Science: 1 publication, 3.45%
Scientific Reports: 1 publication, 3.45%
bioRxiv: 1 publication, 3.45%
Frontiers in Medicine: 1 publication, 3.45%
Journal of Biomedical Informatics: 1 publication, 3.45%
Applied Computing and Geosciences: 1 publication, 3.45%
Publishers
Association for Computing Machinery (ACM): 9 publications, 31.03%
Institute of Electrical and Electronics Engineers (IEEE): 5 publications, 17.24%
openRxiv: 4 publications, 13.79%
Springer Nature: 4 publications, 13.79%
Frontiers Media S.A.: 2 publications, 6.9%
Elsevier: 2 publications, 6.9%
eLife Sciences Publications: 1 publication, 3.45%
American Chemical Society (ACS): 1 publication, 3.45%
Royal Society of Chemistry (RSC): 1 publication, 3.45%
- Publications without a DOI are not counted.
- Publication statistics are updated weekly.
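The shares listed above look like each count divided by the 29 citing publications that carry a DOI (for example, 3/29 ≈ 10.34%). The page does not state this, so the denominator is an inference. A minimal Python sketch of that arithmetic, where `share_percent` and `publisher_counts` are names introduced here for illustration:

```python
def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 2)


# Publisher counts copied from the list above.
publisher_counts = {
    "Association for Computing Machinery (ACM)": 9,
    "Institute of Electrical and Electronics Engineers (IEEE)": 5,
    "openRxiv": 4,
    "Springer Nature": 4,
    "Frontiers Media S.A.": 2,
    "Elsevier": 2,
    "eLife Sciences Publications": 1,
    "American Chemical Society (ACS)": 1,
    "Royal Society of Chemistry (RSC)": 1,
}

# Assumption: the denominator is the sum of the publisher counts.
total = sum(publisher_counts.values())

for name, count in publisher_counts.items():
    print(f"{name}: {count} ({share_percent(count, total)}%)")
```

The publisher counts sum to 29, which matches the "Citations since 2025" figure in the metrics, so the same denominator reproduces the journal shares as well.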
Metrics
Total citations: 98
Citations since 2025: 29 (100%)
Cite
GOST
Bai S. et al. Qwen2.5-VL Technical Report // ArXiv. 2025.
GOST with all authors (up to 50)
Bai S., Chen K., Liu X., Wang J., Ge W., Song S., Dang K., Wang P., Wang S., Tang J., Zhong H., Zhu Y., Yang M., Li Z., Wan J., Wang P., Ding W., Fu Z., Xu Y., Ye J., Zhang X., Xie T., Cheng Z., Zhang H., Yang Z., Xu H., Lin J. Qwen2.5-VL Technical Report // ArXiv. 2025.
RIS
TY - GENERIC
DO - 10.48550/ARXIV.2502.13923
UR - https://doi.org/10.48550/ARXIV.2502.13923
TI - Qwen2.5-VL Technical Report
T2 - ArXiv
AU - Bai, Shuai
AU - Chen, Keqin
AU - Liu, Xuejing
AU - Wang, Jialin
AU - Ge, Wenbin
AU - Song, Sibo
AU - Dang, Kai
AU - Wang, Peng
AU - Wang, Shijie
AU - Tang, Jun
AU - Zhong, Humen
AU - Zhu, Yuanzhi
AU - Yang, Mingkun
AU - Li, Zhaohai
AU - Wan, Jianqiang
AU - Wang, Pengfei
AU - Ding, Wei
AU - Fu, Zheren
AU - Xu, Yiheng
AU - Ye, Jiabo
AU - Zhang, Xi
AU - Xie, Tianbao
AU - Cheng, Zesen
AU - Zhang, Hang
AU - Yang, Zhibo
AU - Xu, Haiyang
AU - Lin, Junyang
PY - 2025
DA - 2025/02/19
PB - Cornell University Press
SN - 2331-8422
ER -
BibTeX (up to 50 authors)
@article{2025_Bai,
author = {Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Mingkun Yang and Zhaohai Li and Jianqiang Wan and Pengfei Wang and Wei Ding and Zheren Fu and Yiheng Xu and Jiabo Ye and Xi Zhang and Tianbao Xie and Zesen Cheng and Hang Zhang and Zhibo Yang and Haiyang Xu and Junyang Lin},
title = {Qwen2.5-VL Technical Report},
journal = {ArXiv},
year = {2025},
publisher = {Cornell University Press},
month = {feb},
url = {https://doi.org/10.48550/ARXIV.2502.13923},
doi = {10.48550/ARXIV.2502.13923}
}