Document Details
Clip:
Under review as a conference paper at ICLR 2026. Jie Huang 1,2,*, Xuejing Liu 1,*, Sibo Song 1, Ruibing Hou 2,†, Hong Chang 2, Junyang Lin 1, Shuai Bai
Filename:
2510.23095v1.pdf
Filetype:
application/pdf
Size:
5493452 bytes
Uploaded On:
2025-10-29
Abstract:
Summary:
Tags:
Notes:
Visible:
1
Status:
Parsed
Author:
Jie Huang; Xuejing Liu; Sibo Song; Ruibing Hou; Hong Chang; Junyang Lin; Shuai Bai
Creator:
arXiv GenPDF (tex2pdf:e76afa9)
DOI:
https://doi.org/10.48550/arXiv.2510.23095
License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
PTEX.Fullbanner:
This is pdfTeX, Version 3.141592653-2.6-1.40.28 (TeX Live 2025) kpathsea version 6.4.1
Producer:
pikepdf 8.15.1
Title:
Revisiting Multimodal Positional Encoding in Vision-Language Models
Trapped:
False
ArXivID:
https://arxiv.org/abs/2510.23095v1
Pages:
16