RoPE (Rotary Positional Encoding): Complete Mathematical Formulation

1. Basic form (2D rotation)

For an embedding vector $\boldsymbol{x} \in \mathbb{R}^d$ (with $d$ even), group the dimensions into pairs $(x_{2i},\,x_{2i+1})$. For the token at position $\text{pos}$, each pair is rotated:

$$
\begin{bmatrix} x'_{2i} \\ x'_{2i+1} \end{bmatrix}
=
\begin{bmatrix}
\cos\theta_{i,\text{pos}} & -\sin\theta_{i,\text{pos}} \\
\sin\theta_{i,\text{pos}} & \cos\theta_{i,\text{pos}}
\end{bmatrix}
\begin{bmatrix} x_{2i} \\ x_{2i+1} \end{bmatrix}
$$

2. Angle definition (frequency design)

$$
\theta_{i,\text{pos}} = \text{pos} \cdot \omega_i
$$

where

$$
\omega_i = \frac{1}{10000^{2i/d}}
$$

and therefore

$$
\theta_{i,\text{pos}} = \frac{\text{pos}}{10000^{2i/d}}
$$

Here $i = 0, 1, \dots, d/2 - 1$ indexes the pairs.

3. Expanded form (common in engineering)

$$
\begin{aligned}
x'_{2i} &= x_{2i}\cos\theta_{i,\text{pos}} - x_{2i+1}\sin\theta_{i,\text{pos}} \\
x'_{2i+1} &= x_{2i}\sin\theta_{i,\text{pos}} + x_{2i+1}\cos\theta_{i,\text{pos}}
\end{aligned}
$$

4. Applied to Q / K

RoPE is applied to the queries and keys, not to the input embeddings:

$$
\begin{aligned}
Q' &= \text{RoPE}(Q,\text{pos}) \\
K' &= \text{RoPE}(K,\text{pos})
\end{aligned}
$$

Attention computation:

$$
\text{Attn} = Q'(K')^T
$$

5. Core property

Rotations satisfy

$$
\langle R_{\theta_i}q,\;R_{\theta_j}k\rangle = \langle q,\;R_{\theta_j-\theta_i}k\rangle
$$

which follows from $R_{\theta_i}^T R_{\theta_j} = R_{\theta_j-\theta_i}$. Conclusion: the attention score depends on position only through the difference $\text{pos}_j - \text{pos}_i$.

6. Complex form (more fundamental)

Write each pair as a complex number:

$$
z_i = x_{2i} + i\,x_{2i+1}
$$

RoPE then becomes a multiplication by a unit complex number:

$$
z'_i = z_i \cdot e^{i\theta_{i,\text{pos}}}
$$

Attention inner product:

$$
z_i(\text{pos}_i) \cdot \overline{z_j(\text{pos}_j)} = z_i \overline{z_j} \cdot e^{i(\theta_{i,\text{pos}_i}-\theta_{i,\text{pos}_j})}
$$

Note that the symbol $i$ is overloaded in this equation. On $z_i$, $z_j$, $\text{pos}_i$, and $\text{pos}_j$ the subscripts label the two tokens being compared (query and key), the first subscript of $\theta$ is the pair index, which is the same for both tokens, and the $i$ in the exponent is the imaginary unit. The real part of this product is the pair's contribution to the real dot product.

7. Whole-vector form

$$
\text{RoPE}(x,\text{pos}) = x \odot \cos(\Theta_{\text{pos}}) + \text{rotate}(x) \odot \sin(\Theta_{\text{pos}})
$$

where $\text{rotate}(x)$ maps each pair $(x_{2i},x_{2i+1})$ to $(-x_{2i+1},x_{2i})$, and $\Theta_{\text{pos}}$ holds $\theta_{i,\text{pos}}$ in both slots of pair $i$.

8. Engineering implementation (PyTorch)

```python
import torch

def rope(x, sin, cos):
    # sin, cos: broadcastable to x[..., ::2] (last dimension d/2)
    x1 = x[..., ::2]   # even-indexed components x_{2i}
    x2 = x[..., 1::2]  # odd-indexed components x_{2i+1}
    # Output layout: all rotated even components, then all rotated odd ones.
    # This is a fixed permutation of the interleaved form above, so Q'(K')^T
    # is unchanged as long as Q and K both go through this function.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```
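The relative-position property from section 5 can be checked numerically. The following is a minimal sketch, not a reference implementation: it uses plain Python lists instead of PyTorch so it runs without dependencies, keeps the interleaved pair layout of sections 1 and 3, and the names `rope_angles`, `rope_pairs`, `rope_complex`, and `dot`, as well as the sample vectors `q` and `k`, are made up for illustration.

```python
import cmath
import math

def rope_angles(pos, d, base=10000.0):
    """theta_{i,pos} = pos / base**(2i/d) for i = 0 .. d/2 - 1."""
    return [pos / base ** (2 * i / d) for i in range(d // 2)]

def rope_pairs(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by theta_{i,pos}, interleaved layout."""
    d = len(x)
    assert d % 2 == 0, "embedding dimension must be even"
    out = [0.0] * d
    for i, theta in enumerate(rope_angles(pos, d, base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_complex(x, pos, base=10000.0):
    """Same rotation via the complex form z' = z * exp(i * theta)."""
    out = []
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * theta)
        out += [z.real, z.imag]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Arbitrary sample query and key vectors (d = 8), chosen only for the demo.
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]

# Same offset (3) at two different absolute positions.
score_a = dot(rope_pairs(q, 2), rope_pairs(k, 5))
score_b = dot(rope_pairs(q, 10), rope_pairs(k, 13))

print(abs(score_a - score_b) < 1e-9)  # True: only the offset matters
```

Shifting both positions by the same amount leaves the score unchanged up to floating-point error, which is the statement of section 5. `rope_complex` is included only to show that the complex form of section 6 yields the same numbers as the expanded form of section 3.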