BiKAN-ViT: Enhancing vision transformers via spline-based patch embedding and nonlinear token mixing


Kavalcı Yılmaz E., ADEM K., Zontul M.

Applied Soft Computing, vol. 199, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 199
  • Publication Date: 2026
  • DOI: 10.1016/j.asoc.2026.115297
  • Journal Name: Applied Soft Computing
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Explainable AI, Kolmogorov–Arnold networks, Learnable splines, Nonlinear token mixing, Patch embedding, Vision transformer
  • Sivas Cumhuriyet University Affiliated: Yes

Abstract

Vision Transformers (ViTs) effectively capture global contextual relationships through self-attention. However, their linear patch embedding and MLP-based feedforward networks can limit nonlinear modeling capability and increase computational cost. To address these limitations, this work introduces BiKAN-ViT (Bi-level KAN integration: Patch + MLP), a Vision Transformer that integrates Kolmogorov–Arnold Networks (KANs) into both the patch embedding stage and the feedforward blocks. By replacing linear projections and fixed activations with learnable spline-based univariate functions, the proposed architecture enables more expressive, locally adaptive, and interpretable feature transformations. Experimental results show that BiKAN-ViT outperforms standard ViTs in classification performance while requiring fewer FLOPs and shorter training times. Ablation analyses highlight the critical contribution of the KAN-based patch embedding. Overall, BiKAN-ViT offers an efficient and explainable alternative to conventional Vision Transformer architectures.
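The abstract describes replacing linear projections with learnable spline-based univariate functions in two places: the patch embedding and the feedforward block. The NumPy sketch below illustrates that idea under stated assumptions. It uses the B-spline-plus-SiLU edge parameterization from the original KAN formulation, with 5 grid intervals on [-1, 1] and cubic splines; none of these settings come from the paper. `KANLayer`, `bspline_basis`, `make_grid`, `patchify` and `kan_ffn` are hypothetical names, parameters are randomly initialized rather than trained, and attention and layer normalization are omitted.

```python
import numpy as np


def make_grid(grid_size=5, order=3, lo=-1.0, hi=1.0):
    """Uniform knot vector on [lo, hi], extended by `order` knots on each side (assumed setup)."""
    h = (hi - lo) / grid_size
    return np.arange(-order, grid_size + order + 1) * h + lo


def bspline_basis(x, grid, order):
    """Cox-de Boor recursion. x: (N, D) -> (N, D, len(grid) - 1 - order) basis values."""
    x = x[..., None]
    # Order-0 bases: indicator of the knot interval containing each input.
    b = ((x >= grid[:-1]) & (x < grid[1:])).astype(float)
    for k in range(1, order + 1):
        left = (x - grid[:-(k + 1)]) / (grid[k:-1] - grid[:-(k + 1)]) * b[..., :-1]
        right = (grid[k + 1:] - x) / (grid[k + 1:] - grid[1:-k]) * b[..., 1:]
        b = left + right
    return b


class KANLayer:
    """Hypothetical KAN layer: y_o = sum_i [w_base[o,i]*silu(x_i) + sum_k coef[o,i,k]*B_k(x_i)]."""

    def __init__(self, in_dim, out_dim, grid_size=5, order=3, seed=0):
        rng = np.random.default_rng(seed)
        self.order = order
        self.grid = make_grid(grid_size, order)
        self.w_base = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (out_dim, in_dim))
        self.coef = rng.normal(0.0, 0.1, (out_dim, in_dim, grid_size + order))

    def __call__(self, x):
        base = (x / (1.0 + np.exp(-x))) @ self.w_base.T     # fixed SiLU base branch
        bases = bspline_basis(x, self.grid, self.order)      # (N, in_dim, K)
        spline = np.einsum("nik,oik->no", bases, self.coef)  # one learnable spline per edge
        return base + spline


def patchify(images, patch):
    """(N, H, W, C) -> (N, num_patches, patch*patch*C); H and W must be divisible by patch."""
    n, h, w, c = images.shape
    x = images.reshape(n, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, (h // patch) * (w // patch), patch * patch * c)


def kan_ffn(tokens, layer1, layer2):
    """Token-wise KAN feedforward with a residual connection (normalization omitted)."""
    n, p, d = tokens.shape
    h = layer2(layer1(tokens.reshape(-1, d)))
    return tokens + h.reshape(n, p, d)


rng = np.random.default_rng(0)
images = rng.random((2, 32, 32, 3))        # toy batch, pixel values in [0, 1)
tokens = patchify(images, patch=8)         # (2, 16, 192)
embed = KANLayer(192, 64, seed=1)          # KAN replaces the linear patch projection
emb = embed(tokens.reshape(-1, 192)).reshape(2, 16, 64)
out = kan_ffn(emb, KANLayer(64, 128, seed=2), KANLayer(128, 64, seed=3))
print(emb.shape, out.shape)  # (2, 16, 64) (2, 16, 64)
```

In this sketch every input–output edge carries its own spline, so a layer stores `in_dim × out_dim × (K + 1)` parameters (with `K = grid_size + order` basis functions plus one base weight per edge) instead of `in_dim × out_dim` for a linear projection. Inputs outside the extended knot range receive no spline contribution and pass only through the SiLU branch, which is one reason practical KAN implementations normalize activations or adapt the grid.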