BiKAN-ViT: Enhancing vision transformers via spline-based patch embedding and nonlinear token mixing

Kavalcı Yılmaz, Esra; ADEM, KEMAL; Zontul, Metin

doi:10.1016/j.asoc.2026.115297

Kavalcı Yılmaz E., ADEM K., Zontul M.

Applied Soft Computing, vol. 199, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 199
Publication Date: 2026
DOI: 10.1016/j.asoc.2026.115297
Journal Name: Applied Soft Computing
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
Keywords: Explainable AI, Kolmogorov–Arnold networks, Learnable splines, Nonlinear token mixing, Patch embedding, Vision transformer
Affiliated with Sivas Cumhuriyet Üniversitesi: Yes

Abstract

Vision Transformers (ViTs) effectively capture global contextual relationships through self-attention. However, linear patch embedding and MLP-based feedforward networks can limit nonlinear modeling capacity and increase computational cost. To address these limitations, this work introduces BiKAN-ViT (Bi-level KAN integration: Patch + MLP), a Vision Transformer that integrates Kolmogorov-Arnold Networks (KANs) into both the patch embedding stage and the feedforward blocks. By replacing linear projections and fixed activations with learnable spline-based univariate functions, the proposed architecture enables more expressive, locally adaptable, and interpretable feature transformations. Experimental results show that BiKAN-ViT outperforms baseline ViTs in classification performance while achieving lower FLOPs and shorter training times. Ablation analyses highlight the critical contribution of the KAN-based patch embedding. Overall, BiKAN-ViT offers an efficient and explainable alternative to conventional Vision Transformer architectures.
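
The abstract gives no implementation details, so the following is only a rough illustration of the general idea of a spline-based patch embedding, not the authors' method. In a KAN layer, each output is a sum of learnable univariate functions of the inputs, out_j = sum_i phi_ij(x_i), where each phi_ij is a spline with trainable coefficients. The sketch assumes a piecewise-linear (degree-1 B-spline) basis on a fixed uniform grid, a grayscale image, and no training loop; the names `KANLayer`, `patchify`, and `kan_patch_embed` are hypothetical.

```python
import random


def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis values of a scalar x on a uniform grid."""
    h = grid[1] - grid[0]
    x = min(max(x, grid[0]), grid[-1])  # clamp inputs to the grid range
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


class KANLayer:
    """out[j] = sum_i phi_ij(x[i]), with phi_ij(t) = sum_k coef[j][i][k] * B_k(t).

    Assumption: piecewise-linear splines stand in for the higher-order
    splines a real KAN would typically use.
    """

    def __init__(self, in_dim, out_dim, grid_size=5, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        step = (hi - lo) / (grid_size - 1)
        self.grid = [lo + k * step for k in range(grid_size)]
        # One coefficient vector per (output, input) edge; these would be
        # the trainable parameters.
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(grid_size)]
                      for _ in range(in_dim)]
                     for _ in range(out_dim)]

    def __call__(self, x):
        basis = [hat_basis(v, self.grid) for v in x]
        return [sum(c * b
                    for ci, bi in zip(row, basis)
                    for c, b in zip(ci, bi))
                for row in self.coef]


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    return [[image[r + dr][c + dc]
             for dr in range(patch) for dc in range(patch)]
            for r in range(0, height, patch)
            for c in range(0, width, patch)]


def kan_patch_embed(image, layer, patch):
    """Map each flattened patch to a token with a KAN layer
    (used here in place of a linear projection)."""
    return [layer(p) for p in patchify(image, patch)]
```

For a 4 x 4 image with 2 x 2 patches, `kan_patch_embed(image, KANLayer(in_dim=4, out_dim=3), patch=2)` yields four 3-dimensional tokens. Because every edge function is an explicit one-dimensional curve, it can be plotted and inspected directly, which is the usual argument for the interpretability of KAN-style layers.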