Applied Soft Computing, vol. 199, 2026 (SCI-Expanded, Scopus)
Vision Transformers (ViTs) effectively capture global contextual relationships through self-attention. However, their linear patch embedding and MLP-based feedforward networks can limit nonlinear modeling capacity and increase computational cost. To address these limitations, this work introduces BiKAN-ViT (Bi-level KAN integration: Patch + MLP), a Vision Transformer that integrates Kolmogorov-Arnold Networks (KANs) into both the patch embedding stage and the feedforward blocks. By replacing linear projections and fixed activations with learnable spline-based univariate functions, the proposed architecture enables more expressive, locally adaptable, and interpretable feature transformations. Experimental results show that BiKAN-ViT outperforms baseline ViTs in classification performance while requiring fewer FLOPs and shorter training times. Ablation analyses highlight the critical contribution of the KAN-based patch embedding. Overall, BiKAN-ViT offers an efficient and explainable alternative to conventional Vision Transformer architectures.
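To make the idea of replacing a linear projection with learnable univariate functions concrete, here is a minimal NumPy sketch of a KAN-style layer used as a patch embedding. It is an illustration under stated assumptions, not the authors' implementation. The names `hat_basis`, `KANLayer`, and `patchify` are hypothetical, as are the knot count, input range, and image and patch sizes. The sketch uses degree-1 splines (piecewise-linear "hat" functions) for brevity, and it contains no training loop.

```python
import numpy as np


def hat_basis(x, knots):
    """Degree-1 B-spline (hat) basis on uniformly spaced knots.

    x: array of shape (..., d_in); knots: array of shape (K,).
    Returns an array of shape (..., d_in, K).
    """
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[..., None] - knots) / h)


class KANLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_ji(x_i).

    Each edge function phi_ji is a piecewise-linear spline whose values
    at the knots are the learnable coefficients coef[j, i, :].
    """

    def __init__(self, d_in, d_out, num_knots=8, x_range=(-1.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        self.knots = np.linspace(x_range[0], x_range[1], num_knots)
        # One coefficient vector per (output, input) edge.
        self.coef = rng.normal(0.0, 0.1, size=(d_out, d_in, num_knots))

    def __call__(self, x):
        # Assumption: inputs outside the knot range are clipped to it.
        x = np.clip(x, self.knots[0], self.knots[-1])
        basis = hat_basis(x, self.knots)  # (..., d_in, K)
        # Sum over inputs i and basis functions k for every output j.
        return np.einsum("...ik,jik->...j", basis, self.coef)


def patchify(images, patch):
    """Split (B, H, W, C) images into flattened non-overlapping patches.

    Returns an array of shape (B, N, patch * patch * C).
    """
    b, h, w, c = images.shape
    gh, gw = h // patch, w // patch
    x = images.reshape(b, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, gh * gw, patch * patch * c)


# Toy usage: two random 32x32 RGB images with values in [-1, 1].
images = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2, 32, 32, 3))
patches = patchify(images, patch=8)       # 16 patches of 192 values each
embed = KANLayer(d_in=192, d_out=64)      # stands in for the linear projection
tokens = embed(patches)                   # one 64-dimensional token per patch
print(tokens.shape)
```

A layer of the same form could, in principle, also stand in for the linear maps of a feedforward block, which is the second integration point the abstract describes. Each edge here carries `num_knots` coefficients instead of a single weight, so this naive version uses more parameters than a linear layer of the same width; the sketch therefore says nothing about the FLOP or training-time results reported above.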