The advantages of the ear over other biometric modalities as a means of identification have motivated researchers to study ear recognition with state-of-the-art computing methods. This paper presented a deep learning pipeline for unconstrained ear recognition using two Transformer neural networks: Vision Transformer (ViT) and Data-efficient image Transformers (DeiT). The ViT-Ear and DeiT-Ear models of this study achieved recognition accuracy comparable to or higher than that of state-of-the-art CNN-based methods and other deep learning algorithms. This study also determined that Vision Transformer and Data-efficient image Transformer models perform better than ResNets without using exhaustive data augmentation. Moreover, this study observed that the performance of ViT-Ear is similar to that reported in other ViT-based biometric studies.
Key words: Deep Learning, Neural Network, Transformers, Vision Transformer, Data-efficient image Transformers, Ear Recognition