Abstract
Subject of study. Two classes of dynamically configurable computer vision systems trained with reinforcement learning algorithms were considered. The first class comprises visual attention models, which recognize an image by viewing its fragments one after another. The second class comprises least action classifiers, which analyze an image indirectly by successively calling pretrained convolutional neural networks.
Aim of study. This study investigated whether adding an action that terminates computation would let these models spend more resources on complex images than on simple ones.
Method. A stop network was added to the investigated architectures. It takes the system's hidden state vector as input and outputs a signal to stop or to continue computation. The individual network modules were trained with a three-stage curriculum, and the resulting image-viewing and classifier-selection strategies were analyzed.
Main results. The proposed visual attention model with dynamic termination of computation significantly outperformed existing solutions both in recognition accuracy on the MNIST database and in the average number of image fragments the agent needed to view. The importance of curriculum learning was demonstrated. The agent was shown to use a similar attention control strategy for different images while adapting it to each specific image. A similar effect was observed for a common visual attention model trained on ImageNet. For least action classifiers, dynamic termination of computation also reduced the average number of actions required to analyze an image at a specified recognition accuracy, although the gain in efficiency was less pronounced.
Practical significance. The visual attention methods developed in this study may be useful for designing optoelectronic systems with intelligent control of a narrow-field-lens camera for target recognition. The technology behind the least action classifiers can be applied to reduce computation in solutions obtained with bagging, which averages the predictions of several models.
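The stop-or-continue mechanism described under Method can be illustrated with a minimal sketch. It assumes, purely for illustration, that the stop network is a single logistic layer over the hidden state and that at inference the agent halts greedily once the stop probability reaches 0.5; during reinforcement learning the stop action would instead be sampled. The names `StopNetwork`, `run_episode` and `step_fn`, the weights and the step budget are all hypothetical and are not taken from the paper, whose abstract does not specify the stop network's architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class StopNetwork:
    """Hypothetical stop head: one logistic layer mapping the agent's
    hidden state vector to the probability of terminating computation."""

    def __init__(self, w, b):
        self.w = np.asarray(w, dtype=float)
        self.b = float(b)

    def prob_stop(self, h):
        return float(sigmoid(self.w @ h + self.b))

    def sample_stop(self, h, rng):
        # Stochastic stop action, as a policy would use during RL training
        return bool(rng.random() < self.prob_stop(h))


def run_episode(step_fn, h0, stop_net, max_steps, threshold=0.5):
    """Advance the agent one step at a time (e.g. view one more glimpse or
    call one more pretrained CNN) until the stop head fires or the step
    budget runs out. Returns the final hidden state and the steps used."""
    h = np.asarray(h0, dtype=float)
    for t in range(1, max_steps + 1):
        h = step_fn(h)  # placeholder for the recurrent state update
        if stop_net.prob_stop(h) >= threshold:  # greedy stop at inference
            return h, t
    return h, max_steps


# Toy usage: evidence accumulates quickly for an "easy" image, slowly for a "hard" one.
stop_net = StopNetwork(w=[1.0, 1.0], b=-3.1)
_, n_easy = run_episode(lambda h: h + 1.0, [0.0, 0.0], stop_net, max_steps=10)
_, n_hard = run_episode(lambda h: h + 0.25, [0.0, 0.0], stop_net, max_steps=10)
print(n_easy, n_hard)  # 2 7
```

In this toy setup the hidden state grows faster for the "easy" input, so the agent halts after 2 steps there and after 7 steps for the "hard" one. This mirrors the behavior the stop network is meant to learn, spending more computation on harder images, while `max_steps` keeps a hard upper bound on the cost per image.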
© 2022 Optica Publishing Group