
Training a dynamically configurable classifier with deep Q-learning


Abstract

Subject of study. We studied dynamic networks whose computational structure is configured at run time based on the input data.

Aim. We investigated whether deep Q-learning can be used to construct dynamic computer vision networks.

Methods. In modern dynamically configurable systems, image analysis is typically performed using a policy-gradient algorithm. We propose a hybrid Q-learning method for an image classification agent that takes limitations on available computational resources into account. The agent is trained to recognize images using a set of pretrained classifiers, and the resulting dynamically configurable system constructs a computational graph that respects the limit on the number of operations while following the trajectory with the maximum expected accuracy. The agent receives a reward only when the image is correctly recognized within the limit on the number of actions the agent may take. Experiments were performed on the CIFAR-10 image database with a set of six external classifiers that the agent was trained to control. The experiments showed that the standard deep reinforcement learning method based on action values (Deep Q-Network) does not allow the agent to learn strategies better than random ones in terms of recognition accuracy. We therefore propose a Q-least-action classifier that approximates the desired classifier-selection function by reinforcement learning and the label-prediction function by supervised learning.

Main results. The trained agent exceeded the recognition accuracy of random strategies, reducing the error by 9.65%. We show that such an agent makes explicit use of information from several classifiers, since accuracy increases as the number of permitted actions increases.

Practical significance. Our research shows that deep Q-learning can extract information from sparse classifier responses as well as a least-action classifier trained by the policy-gradient method can. In addition, the proposed method did not require the development of special loss functions.
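The abstract does not give implementation details, but the decision process it describes can be sketched as follows: at each step the agent either queries one of the six pretrained classifiers or emits a final label, reward is sparse, and episodes are capped by an action budget. The sketch below is a hypothetical illustration of that setup, not the authors' code; the class name ClassifierSelectionEnv, the constant NUM_CLASSES, and the state encoding are all assumptions.

import numpy as np

NUM_CLASSES = 10  # CIFAR-10


class ClassifierSelectionEnv:
    """Hypothetical sketch of the MDP described in the abstract: the agent
    either queries one of the pretrained classifiers or emits a final label.
    Reward is sparse: 1 only if the final label is correct and is produced
    within the action budget."""

    def __init__(self, classifiers, max_actions=3):
        self.classifiers = classifiers  # callables: image -> class-probability vector
        self.max_actions = max_actions  # limit on the number of actions per episode

    def reset(self, image, label):
        self.image, self.label = image, label
        self.steps = 0
        # State: concatenated responses of queried classifiers (zeros until queried).
        self.state = np.zeros(len(self.classifiers) * NUM_CLASSES)
        return self.state

    def step(self, action):
        self.steps += 1
        if action < len(self.classifiers):  # action = "query classifier i"
            probs = self.classifiers[action](self.image)
            self.state[action * NUM_CLASSES:(action + 1) * NUM_CLASSES] = probs
            done = self.steps >= self.max_actions  # budget exhausted, no reward
            return self.state, 0.0, done
        # Remaining actions are interpreted as final label predictions.
        predicted = action - len(self.classifiers)
        reward = 1.0 if predicted == self.label else 0.0
        return self.state, reward, True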
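The methodological core, as described, is the hybrid objective: the classifier-selection policy is trained by reinforcement learning on the sparse reward, while the label prediction is trained by ordinary supervised learning on the true label. A minimal PyTorch-style sketch of one such update step is given below, assuming a two-headed network that returns selection Q-values and label logits; the function name, network interface, and loss weighting are illustrative assumptions, not the paper's specification.

import torch
import torch.nn.functional as F

def hybrid_update(net, target_net, optimizer, batch, gamma=0.99):
    """One hypothetical training step of the hybrid scheme: a DQN-style
    temporal-difference loss for the classifier-selection head plus a
    supervised cross-entropy loss for the label-prediction head."""
    state, action, reward, next_state, done, true_label = batch  # done: float mask

    q_values, label_logits = net(state)  # two heads: selection Q-values, label logits
    q_taken = q_values.gather(1, action.unsqueeze(1)).squeeze(1)

    with torch.no_grad():  # standard DQN bootstrap target from a frozen copy
        next_q, _ = target_net(next_state)
        target = reward + gamma * next_q.max(dim=1).values * (1.0 - done)

    td_loss = F.smooth_l1_loss(q_taken, target)          # reinforcement-learning part
    ce_loss = F.cross_entropy(label_logits, true_label)  # supervised part

    loss = td_loss + ce_loss  # plain sum; no special loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Consistent with the abstract's claim that no special loss functions were needed, the sketch simply sums two standard losses; the supervised head gives a dense learning signal on every transition, which plausibly compensates for the sparsity of the reward that defeated the plain Deep Q-Network baseline.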

© 2022 Optica Publishing Group

More Like This
Efficient automated high dynamic range 3D measurement via deep reinforcement learning

Pan Zhang, Kai Zhong, Zhongwei Li, and Yusheng Shi
Opt. Express 32(4) 4857-4875 (2024)

Deep learning image transmission through a multimode fiber based on a small training dataset

Binbin Song, Chang Jin, Jixuan Wu, Wei Lin, Bo Liu, Wei Huang, and Shengyong Chen
Opt. Express 30(4) 5657-5672 (2022)

