
ART 2: self-organization of stable category recognition codes for analog input patterns

Open Access

Abstract

Adaptive resonance architectures are neural networks that self-organize stable pattern recognition codes in real-time in response to arbitrary sequences of input patterns. This article introduces ART 2, a class of adaptive resonance architectures which rapidly self-organize pattern recognition categories in response to arbitrary sequences of either analog or binary input patterns. In order to cope with arbitrary sequences of analog input patterns, ART 2 architectures embody solutions to a number of design principles, such as the stability-plasticity tradeoff, the search-direct access tradeoff, and the match-reset tradeoff. In these architectures, top-down learned expectation and matching mechanisms are critical in self-stabilizing the code learning process. A parallel search scheme updates itself adaptively as the learning process unfolds, and realizes a form of real-time hypothesis discovery, testing, learning, and recognition. After learning self-stabilizes, the search process is automatically disengaged. Thereafter input patterns directly access their recognition codes without any search. Thus recognition time for familiar inputs does not increase with the complexity of the learned code. A novel input pattern can directly access a category if it shares invariant properties with the set of familiar exemplars of that category. A parameter called the attentional vigilance parameter determines how fine the categories will be. If vigilance increases (decreases) due to environmental feedback, then the system automatically searches for and learns finer (coarser) recognition categories. Gain control parameters enable the architecture to suppress noise up to a prescribed level. The architecture's global design enables it to learn effectively despite the high degree of nonlinearity of such mechanisms.

© 1987 Optical Society of America

I. Adaptive Resonance Architectures

Adaptive resonance architectures are neural networks that self-organize stable recognition codes in real time in response to arbitrary sequences of input patterns. The basic principles of adaptive resonance theory (ART) were introduced by Grossberg.[1] A class of adaptive resonance architectures, called ART 1, has since been characterized as a system of ordinary differential equations by Carpenter and Grossberg.[2],[3] Theorems have been proved that trace the real-time dynamics of ART 1 networks in response to arbitrary sequences of binary input patterns. These theorems predict both the order of search, as a function of the learning history of the network, and the asymptotic category structure self-organized by arbitrary sequences of binary input patterns. They also prove the self-stabilization property and show that the system's adaptive weights oscillate at most once, yet do not get trapped in spurious memory states or local minima.

This paper describes a new class of adaptive resonance architectures, called ART 2. ART 2 networks self-organize stable recognition categories in response to arbitrary sequences of analog (gray-scale, continuous-valued) input patterns, as well as binary input patterns. Computer simulations are used to illustrate system dynamics. One such simulation is summarized in Fig. 1, which shows how a typical ART 2 architecture has quickly learned to group fifty inputs into thirty-four stable recognition categories after a single presentation of each input. The plots below each number show all those input patterns ART 2 has grouped into the corresponding category. Equations for the system used in the simulation are given in Secs. V–VIII.

ART networks encode new input patterns, in part, by changing the weights, or long-term memory (LTM) traces, of a bottom-up adaptive filter (Fig. 2). This filter is contained in pathways leading from a feature representation field (F1) to a category representation field (F2) whose nodes undergo cooperative and competitive interactions. Such a combination of adaptive filtering and competition, sometimes called competitive learning, is shared by many other models of adaptive pattern recognition and associative learning. See Grossberg[4] for a review of the development of competitive learning models. In an ART network, however, it is a second, top-down adaptive filter that leads to the crucial property of code self-stabilization. Such top-down adaptive signals play the role of learned expectations in an ART system. They enable the network to carry out attentional priming, pattern matching, and self-adjusting parallel search. One of the key insights of the ART design is that top-down attentional and intentional, or expectation, mechanisms are necessary to self-stabilize learning in response to an arbitrary input environment.

The fields F1 and F2, as well as the bottom-up and top-down adaptive filters, are contained within ART's attentional subsystem (Fig. 2). An auxiliary orienting subsystem becomes active when a bottom-up input to F1 fails to match the learned top-down expectation read-out by the active category representation at F2. In this case, the orienting subsystem is activated and causes rapid reset of the active category representation at F2. This reset event automatically induces the attentional subsystem to proceed with a parallel search. Alternative categories are tested until either an adequate match is found or a new category is established. The search remains efficient because the search strategy is updated adaptively throughout the learning process. The search proceeds rapidly, relative to the learning rate. Thus significant changes in the bottom-up and top-down adaptive filters occur only when a search ends and a matched F1 pattern resonates within the system. For the simulation illustrated in Fig. 1, the ART 2 system carried out a search during many of the initial fifty input presentations.

The processing cycle of bottom-up adaptive filtering, code (or hypothesis) selection, read-out of a top-down learned expectation, matching, and code reset shows that, within an ART system, adaptive pattern recognition is a special case of the more general cognitive process of discovering, testing, searching, learning, and recognizing hypotheses. Applications of ART systems to problems concerning the adaptive processing of large abstract knowledge bases are thus a key goal for future research.

The fact that learning within an ART system occurs only within a resonant state enables such a system to solve the design trade-off between plasticity and stability. Plasticity, or the potential for rapid change in the LTM traces, remains intact indefinitely, thereby enabling an ART architecture to learn about future unexpected events until it exhausts its full memory capacity.

Learning within a resonant state either refines the code of a previously established recognition code, based on any new information that the input pattern may contain, or initiates code learning within a previously uncommitted set of nodes. If, for example, a new input were added at any time to the set of fifty inputs in Fig. 1, the system would search the established categories. If an adequate match were found, possibly on the initial search cycle, the LTM category representation would be refined, if necessary, to incorporate the new pattern. If no match were found, and the full coding capacity were not yet exhausted, a new category would be formed, with previously uncommitted LTM traces encoding the STM pattern established by the input.

The architecture's adaptive search enables it to discover and learn appropriate recognition codes without getting trapped in spurious memory states or local minima. In other search models, such as search trees, the search time can become increasingly prolonged as the learned code becomes increasingly complex. In an ART architecture, by contrast, search takes place only as a recognition code is being learned, and the search maintains its efficiency as learning goes on.

Self-stabilization of prior learning is achieved via the dynamic buffering provided by read-out of a learned top-down expectation, not by switching off plasticity or restricting the class of admissible inputs. For example, after the initial presentation of fifty input patterns in the simulation illustrated by Fig. 1, learning self-stabilized. In general, within an ART architecture, once learning self-stabilizes within a particular recognition category, the search mechanism is automatically disengaged. Thereafter, that category can be directly activated, or accessed, with great rapidity and without search by any of its input exemplars.

The criterion for an adequate match between an input pattern and a chosen category template is adjustable in an ART architecture. The matching criterion is determined by a vigilance parameter that controls activation of the orienting subsystem. All other things being equal, higher vigilance imposes a stricter matching criterion, which in turn partitions the input set into finer categories. Lower vigilance tolerates greater top-down/bottom-up mismatches at F1, leading in turn to coarser categories (Fig. 3). In addition, at every vigilance level, the matching criterion is self-scaling: a small mismatch may be tolerated if the input pattern is complex, while the same featural mismatch would trigger reset if the input represented only a few features.

Even without any search, as when vigilance is low or the orienting subsystem is removed, ART 2 can often establish a reasonable category structure (Fig. 4). In this case, however, the top-down learned expectations assume the full burden of code self-stabilization by generating the attentional focus to dynamically buffer the emergent code. Although mismatch of bottom-up and top-down patterns at F1 can attenuate unmatched features at F1, such a mismatch does not elicit a search for a more appropriate F2 code before learning can occur. Such learning will incorporate the unattenuated F1 features into the initially selected category's recognition code. In this situation, more input trials may be needed before the code self-stabilizes; false groupings may occur during the early trials, as in category 1 of Fig. 4(a); and the flexible matching criterion achieved by variable vigilance is lost. Nonetheless, the top-down expectations can actively regulate the course of learning to generate a stable asymptotic code with desirable properties. For example, despite the initial anomalous coding in the example of Fig. 4(a), Fig. 4(b) shows that a stable category structure is established by the third round of inputs in which the false groupings within category 1 in Fig. 4(a) have been corrected by splitting grossly dissimilar inputs into the separate categories 1 and 7.

The top-down learned expectations and the orienting subsystem are not the only means by which an ART network carries out active regulation of the learning process. Attentional gain control at F1 and F2 also contributes to this active regulation (Sec. II). Gain control acts to adjust overall sensitivity to patterned inputs and to coordinate the separate, asynchronous functions of the ART subsystems. Gain control nuclei are represented as large filled circles in the figures.

II. ART 1: Binary Input Patterns

Figure 2 illustrates the main features of a typical ART 1 network. Two successive stages, F1 and F2, of the attentional subsystem encode patterns of activation in short-term memory (STM). Each bottom-up or top-down pathway between F1 and F2 contains an adaptive LTM trace that multiplies the signal in its pathway. The rest of the circuit modulates these STM and LTM processes. Modulation by gain 1 enables F1 to distinguish between a bottom-up input pattern and a top-down priming or template pattern, as well as to match these bottom-up and top-down patterns. In particular, bottom-up inputs can supraliminally activate F1; top-down expectations in the absence of bottom-up inputs can subliminally sensitize, or prime, F1; and a combination of bottom-up and top-down inputs is matched according to a 2/3 Rule which activates the nodes within the intersection of the bottom-up and top-down patterns (Fig. 5). Thus, within the context of a self-organizing ART architecture, intentionality (or the action of learned top-down expectations) implies a spatial logic matching rule. Carpenter and Grossberg[3] prove that 2/3 Rule matching is necessary for self-stabilization of learning within ART 1 in response to arbitrary sequences of binary input patterns.

The orienting subsystem generates a reset wave to F2 when the bottom-up input pattern and top-down template pattern mismatch at F1, according to the vigilance criterion. The reset wave selectively and enduringly inhibits active F2 cells until the current input is shut off. Offset of the input pattern terminates its processing at F1 and triggers offset of gain 2. Gain 2 offset causes rapid decay of STM at F2, and thereby prepares F2 to encode the next input pattern without bias.

An ART 1 system is fully defined by a system of differential equations that determines STM and LTM dynamics in response to an arbitrary temporal sequence of binary input patterns. Theorems characterizing these dynamics have been proved in the case where fast learning occurs; that is, where each trial is long enough for the LTM traces to approach equilibrium values.[3] Variations of the ART 1 architecture exhibit similar dynamics. Hence the term ART 1 designates a family, or class, of functionally equivalent architectures rather than a single model.

III. ART 2: Analog Input Patterns

ART 2 architectures are designed for the processing of analog, as well as binary, input patterns. A category representation system for analog inputs needs to be able to pick out and enhance similar signals embedded in various noisy backgrounds, as in category 16 of Fig. 1.

Figure 6 illustrates a typical ART 2 architecture. A comparison of Figs. 2 and 6 illustrates some of the principal differences between ART 1 and ART 2 networks. For ART 2 to match and learn sequences of analog input patterns in a stable fashion, its feature representation field F1 includes several processing levels and gain control systems. Bottom-up input patterns and top-down signals are received at different locations in F1. Positive feedback loops within F1 enhance salient features and suppress noise. Although F1 is more complex in ART 2 than in ART 1, the LTM equations of ART 2 are simpler.

How the signal functions and parameters of the various ART 2 architectures can best be chosen to categorize particular classes of analog input patterns for specialized applications is the subject of ongoing research. In particular, since ART 2 architectures are designed to categorize arbitrary sequences of analog or digital input patterns, an arbitrary preprocessor can be attached to the front end of an ART 2 architecture. This property is being exploited to design a self-organizing architecture for invariant recognition and recall using laser radar, boundary segmentation, and invariant filter methods to generate preprocessed inputs to ART 2.[5]–[8]

IV. ART 2 Design Principles

ART 2 architectures satisfy a set of design principles derived from an analysis of neural networks that form recognition categories for arbitrary sequences of analog input patterns. ART 2 systems have been developed to satisfy the multiple design principles or processing constraints that give rise to the architecture's emergent properties. At least three variations on the ART 2 architecture have been identified that are capable of satisfying these constraints. Indeed, the heart of the ART 2 analysis consists of discovering how different combinations of network mechanisms work together to generate particular combinations of desirable emergent properties. That is why theoretical ablation experiments on ART 2 architectures have proved to be so useful, since they reveal which emergent properties are spared and which are lost in reduced architectures.

In each ART 2 architecture, combinations of normalization, gain control, matching, and learning mechanisms are interwoven in generally similar ways. Although how this is done may be modified to some extent, in all the ART 2 variations that we have discovered, F1 needs to include different levels to receive and transform bottom-up input patterns and top-down expectation patterns, as well as an interfacing level of interneurons that matches the transformed bottom-up and top-down information and feeds the results back to the bottom and top F1 levels. How the particular F1 levels shown in Fig. 6 work will be described in Secs. IX–XII. Alternative ART 2 models are illustrated in Sec. XIII and in [Ref. 5].

We now describe the main ART 2 design principles.

A. Stability-Plasticity Trade-Off

An ART 2 system needs to be able to learn a stable recognition code in response to an arbitrary sequence of analog input patterns. Since the plasticity of an ART system is maintained for all time, and since input presentation times can be of arbitrary duration, STM processing must be defined in such a way that a sustained new input pattern does not wash away previously learned information. Section XII shows how removal, or ablation, of one part of the F1 internal feedback loop in Fig. 6 can lead to a type of instability in which a single input, embedded in a particular input sequence, can jump between categories indefinitely.

B. Search-Direct Access Trade-Off

An ART 2 system carries out a parallel search in order to regulate the selection of appropriate recognition codes during the learning process, yet automatically disengages the search process as an input pattern becomes familiar. Thereafter the familiar input pattern directly accesses its recognition code no matter how complex the total learned recognition structure may have become, much as we can rapidly recognize our parents at different stages of our life even though we may learn much more as we grow older.

C. Match-Reset Trade-Off

An ART 2 system needs to be able to resolve several potentially conflicting properties which can be formulated as variants of a design trade-off between the requirements of sensitive matching and formation of new codes.

The system should, on the one hand, be able to recognize and react to arbitrarily small differences between an active F1 STM pattern and the LTM pattern being read out from an established category. In particular, if vigilance is high, the F1 STM pattern established by a bottom-up input exemplar should be nearly identical to the learned top-down F2 → F1 expectation pattern in order for the exemplar to be accepted as a member of an established category. On the other hand, when an uncommitted F2 node becomes active for the first time, it should be able to remain active, without being reset, so that it can encode its first input exemplar, even though in this case there is no top-down/bottom-up pattern match whatsoever. Section IX shows how a combination of an appropriately chosen ART 2 reset rule and LTM initial values work together to satisfy both of these processing requirements. In fact, ART 2 parameters can be chosen to satisfy the more general property that learning increases the system's sensitivity to mismatches between bottom-up and top-down patterns.

D. STM Invariance Under Read-Out of Matched LTM

Further discussion of the match-reset trade-off clarifies why F1 is composed of several internal processing levels. Suppose that, before an uncommitted F2 node is first activated, its top-down F2 → F1 LTM traces are chosen equal to zero. On the node's first learning trial, its LTM traces will progressively learn the STM pattern that is generated by the top level of F1. As noted above, such learning must not be allowed to cause a mismatch capable of resetting F2, because the LTM traces have not previously learned any other pattern. This property is achieved by designing the bottom and middle levels of F1 so that their STM activity patterns are not changed at all by the read-out of these LTM traces as they learn their first positive values.

More generally, F1 is designed so that read-out by F2 of a previously learned LTM pattern that matches perfectly the STM pattern at the top level of F1 does not change the STM patterns circulating at the bottom and middle levels of F1. Thus, in a perfect match situation, or in a situation where a zero-vector of LTM values learns a perfect match, the STM activity patterns at the bottom and middle F1 levels are left invariant; hence, no reset occurs.

This invariance property enables the bottom and middle F1 levels to nonlinearly transform the input pattern in a manner that remains stable during learning. In particular, the input pattern may be contrast enhanced while noise in the input is suppressed. If read-out of a top-down LTM pattern could change even the base line of activation at the F1 levels which execute this transformation, the degree of contrast enhancement and noise suppression could be altered, thereby generating a new STM pattern for learning by the top-down LTM traces. The STM invariance property prevents read-out of a perfectly matched LTM pattern from causing reset by preventing any change whatsoever from occurring in the STM patterning at the lower F1 levels.

E. Coexistence of LTM Read-Out and STM Normalization

The STM invariance property leads to the use of multiple F1 levels because the F1 nodes at which top-down LTM read-out occurs receive an additional input when top-down signals are active that they do not receive when those signals are absent. The extra F1 levels provide enough degrees of computational freedom to both read-out top-down LTM and normalize the total STM pattern at the top F1 level before this normalized STM pattern can interact with the middle F1 level at which top-down and bottom-up information are matched.

In a similar fashion, the bottom F1 level enables an input pattern to be normalized before this normalized STM pattern can interact with the middle F1 level. Thus separate bottom and top F1 levels provide enough degrees of computational freedom to compensate for fluctuations in base line activity levels. In the absence of such normalization, confusion between useful pattern differences and spurious base line fluctuations could easily upset the matching process and cause spurious reset events to occur, thereby destabilizing the network's search and learning processes.

F. No LTM Recoding by Superset Inputs

Although read-out of a top-down LTM pattern that perfectly matches the STM pattern at the F1 top level never causes F2 reset, even a very small mismatch in these patterns is sufficient to reset F2 if the vigilance parameter is chosen sufficiently high. The middle F1 level plays a key role in causing the attenuation of STM activity that causes such a reset event to occur.

An important example of such a reset-inducing mismatch occurs when one or more, but not all, of the top-down LTM traces equal zero or very small values and the corresponding F1 nodes have positive STM activities. When this occurs, the STM activities of these F1 nodes are suppressed. If the total STM suppression is large enough to reset F2, the network searches for a better match. If the total STM suppression is not large enough to reset F2, the top-down LTM traces of these nodes remain small during the ensuing learning trial, because they sample the small STM values that their own small LTM values have caused.

This property is a version of the 2/3 Rule that has been used to prove stability of learning by an ART 1 architecture in response to an arbitrary sequence of binary input patterns.[3] It also is necessary for ART 2 to achieve stable learning in response to an arbitrary sequence of analog input patterns (Sec. XII). In the jargon of ART 1, a superset bottom-up input pattern cannot recode a subset top-down expectation. In ART 1, this property was achieved by an attentional gain control channel (Fig. 2). In the versions of ART 2 developed so far, it is realized as part of the F1 internal levels. These design variations are still a subject of ongoing research.

G. Stable Choice Until Reset

Match-reset trade-off also requires that only a reset event that is triggered by the orienting subsystem can cause a change in the chosen F2 code. This property is imposed at any degree of mismatch between a top-down F2 → F1 LTM pattern and the circulating F1 STM pattern. Thus all the network's real-time pattern processing operations, including top-down F2 → F1 feedback, the fast nonlinear feedback dynamics within F1, and the slow LTM changes during learning, must be organized to maintain the original F1 → F2 category choice, unless F2 is actively reset by the orienting subsystem.

H. Contrast Enhancement, Noise Suppression, and Mismatch Attenuation by Nonlinear Signal Functions

A given class of analog signals may be embedded in variable levels of background noise (Fig. 1). A combination of normalization and nonlinear feedback processes within F1 determines a noise criterion and enables the system to separate signal from noise. In particular, these processes contrast enhance the F1 STM pattern, and hence also the learned LTM patterns. The degree of contrast enhancement and noise suppression is determined by the degree of nonlinearity in the feedback signal functions at F1.

A nonlinear signal function operating on the sum of normalized bottom-up and top-down signals also correlates these signals, just as squaring a sum A + B of two L2-normalized vectors generates 2(1 + A · B). Nonlinear feedback signaling hereby helps to attenuate the total activation of F1 in response to mismatched bottom-up input and top-down expectation patterns, as well as to contrast enhance and noise suppress bottom-up input patterns. Figure 8(e) shows that the absence of nonlinearity in the F1 feedback loop can lead to all subpatterns of a pattern being coded in the same category in conditions of low vigilance.
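The correlation property invoked here is just the expansion of the squared norm of a sum of two L2-normalized vectors,

$$\|A + B\|^2 = \|A\|^2 + 2\,A \cdot B + \|B\|^2 = 2\,(1 + A \cdot B),$$

so a nonlinearity applied to the summed, normalized bottom-up and top-down signals necessarily responds to their inner product, and hence to the degree of match between the two patterns.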

I. Rapid Self-Stabilization

A learning system that is unstable in general can be made more stable by making the learning rate so slow that LTM traces change little on a single input trial. In this case, many learning trials are needed to encode a fixed set of inputs. Learning in an ART system needs to be slow relative to the STM processing rate (Sec. V), but no restrictions are placed on absolute rates. Thus ART 2 is capable of stable learning in the fast learning case, in which LTM traces change so quickly that they can approach new equilibrium values on every trial. The ART 2 simulations in this article were all carried out in fast learning conditions, and rapid code self-stabilization occurs in each case. Self-stabilization is also sped up by the action of the orienting subsystem, but can also occur rapidly even without it (Figs. 4 and 8).

J. Normalization

Several different schemes may be used to normalize activation patterns across F1. In this paper we used nonspecific inhibitory interneurons (schematized by large black disks in Fig. 6). Each such normalizer uses O(M) connections, where M is the number of nodes to be normalized. Alternatively, a shunting on-center off-surround network could be used as a normalizer,[9] but such a network uses O(M²) connections.

K. Local Computations

ART 2 system STM and LTM computations use only information available locally and in real time. There are no assumptions of weight transport, as in backpropagation, nor of an a priori input probability distribution, as in simulated annealing. Moreover, all ART 2 local equations have a simple form (Secs. V–VIII). It is the architecture as a whole that endows the model with its desirable emergent computational properties.

V. ART 2 STM Equations: F1

The potential, or STM activity, Vi of the ith node at any one of the F1 processing stages obeys a membrane equation[10] of the form

$$\epsilon \frac{d}{dt} V_i = -A V_i + (1 - B V_i) J_i^{+} - (C + D V_i) J_i^{-} \tag{1}$$
(i = 1…M). Term Ji⁺ is the total excitatory input to the ith node and Ji⁻ is the total inhibitory input. In the absence of all inputs, Vi decays to 0. The dimensionless parameter ε represents the ratio between the STM relaxation time and the LTM relaxation time. With the LTM rate O(1), then
$$0 < \epsilon \ll 1. \tag{2}$$
Also, B ≡ 0 and C ≡ 0 in the F1 equations of the ART 2 example in Fig. 6. Thus the STM equations, in the singular form as ε → 0, reduce to
$$V_i = \frac{J_i^{+}}{A + D J_i^{-}}. \tag{3}$$
In this form, the dimensionless Eqs. (4)–(9) characterize the STM activities, pi, qi, ui, vi, wi, and xi, computed at F1:
$$p_i = u_i + \sum_j g(y_j)\, z_{ji}, \tag{4}$$
$$q_i = \frac{p_i}{e + \|p\|}, \tag{5}$$
$$u_i = \frac{v_i}{e + \|v\|}, \tag{6}$$
$$v_i = f(x_i) + b\, f(q_i), \tag{7}$$
$$w_i = I_i + a\, u_i, \tag{8}$$
$$x_i = \frac{w_i}{e + \|w\|}, \tag{9}$$
where ‖V‖ denotes the L2 norm of a vector V and where yj is the STM activity of the jth F2 node. The nonlinear signal function f in Eq. (7) is typically of the form
$$f(x) = \begin{cases} \dfrac{2\theta x^2}{x^2 + \theta^2} & \text{if } 0 \le x \le \theta, \\[4pt] x & \text{if } x \ge \theta, \end{cases} \tag{10}$$
which is continuously differentiable, or
$$f(x) = \begin{cases} 0 & \text{if } 0 \le x < \theta, \\ x & \text{if } x \ge \theta, \end{cases} \tag{11}$$
which is piecewise linear. The graph of the function f(x) in Eq. (10) may also be shifted to the right, making f(x) = 0 for small x, as in Eq. (11). Since the variables xi and qi are always between 0 and 1 [Eqs. (5) and (9)], the function values f(xi) and f(qi) also stay between 0 and 1. Alternatively, the signal function f(x) could be chosen to saturate at high x values. This would have the effect of flattening pattern details that sit on top of an activity peak, like those in category 17 of Fig. 1.
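To make the fast F1 dynamics concrete, here is a minimal numerical sketch, in Python, of how Eqs. (4)–(9) with the piecewise linear signal function (11) might be iterated to a fixed point for a single input vector. The function names, the fixed-point iteration, and the tiny positive value of e (used only to avoid division by zero) are our own illustrative choices, not part of the paper; setting g(yJ) = d for the single active F2 node anticipates the choice rule of Sec. VI.

```python
import numpy as np

def f(x, theta):
    """Piecewise linear signal function of Eq. (11): values below theta are suppressed."""
    return np.where(x >= theta, x, 0.0)

def f1_equilibrium(I, zJ=None, a=10.0, b=10.0, d=0.9, theta=0.2, e=1e-9, n_iter=100):
    """Iterate the F1 stages of Eqs. (4)-(9) to a fixed point for one input vector I.

    zJ is the top-down LTM vector of the chosen F2 node (None while F2 is inactive),
    so that p = u + d*zJ when the Jth node is active, as in Eq. (15).
    The tiny e only guards against division by zero; the paper's analysis sets e = 0.
    """
    u = np.zeros_like(I, dtype=float)
    q = np.zeros_like(I, dtype=float)
    for _ in range(n_iter):
        w = I + a * u                                # Eq. (8)
        x = w / (e + np.linalg.norm(w))              # Eq. (9)
        v = f(x, theta) + b * f(q, theta)            # Eq. (7)
        u = v / (e + np.linalg.norm(v))              # Eq. (6)
        p = u + (d * zJ if zJ is not None else 0.0)  # Eqs. (4), (15)
        q = p / (e + np.linalg.norm(p))              # Eq. (5)
    return u, p

# Example: a noisy bump is contrast enhanced and normalized before any category is chosen.
I = np.maximum(0.0, np.sin(np.linspace(0.0, np.pi, 25))) + 0.05 * np.random.rand(25)
u, p = f1_equilibrium(I)
```

In this sketch, p is the F1 pattern that the bottom-up adaptive filter and the LTM traces subsequently sample (Secs. VI and VII).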

VI. ART 2 STM Equations: F2

The category representation field F2 is the same in ART 2 as in ART 1 ([Ref. 3]). The key properties of F2 are contrast enhancement of the filtered F1 → F2 input pattern, and reset, or enduring inhibition, of active F2 nodes whenever a pattern mismatch at F1 is large enough to activate the orienting subsystem.

Contrast enhancement is carried out by competition within F2. Choice is the extreme case of contrast enhancement. F2 makes a choice when the node receiving the largest total input quenches activity in all other nodes. In other words, let Tj be the summed filtered F1 → F2 input to the jth F2 node:

$$T_j = \sum_i p_i z_{ij} \tag{12}$$
(j = M + 1…N). Then F2 is said to make a choice if the Jth F2 node becomes maximally active, while all other nodes are inhibited, when
$$T_J = \max\{T_j : j = M+1 \dots N\}. \tag{13}$$

F2 reset may be carried out in several ways, one being use of a gated dipole field network in F2. When a nonspecific arousal input reaches an F2 gated dipole field, nodes are inhibited or reset (Sec. VIII) in proportion to their former STM activity levels. Moreover this inhibition endures until the bottom-up input to F1 shuts off. Such a nonspecific arousal wave reaches F2, via the orienting subsystem, when a sufficiently large mismatch occurs at F1.

When F2 makes a choice, the main elements of the gated dipole field dynamics may be characterized as

$$g(y_J) = \begin{cases} d & \text{if } T_J = \max\{T_j : \text{the } j\text{th F2 node has not been reset on the current trial}\}, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$
Equation (14) implies that Eq. (4) reduces to
$$p_i = \begin{cases} u_i & \text{if F2 is inactive,} \\ u_i + d z_{Ji} & \text{if the } J\text{th F2 node is active.} \end{cases} \tag{15}$$
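As a hedged sketch of the choice and read-out rules, Eqs. (12)–(15), the fragment below stores the committed bottom-up LTM vectors as rows of a matrix and picks the maximally stimulated F2 node that has not yet been reset on the current trial; the matrix layout and the boolean reset mask are our bookkeeping conventions, not the paper's.

```python
import numpy as np

def f2_choice(p, Z_bu, reset_mask):
    """Return the index J of the F2 node receiving the largest input T_j (Eqs. 12-13),
    skipping nodes that have already been reset on the current trial (Eq. 14)."""
    T = Z_bu @ p                          # T_j = sum_i p_i z_ij, one entry per F2 node
    T = np.where(reset_mask, -np.inf, T)  # reset nodes cannot be chosen again this trial
    return int(np.argmax(T))

def f1_readout(u, zJ_top, d=0.9, f2_active=True):
    """Eq. (15): p = u while F2 is inactive, p = u + d*zJ when the Jth node is active."""
    return u + d * zJ_top if f2_active else u.copy()
```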

VII. ART 2 LTM Equations

The top-down and bottom-up LTM trace equations for ART 2 are given by

top-down (F2 → F1):
$$\frac{d}{dt} z_{ji} = g(y_j)\,[p_i - z_{ji}], \tag{16}$$
bottom-up (F1 → F2):
$$\frac{d}{dt} z_{ij} = g(y_j)\,[p_i - z_{ij}]. \tag{17}$$
If F2 makes a choice, Eqs. (14)–(17) imply that, if the Jth F2 node is active, then
$$\frac{d}{dt} z_{Ji} = d\,[p_i - z_{Ji}] = d(1-d)\left[\frac{u_i}{1-d} - z_{Ji}\right], \tag{18}$$
$$\frac{d}{dt} z_{iJ} = d\,[p_i - z_{iJ}] = d(1-d)\left[\frac{u_i}{1-d} - z_{iJ}\right], \tag{19}$$
with 0 < d < 1. For all j ≠ J, dzji/dt = 0 and dzij/dt = 0. Sections IX and XI give admissible bounds on the initial values of the LTM traces.
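The learning laws are simple enough to state directly in code. The sketch below, with names of our own choosing, shows one explicit Euler step of Eqs. (18) and (19) for the chosen node J, and the fast-learning limit in which both of its LTM vectors approach u/(1 − d); in the full system u itself evolves during the trial, which this fragment does not model.

```python
def ltm_euler_step(zJ_top, zJ_bu, u, d=0.9, dt=0.01):
    """One Euler step of Eqs. (18)-(19) for the active F2 node J.

    Traces of every other node j != J have g(y_j) = 0 and therefore do not change."""
    target = u / (1.0 - d)
    zJ_top = zJ_top + dt * d * (1.0 - d) * (target - zJ_top)
    zJ_bu = zJ_bu + dt * d * (1.0 - d) * (target - zJ_bu)
    return zJ_top, zJ_bu

def ltm_fast_learning_limit(u, d=0.9):
    """Fast-learning equilibrium of Eqs. (18)-(19): both LTM vectors of node J approach u/(1-d)."""
    return u / (1.0 - d)
```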

VIII. ART 2 Reset Equations: the Orienting Subsystem

Since a binary pattern match may be computed by counting matched bits, ART 1 architectures do not require patterned information in the orienting subsystem (Fig. 2). In contrast, computation of an analog pattern match does require patterned information. The degree of match between an STM pattern at F1 and an active LTM pattern is determined by the vector r = (r1rM), where for the ART 2 architecture of Fig. 6,

$$r_i = \frac{u_i + c\, p_i}{e + \|u\| + \|c p\|}. \tag{20}$$
The orienting subsystem is assumed to reset F2 whenever an input pattern is active and
$$\frac{\rho}{e + \|r\|} > 1, \tag{21}$$
where the vigilance parameter ρ is set between 0 and 1.

For simplicity, we will henceforth consider an ART 2 system in which F2 makes a choice and in which e is set equal to 0. Thus ‖x‖ = ‖u‖ = ‖q‖ = 1. Simulations use the piecewise linear signal function f in Eq. (11).
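Under the simplifications just stated (F2 makes a choice and e = 0), the reset computation of Eqs. (20) and (21) reduces to a few lines; the function name and boolean return value below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def orienting_reset(u, p, c=0.1, rho=0.95, e=0.0):
    """Compute r from Eq. (20) and apply the reset criterion of Eq. (21).

    Returns True when rho / (e + ||r||) > 1, i.e. when the match ||r|| falls below
    the vigilance parameter rho, so the active F2 node must be reset."""
    r = (u + c * p) / (e + np.linalg.norm(u) + np.linalg.norm(c * p))
    return rho / (e + np.linalg.norm(r)) > 1.0
```

When p is parallel to u, as it is both for a perfect match and for a newly committed node whose top-down traces are still zero, ‖r‖ = 1 ≥ ρ and no reset occurs.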

IX. Match-Reset Trade-Off: Choice of Top-Down Initial LTM Values

Vector r gives rise to all the properties required to satisfy the match-reset trade-off described in Sec. IV. Note first that, when the Jth F2 node is active, Eq. (20) implies that

$$\|r\| = \frac{\left[1 + 2\|cp\|\cos(u,p) + \|cp\|^2\right]^{1/2}}{1 + \|cp\|}, \tag{22}$$
where cos(u,p) denotes the cosine of the angle between the vector u and the vector p. Also, by Eq. (15), the vector p equals the sum u + dzJ, where zJ ≡ (zJ1 … zJM) denotes the top-down vector of LTM traces projecting from the Jth F2 node. Since ‖u‖ = 1, the geometry of the vector sum p = u + dzJ implies that
$$\|p\|\cos(u,p) = 1 + \|d z_J\|\cos(u,z_J). \tag{23}$$
Also,
$$\|p\| = \left[1 + 2\|d z_J\|\cos(u,z_J) + \|d z_J\|^2\right]^{1/2}. \tag{24}$$
Equations (22)–(24) imply that
$$\|r\| = \frac{\left[(1+c)^2 + 2(1+c)\|cd z_J\|\cos(u,z_J) + \|cd z_J\|^2\right]^{1/2}}{1 + \left[c^2 + 2c\|cd z_J\|\cos(u,z_J) + \|cd z_J\|^2\right]^{1/2}}. \tag{25}$$
Both numerator and denominator equal 1 + c + ‖cdzJ‖ when cos(u,zJ) = 1. Thus ‖r‖ = 1 when the STM pattern u exactly matches the LTM pattern zJ, up to a constant multiple.
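Equation (25) is easy to evaluate numerically. The short sketch below computes ‖r‖ as a function of ‖cdzJ‖ and cos(u,zJ), the quantities plotted in Fig. 7, using the same values c = 0.1 and d = 0.9 as that figure; the function name is our own.

```python
import numpy as np

def r_norm(cdzJ, cos_uz, c=0.1):
    """Evaluate Eq. (25): ||r|| as a function of ||cd zJ|| and cos(u, zJ)."""
    num = np.sqrt((1 + c) ** 2 + 2 * (1 + c) * cdzJ * cos_uz + cdzJ ** 2)
    den = 1 + np.sqrt(c ** 2 + 2 * c * cdzJ * cos_uz + cdzJ ** 2)
    return num / den

# Perfect match: numerator and denominator both equal 1 + c + ||cd zJ||, so ||r|| = 1.
assert abs(r_norm(0.5, 1.0) - 1.0) < 1e-9
# A partial mismatch with a well-learned category falls below a vigilance of rho = 0.95:
print(r_norm(0.9, 0.7))   # roughly 0.94, so the orienting subsystem would reset F2
```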

Figure 7 graphs ‖r‖ as a function of ‖cdzJ‖ for various values of cos(u,zJ). The Jth F2 node remains active only if ρ ≤ ‖r‖. Since ρ < 1, Fig. 7 shows that this will occur either if cos(u,zJ) is close to 1 or if ‖zJ‖ is close to 0. That is, no reset occurs if the STM vector u is nearly parallel to the LTM vector zJ or if the top-down LTM traces zJi are all small. By Eq. (18), zJ becomes parallel to u during learning, thus inhibiting reset. Reset must also be inhibited, however, while a new category is being established. Figure 7 shows that this can be accomplished by making all ‖zj‖ small before any learning occurs; in particular, we let the top-down initial LTM values satisfy

$$z_{ji}(0) = 0 \tag{26}$$
for i = 1…M and j = M + 1…N.

Condition (26) ensures that no reset occurs when an uncommitted F2 node first becomes active. Hence learning can begin. Moreover, the learning rule (18) and the LTM initial value rule (26) together imply that zJ remains parallel to u as learning proceeds, so ‖r(t)‖ ≡ 1. Thus no reset ever occurs during a trial in which an uncommitted F2 node is first activated.

X. Learning Increases Mismatch Sensitivity and Confirms Category Choice

Figure 7 suggests how to implement the property that learning increases sensitivity to mismatches between bottom-up and top-down patterns. Figure 7 indicates that, for fixed cos(u,zJ), ‖r‖ is a decreasing function of ‖cdzJ‖ for ‖cdzJ‖ ≤ 1. In fact, in the limit as c → 0, the minimum of each curve approaches the line ‖cdzJ‖ = 1. By Eqs. (18) and (26), ‖zJ‖ < 1/(1 − d) and ‖zJ‖ → 1/(1 − d) during learning. Therefore implementation of the property that learning increases mismatch sensitivity translates into the parameter constraint

$$\frac{cd}{1-d} \le 1. \tag{27}$$
The closer the ratio cd/(1 − d) is chosen to 1 the more sensitive the system is to mismatches, all other things being equal.
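With the simulation values c = 0.1 and d = 0.9 used in Figs. 7 and 8, for example, cd/(1 − d) = 0.09/0.1 = 0.9, which satisfies (27) while keeping the ratio near 1 and hence the system near its maximum mismatch sensitivity.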

Parameter constraint (27) helps to ensure that learning on a given trial confirms the initial category choice on that trial. To see this, note that, if an established category is chosen, ‖zJ‖ is close to 1/(1 − d) at the beginning and end of a fast learning trial. However, ‖zJ‖ typically decreases and then increases during a learning trial. Therefore, if cd/(1 − d) were greater than 1, the reset inequality (21) could be satisfied while ‖zJ‖ was decreasing. Thus, without (27), it would be difficult to rule out the possibility of unexpected F2 reset in the middle of a learning trial.

XI. Choosing a New Category: Bottom-Up LTM Initial Values

Section IX discusses the fact that the top-down initial LTM values zji(0) need to be chosen small, or else top-down LTM read-out by an uncommitted node could lead to immediate F2 reset rather than learning of a new category. The bottom-up LTM initial values zij(0) also need to be chosen small, but for different reasons.

Let zJ = (z1J … zMJ) denote the bottom-up vector of LTM traces that project to the Jth F2 node. Equation (19) implies that ‖zJ‖ ≤ 1/(1 − d) during learning. If ‖zJ(0)‖ were chosen greater than 1/(1 − d), an input that first chose an uncommitted node could switch to other uncommitted nodes in the middle of a learning trial. It is thus necessary to require that

$$\|z_J(0)\| \le \frac{1}{1-d}. \tag{28}$$
Inequality (28) implies that if each zJ(0) is uniform, each LTM trace must satisfy the constraint
$$z_{ij}(0) \le \frac{1}{(1-d)\sqrt{M}} \tag{29}$$
for i = 1…M and j = M + 1…N. Alternatively, random numbers or trained patterns could be taken as initial LTM values. If bottom-up input is the sole source of F2 activation, at least some ziJ(0) values need to be chosen positive if the Jth F2 node is ever to become active.
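As a concrete check, with d = 0.9 and M = 25, the values used in the Fig. 8 simulations, constraint (29) gives zij(0) ≤ 1/[(0.1)(5)] = 2; the initial value zij(0) = 1 used in Fig. 8 is thus half of this maximum.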

Choosing equality in (29) biases the ART 2 system as much as possible toward choosing uncommitted nodes. A typical input would search only those nodes with which it is fairly well matched, and then go directly to an uncommitted node. If no learned category representation forms a good match, an uncommitted node will be directly accessed. Setting the initial bottom-up LTM trace values as large as possible, therefore, helps to stabilize the ART 2 network by ensuring that the system will form a new category, rather than recode an established but badly mismatched one, when vigilance is too low to prevent recoding by active reset via the orienting subsystem. Thus construction of the instability example in Fig. 8(c) requires, in addition to the removal of the orienting subsystem and the internal feedback at F1, that the initial bottom-up LTM trace values be significantly less than the maximum allowed by condition (29).

XII. Stability-Plasticity Trade-Off

ART 2 design principles permit arbitrary sequences of patterns to be encoded during arbitrarily long input trials, and the ability of the LTM traces to learn does not decrease with time. Some internal mechanism must therefore buffer established ART category structures against ceaseless recoding by new input patterns. ART 1 architectures buffer category structures by means of the 2/3 Rule for pattern matching (Fig. 5). During matching, an F1 node in ART 1 can remain active only if it receives significant inputs both bottom-up and top-down. ART 1 implements the 2/3 Rule using an inhibitory attentional gain control signal that is read out with the top-down LTM vector (Fig. 2).

ART 2 architectures implement a weak version of the 2/3 Rule in which, during matching, an F1 node can remain active only if it receives significant top-down input. It is possible, however, for a node receiving large top-down input to remain stored in memory even if bottom-up input to that node is absent on a given trial. The corresponding feature, which had been encoded as significant by prior exemplars, would hence remain part of the category representation although unmatched in the active exemplar. It would, moreover, be partially restored in STM. During learning, the relative importance of that feature would decline, but it would not necessarily be eliminated. However, a feature consistently absent from most category exemplars would eventually be removed from the category's expectation pattern zJ. The ART 2 matching rule implies that the feature would then not be relearned; if present in a given exemplar, it would be treated as noise.

All parts of the F1 feedback loop in Fig. 6 work together to implement this ART 2 matching rule. The five simulations in Fig. 8 illustrate the roles of different components of the ART 2 system. Each column shows the ART 2 response to a sequence of four input patterns (A, B, C, and D) presented in the order ABCAD on trials 1–5 and again on trials 6–10. The ART 2 system dynamics established on the second round are stable, and thus would be repeated indefinitely if the same input sequence was repeated. Parameters c and d are held fixed throughout. The simulations explore the role of the remaining parameters, a, b, θ, and ρ.

Figure 8(a) shows a simulation with parameters a, b, and θ in a normal range, but with the vigilance parameter, ρ, set so high that the four inputs establish four categories. Two graphs are depicted for each trial: the top graph shows the input pattern (I = A, B, C, or D) and the bottom graph shows the LTM expectation pattern (zJ) at the end of the trial. The category number is shown beside the graph of zJ. On trial 1, input A establishes category 1. Note that pattern A is contrast enhanced in LTM, due to the fact that the pattern troughs are below the noise level defined by the signal threshold θ [Eqs. (7) and (11)]. In fact, θ is set equal to 1/√M. This is the level at which uniform patterns are treated as pure noise but any nonuniform pattern can be contrast enhanced and stored in STM.
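To see why this particular threshold value singles out uniform patterns, note that a uniform input over M nodes normalizes to xi = 1/√M at every node [Eq. (9)], which is exactly the threshold θ, whereas any nonuniform normalized pattern must have some components below 1/√M and others above it; its troughs are therefore quenched by f while its peaks survive and are relatively amplified when the pattern is renormalized.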

On trial 2, pattern B, which shares all its features with A, first searches category 1. The high vigilance level leads to F2 reset, and B establishes the new category 2. On trial 3, pattern C also searches category 1; having nothing in common with pattern B, it then goes directly to an uncommitted node and establishes category 3. When A is again presented on trial 4, it directly accesses its original category 1. On trial 5, pattern D searches category 3, then category 1, then establishes the new category 4. Learning is stabilized by the end of this first round of trials. Thus, on the second set of trials, A, B, C, and D choose the same categories as before, but without any search. Hereafter, each input directly accesses its category node. The bottom portion of the column summarizes the category structure established on trials 6–10. Pattern A is shown twice because it is presented twice every 5 trials. The categorization is stable, or consistent, in the sense that each pattern recognizes its unique category every time it appears.

For the four remaining simulations in Figs. 8(b)–(e), the vigilance parameter ρ is chosen so small that no search can ever occur. Whatever category is chosen first by the bottom-up input must accept and learn the matched F1 STM pattern for as long as the input remains active. Eliminating reset by choosing a low vigilance thus makes it possible to test directly how much top-down matching can accomplish on its own. For example, in Fig. 8(b), low vigilance enables pattern B to be accepted on trial 2 into the category 1 that was established by pattern A on trial 1. By the weak 2/3 Rule, the critical feature pattern learned in response to B causes the collapse of LTM traces that do not correspond to B. When pattern A is presented again on trial 4, it is recoded into the category 2 that was established by pattern C on trial 3, since A is more similar to the critical feature pattern established by pattern C than to the critical feature pattern established jointly by pattern B and itself. Thereafter, the code is stable under periodic presentation of the sequence ABCAD.

Note, however, that patterns A and D are classified together, whereas B is not, even though B is more similar to A than is D. This is a consequence of eliminating reset and of fast learning during periodic presentation. In particular, the critical feature pattern learned on trial 10 illustrates the tendency for D to attenuate LTM traces outside its range. Were this tendency strengthened by increasing contrast or changing presentation order, D would have also been classified separately from A. This simulation thus shows how the ART 2 system can self-stabilize its learning in the absence of reset. In combination with Fig. 8(a), it also illustrates how active reset and search can generate stable categories which better reflect the similarity relationships among the input patterns.

Figure 8(b) also illustrates some finer details of ART 2 matching properties. Here, the F1 feedback parameters a and b [see Eqs. (7) and (8) and Fig. 6] are large enough so that a feature once removed from the category representation (zJ) is not reinstated even if present in a category exemplar (I). Thus on trial 4, features present in the right-hand portion of pattern A are not encoded in the LTM pattern of category 2, due to the weak 2/3 Rule. However on trial 5, features absent from pattern D but initially coded in zJ are nevertheless able to remain coded, although they are weakened. Since these features are again present in the exemplars of category 2 on trials 6, 8, and 9, they are periodically restored in LTM.

Finally, compare trial 7 in Fig. 8(b) with trial 7 in Fig. 8(a). In each case pattern B has been established as the sole exemplar of a category. However in Fig. 8(b) category 1 had contained pattern A on one previous trial. Memory of pattern A persists in the contrast-enhanced LTM pattern in Fig. 8(b). If the input set were more complex, this difference in learning history could possibly lead to subsequent differences in category structure.

Figure 8(c) illustrates that unstable coding can occur when the feedback parameter b is set equal to zero. Pattern A is placed in category 1 on trials 1, 6, etc., and in category 2 on trials 4, 9, etc. It jumps to a different category every time it appears. With fast learning, previous LTM patterns are washed away by subsequent input patterns. Failure of the weak 2/3 Rule on trials 4, 6, 9, etc., combined with the absence of the orienting subsystem and with the small initial bottom-up LTM values, leads to the instability. A large class of similar input sequences that share the subset-superset relationships of patterns A, B, C, and D also leads to unstable coding. However, if the class of input patterns were suitably restricted or if slow learning were imposed on each trial, satisfactory results could still be obtained. Similar unstable dynamics occur if the top-down F2 → F1 feedback parameter (d), instead of the internal F1 feedback parameter (b), is chosen small (see Fig. 6).

Figure 8(d) illustrates how the lower feedback loop of the F1 circuit in Fig. 6 also buffers learned category representations against unstable recoding by bottom-up inputs. In this simulation, parameter a is set equal to 0. Similar dynamics occur if the top-down F2 → F1 feedback parameter (d) is small, but not so small that instability occurs. Setting a equal to 0 (or making d small) has the effect of weakening the importance of the F2 → F1 expectation feedback relative to the bottom-up input. In Fig. 8(d), therefore, the weak 2/3 Rule is partially violated. Note in particular the slight reinstatement on trials 4, 6, 8, and 9 of features present in I but not in zJ at the start of the trial. An ART 2 system with large a and d close to 1 is better protected against potential instability than is the system of Fig. 8(d).

Finally, Fig. 8(e) illustrates the role of nonlinearity in the F1 feedback loop of ART 2. Here, the threshold parameter θ is set equal to 0 [Eq. (11)] so that the signal function f in Eq. (7) and Fig. 6 is linear. Level F1 therefore loses the properties of contrast enhancement and noise suppression. Even though the feedback parameters a, b, and d are all large, trial 2 shows that mismatched features in I, while attenuated, are never eliminated. The result is that, given the zero vigilance value, completely mismatched patterns, such as B and D, can be placed in the same category because they are parts of the superset pattern A that established the category.

XIII. Alternative ART 2 Architectures

Two alternative ART 2 models are shown in Figs. 9 and 10. In Fig. 9, the orienting subsystem pattern (r) is also part of F1. In this model, q = p − u, so that q = dzJ if the Jth F2 node is active. Thus r directly computes the cosine of the angle between u and zJ. In contrast, vector r in the ART 2 model of Fig. 6 indirectly computes this angle by computing the angle between u and p, which is a linear combination of u and zJ. In addition, the nonlinear signal function f appears twice in the lower F1 loop in Fig. 9; in Fig. 6, f appears once in each F1 loop, so that all matched input pathways projecting to any given node have the same signal function. Dynamics of the two ART 2 systems are similar. Equations for the ART 2 model of Fig. 9 are given in [Ref. 5].

The ART 2 model in Fig. 10 is also similar to the one in Fig. 6, except here the input vector I is the output of a preprocessing stage that imitates the lower and upper loops of F1. This allows I itself, rather than the vector u, to be used as an input to the orienting subsystem, which is more like the architecture of ART 1 (Fig. 2). The advantage of this is that I does not change when F2 becomes active, and so provides a more stable input to the orienting subsystem throughout the trial than does u in Fig. 6. Figure 11 summarizes one category structure established by the ART 2 system of Fig. 10. All parameters, including vigilance, are the same in the simulations of Figs. 3 and 11. The input patterns depicted in Fig. 11 are the result of preprocessing the inputs of Fig. 3. Table I shows which categories of Fig. 3 correspond to categories in Fig. 11. Except for categories 1 and 7 of Fig. 3, which are each split into two in Fig. 11, the category structure generated by the two models is identical.

We wish to thank Cynthia Suchta and Carol Yanakakis for their valuable assistance in the preparation of the manuscript.

This research was supported in part by the Air Force Office of Scientific Research (AFOSR F49620-86-C-0037 and AFOSR 85-0149), the Army Research Office (ARO DAAG-29-85-K-0095), and the National Science Foundation [NSF DMS-86-11959 (G.A.C.) and NSF IRI-84-17756 (S.G.)].

Figures and Table

Fig. 1 Category grouping of fifty analog input patterns into thirty-four recognition categories. Each input pattern I is depicted as a function of i (i = 1…M), with successive Ii values connected by straight lines. The category structure established on one complete presentation of the fifty inputs remains stable thereafter if the same inputs are presented again.

Fig. 2 Typical ART 1 architecture. Rectangles represent fields where STM patterns are stored. Semicircles represent adaptive filter pathways and arrows represent paths which are not adaptive. Filled circles represent gain control nuclei, which sum input signals. Their output paths are nonspecific in the sense that at any given time a uniform signal is sent to all nodes in a receptor field. Gain control at F1 and F2 coordinates STM processing with input presentation rate.

Fig. 3 Lower vigilance implies coarser grouping. The same ART 2 system as used in Fig. 1 has here grouped the same fifty inputs into twenty recognition categories. Note, for example, that categories 1 and 2 of Fig. 1 are here joined in category 1; categories 14, 15, and 32 are here joined in category 10; and categories 19–22 are here joined in category 13.

Fig. 4 Category learning by an ART 2 model without an orienting subsystem. (a) The same ART 2 system as used in Figs. 1 and 3, but with vigilance level set equal to zero, has here grouped the same fifty inputs into six recognition categories after one presentation of each pattern. Without the full ART 2 system's ability to reset on mismatch, transitory groupings occur, as in category 1. (b) By the third presentation of each input, a coarse but stable category structure has been established.

Fig. 5 Search for a correct F2 code. (a) The input pattern I generates the specific STM activity pattern X at F1 as it nonspecifically activates A. Pattern X both inhibits A and generates the output signal pattern S. Signal pattern S is transformed into the input pattern T, which activates the STM pattern Y across F2. (b) Pattern Y generates the top-down signal pattern U which is transformed into the template pattern V. If V mismatches I at F1, a new STM activity pattern X* is generated at F1. The reduction in total STM activity which occurs when X is transformed into X* causes a decrease in the total inhibition from F1 to A. (c) Then the input-driven activation of A can release a nonspecific arousal wave to F2, which resets the STM pattern Y at F2. (d) After Y is inhibited, its top-down template is eliminated, and X can be reinstated at F1. Now X once again generates input pattern T to F2, but since Y remains inhibited T can activate a different STM pattern Y* at F2. If the top-down template due to Y* also mismatches I at F1, the rapid search for an appropriate F2 code continues.

Fig. 6 Typical ART 2 architecture. Open arrows indicate specific patterned inputs to target nodes. Filled arrows indicate nonspecific gain control inputs. The gain control nuclei (large filled circles) nonspecifically inhibit target nodes in proportion to the L2 norm of STM activity in their source fields [Eqs. (5), (6), (9), (20), and (21)]. When F2 makes a choice, g(yJ) = d if the Jth F2 node is active and g(yJ) = 0 otherwise. As in ART 1, gain control (not shown) coordinates STM processing with an input presentation rate.

Fig. 7 Graph of ‖r‖ as a function of ‖cdzJ‖ for values of cos (u, zJ) between 0 and 1 and for c = 0.1 and d = 0.9. F2 reset occurs whenever ‖r‖ falls below the vigilance parameter ρ.

Fig. 8 ART 2 matching processes. The ART 2 system of Fig. 6 was used to generate the five simulations shown in columns (a)–(e). Each column shows the first ten simulation trials, in which four input patterns (A, B, C, D) are presented in order ABCAD on trials 1–5, and again on trials 6–10. Details are given in the text (Sec. XII). (a) The full ART 2 system, with ρ = 0.95, separates the four inputs into four categories. Search occurs on trials 1–5; thereafter each input directly accesses its category representation. Parameters a = 10, b = 10, c = 0.1, d = 0.9, θ = 0.2, and M = 25. The initial zij(0) values, 1, are half of the maximum, 2, allowed by constraint (29). The piecewise linear signal function (11) is used throughout. (b) Vigilance is here set so low (ρ = 0) that no search can ever occur. The coarse category structure established on trials 6–10 is, however, stable and consistent. All system parameters except ρ are as in (a). (c) With b = 0, the ART 2 system here generates an unstable, or inconsistent, category structure. Namely, input A goes alternately to categories 1 and 2, and will continue to do so for as long as the sequence ABCAD repeats. All parameters except b are as in (b). Similar instability can occur when d is close to 0. (d) With a = 0, the ART 2 matching process differs from that which occurs when a is large; namely, the input pattern I is stronger, relative to the top-down pattern zJ, than in (b). All parameters except a are as in (b). Similar processing occurs when d is small (∼0.1) but not close to 0. (e) With θ = 0, the F1 signal function f becomes linear. Without the noise suppression/contrast enhancement provided by a nonlinear f, the completely different inputs B and D are here placed in a single category. All parameters except θ are as in (b).

Fig. 9 Alternative ART 2 architecture.

Fig. 10 Alternative ART 2 architecture.

Fig. 11 Recognition category summary for the ART 2 system in Fig. 10. System parameters and vigilance level are the same as in Fig. 3, which was generated using the ART 2 model of Fig. 6. Because of the constant I input to the orienting subsystem, the ART 2 system of Fig. 10 is here seen to be slightly more sensitive to pattern mismatch at a given vigilance level, all other things being equal.

Table I. Corresponding Categories

References

1. S. Grossberg, “Adaptive Pattern Classification and Universal Recoding, II: Feedback, Expectation, Olfaction, and Illusions,” Biol. Cybern. 23, 187 (1976).

2. G. A. Carpenter and S. Grossberg, “Category Learning and Adaptive Pattern Recognition: a Neural Network Model,” in Proceedings, Third Army Conference on Applied Mathematics and Computing, ARO Report 86-1 (1985), pp. 37–56.

3. G. A. Carpenter and S. Grossberg, “A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine,” Comput. Vision Graphics Image Process. 37, 54 (1987).

4. S. Grossberg, “Competitive Learning: from Interactive Activation to Adaptive Resonance,” Cognitive Sci. 11, 23 (1987).

5. G. A. Carpenter and S. Grossberg, “ART 2: Stable Self-Organization of Pattern Recognition Codes for Analog Input Patterns,” in Proceedings First International Conference on Neural Networks, San Diego (IEEE, New York, 1987).

6. G. A. Carpenter and S. Grossberg, “Invariant Pattern Recognition and Recall by an Attentive Self-Organizing ART Architecture in a Nonstationary World,” in Proceedings First International Conference on Neural Networks, San Diego (IEEE, New York, 1987).

7. K. Hartley, “Seeing the Need for ART”, Sci. News 132, 14 (1987). [CrossRef]  

8. P. Kolodzy, “Multidimensional Machine Vision Using Neural Networks,” in Proceedings, First International Conference on Neural Networks, San Diego (IEEE, New York, 1987).

9. S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (Reidel, Boston, 1982).

10. A. L. Hodgkin and A. F. Huxley, “A Quantitative Description of Membrane Current and Its Applications to Conduction and Excitation in Nerve,” J. Physiol. London 117, 500 (1952).

Equations (29)

\varepsilon \frac{d}{dt} V_i = -A V_i + (1 - B V_i) J_i^{+} - (C + D V_i) J_i^{-}    (1)
0 < \varepsilon \ll 1    (2)
V_i = \frac{J_i^{+}}{A + D J_i^{-}}    (3)
p_i = u_i + \sum_j g(y_j) z_{ji}    (4)
q_i = \frac{p_i}{e + \| p \|}    (5)
u_i = \frac{v_i}{e + \| v \|}    (6)
v_i = f(x_i) + b f(q_i)    (7)
w_i = I_i + a u_i    (8)
x_i = \frac{w_i}{e + \| w \|}    (9)
f(x) = \begin{cases} \dfrac{2 \theta x^2}{x^2 + \theta^2} & \text{if } 0 \le x \le \theta \\ x & \text{if } x \ge \theta \end{cases}    (10)
f(x) = \begin{cases} 0 & \text{if } 0 \le x < \theta \\ x & \text{if } x \ge \theta \end{cases}    (11)
T_j = \sum_i p_i z_{ij}    (12)
T_J = \max \{ T_j : j = M+1, \ldots, N \}    (13)
g(y_J) = \begin{cases} d & \text{if } T_J = \max \{ T_j : \text{the } j\text{th } F_2 \text{ node has not been reset on the current trial} \} \\ 0 & \text{otherwise} \end{cases}    (14)
p_i = \begin{cases} u_i & \text{if } F_2 \text{ is inactive} \\ u_i + d z_{Ji} & \text{if the } J\text{th } F_2 \text{ node is active} \end{cases}    (15)
\text{top-down } (F_2 \to F_1):\; \frac{d}{dt} z_{ji} = g(y_j) \left[ p_i - z_{ji} \right]    (16)
\text{bottom-up } (F_1 \to F_2):\; \frac{d}{dt} z_{ij} = g(y_j) \left[ p_i - z_{ij} \right]    (17)
\frac{d}{dt} z_{Ji} = d \left[ p_i - z_{Ji} \right] = d (1 - d) \left[ \frac{u_i}{1 - d} - z_{Ji} \right]    (18)
\frac{d}{dt} z_{iJ} = d \left[ p_i - z_{iJ} \right] = d (1 - d) \left[ \frac{u_i}{1 - d} - z_{iJ} \right]    (19)
r_i = \frac{u_i + c p_i}{e + \| u \| + \| c p \|}    (20)
\frac{\rho}{e + \| r \|} > 1    (21)
\| r \| = \frac{\left[ 1 + 2 \| c p \| \cos(u, p) + \| c p \|^2 \right]^{1/2}}{1 + \| c p \|}    (22)
\| p \| \cos(u, p) = 1 + d \| z_J \| \cos(u, z_J)    (23)
\| p \| = \left[ 1 + 2 d \| z_J \| \cos(u, z_J) + d^2 \| z_J \|^2 \right]^{1/2}    (24)
\| r \| = \frac{\left[ (1 + c)^2 + 2 (1 + c) \, c d \| z_J \| \cos(u, z_J) + c^2 d^2 \| z_J \|^2 \right]^{1/2}}{1 + \left[ c^2 + 2 c \cdot c d \| z_J \| \cos(u, z_J) + c^2 d^2 \| z_J \|^2 \right]^{1/2}}    (25)
z_{ji}(0) = 0    (26)
\frac{c d}{1 - d} \le 1    (27)
\| z_J(0) \| \le \frac{1}{1 - d}    (28)
z_{ij}(0) \le \frac{1}{(1 - d) \sqrt{M}}    (29)
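
The bound on the initial bottom-up weights can be checked against the Fig. 8 caption: with d = 0.9 and M = 25, Eq. (29) gives 1/[(1 − d)√M] = 1/(0.1 × 5) = 2, consistent with the statement that the initial value 1 is half of the allowed maximum. As a reading aid only, the following Python sketch (not code from the article; the names, array shapes, fixed iteration count, and the assumption of a nonzero input are illustrative) steps through one fast-learning trial: the F1 loop (4)–(9) is iterated to an approximate equilibrium, an F2 category is chosen by (12)–(14), the reset criterion (20)–(21) is tested, and the winning node's weights are moved to u/(1 − d), the fast-learning limit of (18)–(19).

import numpy as np

def f(x, theta):
    # Piecewise linear signal function (11): zero below threshold, linear above.
    return np.where(x < theta, 0.0, x)

def f1_equilibrium(I, z_td, J, a, b, d, theta, e=0.0, iters=50):
    # Iterate the F1 loop (4)-(9) to an approximate steady state.
    # z_td: top-down weights (N x M); J: index of the active F2 node, or None.
    M = I.shape[0]
    u = np.zeros(M)
    q = np.zeros(M)
    for _ in range(iters):
        w = I + a * u                                    # (8)
        x = w / (e + np.linalg.norm(w))                  # (9)
        v = f(x, theta) + b * f(q, theta)                # (7)
        u = v / (e + np.linalg.norm(v))                  # (6)
        p = u + (d * z_td[J] if J is not None else 0.0)  # (4), (15)
        q = p / (e + np.linalg.norm(p))                  # (5)
    return u, p

def art2_trial(I, z_bu, z_td, rho, a, b, c, d, theta, e=0.0):
    # One fast-learning trial: choose an F2 node, test the reset criterion,
    # and, if it passes, push the winner's weights toward u/(1-d) as in (18)-(19).
    # z_bu: bottom-up weights (M x N); z_td: top-down weights (N x M).
    N = z_bu.shape[1]
    reset = np.zeros(N, dtype=bool)
    while not reset.all():
        u, p = f1_equilibrium(I, z_td, None, a, b, d, theta, e)
        T = p @ z_bu                                     # (12)
        T[reset] = -np.inf                               # (14): skip nodes reset on this trial
        J = int(np.argmax(T))                            # (13)
        u, p = f1_equilibrium(I, z_td, J, a, b, d, theta, e)
        r = (u + c * p) / (e + np.linalg.norm(u) + np.linalg.norm(c * p))  # (20)
        if rho / (e + np.linalg.norm(r)) > 1:            # (21): mismatch, so reset node J
            reset[J] = True
            continue
        z_td[J] = u / (1.0 - d)                          # fast-learning limit of (18)
        z_bu[:, J] = u / (1.0 - d)                       # fast-learning limit of (19)
        return J
    return None                                          # every node reset; add uncommitted nodes to avoid this

Setting b = 0 or θ = 0 in such a sketch is one informal way to explore the instability and noise effects illustrated in Figs. 8(c) and 8(e), although the exact trajectories depend on the discretization chosen here rather than on the article's differential-equation dynamics.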