The rationale behind the prescription is based on the desire to capture, in the value of the weights, local correlations between node outputs when the net is in one of the required stable states. Recall that these correlations also gave rise to the energy description and the same kind of arguments will be used again.
Consider two nodes which, on average over the required pattern set, tend to take on the same value. That is, they tend to form either the pair (0, 0) or (1, 1). The latter pairing will be reinforced by a positive weight between the nodes, since each one then makes a positive contribution to the other's activation, which will tend to foster the production of a `1' at the output. Now suppose that the two nodes, on average, tend to take on opposite values. That is, they tend to form either the pair (0, 1) or (1, 0). Both pairings are reinforced by a negative weight between the two nodes, since there is a negative contribution to the activation of the node which is `off' from the node which is `on', supporting the former's output state of `0'. Note that, although the pairing (0, 0) is not actively supported by a positive weight per se, a negative weight would support the mixed output pair-type just discussed.
These observations may be encapsulated mathematically in the following way. First we introduce an alternative way of representing binary quantities. Normally these have been denoted by 0 or 1. In the polarised or spin representation they are denoted by -1 and 1 respectively, so there is the correspondence $0 \leftrightarrow -1$, $1 \leftrightarrow 1$. Now let $v_i^p$, $v_j^p$ be components of the $p$th pattern to be stored, where these are in the spin representation. Consider what happens if the weight between the nodes $i$ and $j$ is given by

$$w_{ij} = \sum_p v_i^p v_j^p \tag{1}$$
where the sum is over all patterns $p$ to be stored. If, on average, the two components take on the same value then the weight will be positive, since terms like $1 \times 1$ and $(-1) \times (-1)$ predominate. If, on the other hand, the two components, on average, take on opposite values, then terms like $(-1) \times 1$ and $1 \times (-1)$ predominate, which gives a negative weight. This is just what was required according to the arguments given above. Equation (1) is therefore the storage prescription used with Hopfield nets. Note that the same weights would accrue if we had tried to learn the inverse of the patterns, formed by taking each component of every pattern and changing it to the opposite value, because each product of two components is unchanged when both signs are flipped. The net, therefore, always learns the patterns and their inverses.
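The storage prescription can be illustrated with a minimal Python sketch. The function names `to_spin` and `storage_weights` and the three example patterns are hypothetical, chosen only for illustration. The sketch also sets the diagonal weights to zero, which assumes nodes have no self-connections; that assumption is not stated in the text above.

```python
def to_spin(pattern):
    """Map binary components 0/1 to the spin representation -1/+1."""
    return [2 * x - 1 for x in pattern]


def storage_weights(patterns):
    """Return w[i][j] as the sum over patterns of v[i] * v[j].

    `patterns` is a list of equal-length lists in the spin representation.
    The diagonal is left at zero (assumption: no self-connections).
    """
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for v in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += v[i] * v[j]
    return w


# Illustrative pattern set: nodes 0 and 1 always agree,
# nodes 2 and 3 disagree in two of the three patterns.
binary_patterns = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0]]
spin_patterns = [to_spin(p) for p in binary_patterns]
weights = storage_weights(spin_patterns)

# Storing the inverse patterns (every component flipped) gives the same weights.
inverse_patterns = [[-x for x in v] for v in spin_patterns]
inverse_weights = storage_weights(inverse_patterns)

print(weights[0][1])               # 3: the two nodes agree in every pattern
print(weights[2][3])               # -1: the two nodes mostly disagree
print(weights == inverse_weights)  # True
```

In this example the weight between nodes 0 and 1 is positive because they take the same value in every pattern, while the weight between nodes 2 and 3 is negative because they take opposite values in the majority of patterns.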