Clever Algorithms: Nature-Inspired Programming Recipes

A book by Jason Brownlee

Self-Organizing Map

Self-Organizing Map, SOM, Self-Organizing Feature Map, SOFM, Kohonen Map, Kohonen Network.

Taxonomy

The Self-Organizing Map algorithm belongs to the field of Artificial Neural Networks and Neural Computation. More broadly it belongs to the field of Computational Intelligence. The Self-Organizing Map is an unsupervised neural network that uses a competitive (winner-take-all) learning strategy. It is related to other unsupervised neural networks such as the Adaptive Resonance Theory (ART) method. It is related to other competitive learning neural networks such as the the Neural Gas Algorithm, and the Learning Vector Quantization algorithm, which is a similar algorithm for classification without connections between the neurons. Additionally, SOM is a baseline technique that has inspired many variations and extensions, not limited to the Adaptive-Subspace Self-Organizing Map (ASSOM).

Inspiration

The Self-Organizing Map is inspired by postulated feature maps of neurons in the brain comprised of feature-sensitive cells that provide ordered projections between neuronal layers, such as those that may exist in the retina and cochlea. For example, there are acoustic feature maps that respond to sounds to which an animal is most frequently exposed, and tonotopic maps that may be responsible for the order preservation of acoustic resonances.

Strategy

The information processing objective of the algorithm is to optimally place a topology (grid or lattice) of codebook or prototype vectors in the domain of the observed input data samples. An initially random pool of vectors is prepared which are then exposed to training samples. A winner-take-all strategy is employed where the most similar vector to a given input pattern is selected, then the selected vector and neighbors of the selected vector are updated to closer resemble the input pattern. The repetition of this process results in the distribution of codebook vectors in the input space which approximate the underlying distribution of samples from the test dataset. The result is the mapping of the topology of codebook vectors to the underlying structure in the input samples which may be summarized or visualized to reveal topologically preserved features from the input space in a low-dimensional projection.

Procedure

The Self-Organizing map is comprised of a collection of codebook vectors connected together in a topological arrangement, typically a one dimensional line or a two dimensional grid. The codebook vectors themselves represent prototypes (points) within the domain, whereas the topological structure imposes an ordering between the vectors during the training process. The result is a low dimensional projection or approximation of the problem domain which may be visualized, or from which clusters may be extracted.

Algorithm (below) provides a high-level pseudocode for preparing codebook vectors using the Self-Organizing Map method. Codebook vectors are initialized to small floating point values, or sampled from the domain. The Best Matching Unit (BMU) is the codebook vector from the pool that has the minimum distance to an input vector. A distance measure between input patterns must be defined. For real-valued vectors, this is commonly the Euclidean distance:

$dist(x,c) = \sum_{i=1}^{n} (x_i - c_i)^2$

where $n$ is the number of attributes, $x$ is the input vector and $c$ is a given codebook vector.

The neighbors of the BMU in the topological structure of the network are selected using a neighborhood size that is linearly decreased during the training of the network. The BMU and all selected neighbors are then adjusted toward the input vector using a learning rate that too is decreased linearly with the training cycles:

$c_i(t+1) = learn_{rate}(t) \times (c_i(t) - x_i)$

where $c_i(t)$ is the $i^{th}$ attribute of a codebook vector at time $t$, $learn_{rate}$ is the current learning rate, an $x_i$ is the $i^{th}$ attribute of a input vector.

The neighborhood is typically square (called bubble) where all neighborhood nodes are updated using the same learning rate for the iteration, or Gaussian where the learning rate is proportional to the neighborhood distance using a Gaussian distribution (neighbors further away from the BMU are updated less).

Input: InputPatterns, $iterations_{max}$, $learn_{rate}^{init}$, $neighborhood_{size}^{init}$, $Grid_{width}$, $Grid_{height}$
Output: CodebookVectors
CodebookVectors $\leftarrow$ InitializeCodebookVectors{$Grid_{width}$, $Grid_{height}$, InputPatterns}
For ($i=1$ To $iterations_{max}$)
    $learn_{rate}^{i}$ $\leftarrow$ CalculateLearningRate{$i$, $learn_{rate}^{init}$}
    $neighborhood_{size}^{i}$ $\leftarrow$ CalculateNeighborhoodSize{$i$, $neighborhood_{size}^{init}$}
    $Pattern_i$ $\leftarrow$ SelectInputPattern{InputPatterns}
    $Bmu_i$ $\leftarrow$ SelectBestMatchingUnit{$Pattern_i$, CodebookVectors}
    Neighborhood $\leftarrow$ $Bmu_i$
    Neighborhood $\leftarrow$ SelectNeighbors{$Bmu_i$, CodebookVectors, $neighborhood_{size}^{i}$}
    For ($Vector_i$ $\in $Neighborhood)
        For ($Vector_{i}^{attribute}$ $\in$ $Vector_i$)
            $Vector_{i}^{attribute}$ $\leftarrow$ $Vector_{i}^{attribute}$ $+$ $learn_{rate}^{i}$ $\times$ ($Pattern_{i}^{attribute}$ $-$ $Vector_{i}^{attribute}$)
        End
    End
End
Return (CodebookVectors)

Pseudocode for the SOM.

Heuristics

The Self-Organizing Map was designed for unsupervised learning problems such as feature extraction, visualization and clustering. Some extensions of the approach can label the prepared codebook vectors which can be used for classification.
SOM is non-parametric, meaning that it does not rely on assumptions about that structure of the function that it is approximating.
Real-values in input vectors should be normalized such that $x \in [0,1)$.
Euclidean distance is commonly used to measure the distance between real-valued vectors, although other distance measures may be used (such as dot product), and data specific distance measures may be required for non-scalar attributes.
There should be sufficient training iterations to expose all the training data to the model multiple times.
The more complex the class distribution, the more codebook vectors that will be required, some problems may need thousands.
Multiple passes of the SOM training algorithm are suggested for more robust usage, where the first pass has a large learning rate to prepare the codebook vectors and the second pass has a low learning rate and runs for a long time (perhaps 10-times more iterations).
The SOM can be visualized by calculating a Unified Distance Matrix (U-Matrix) shows highlights the relationships between the nodes in the chosen topology. A Principle Component Analysis (PCA) or Sammon's Mapping can be used to visualize just the nodes of the network without their inter-relationships.
A rectangular 2D grid topology is typically used for a SOM, although toroidal and sphere topologies can be used. Hexagonal grids have demonstrated better results on some problems and grids with higher dimensions have been investigated.
The neuron positions can be updated incrementally or in a batch model (each epoch of being exposed to all training samples). Batch-mode training is generally expected to result in a more stable network.
The learning rate and neighborhood size parameters typically decrease linearly with the training iterations, although non-linear functions may be used.

Code Listing

Listing (below) provides an example of the Self-Organizing Map algorithm implemented in the Ruby Programming Language. The problem is a feature detection problem, where the network is expected to learn a predefined shape based on being exposed to samples in the domain. The domain is two-dimensional $x,y \in [0,1]$, where a shape is pre-defined as a square in the middle of the domain $x,y \in [0.3,0.6]$. The system is initialized to vectors within the domain although is only exposed to samples within the pre-defined shape during training. The expectation is that the system will model the shape based on the observed samples.

The algorithm is an implementation of the basic Self-Organizing Map algorithm based on the description in Chapter 3 of the seminal book on the technique [Kohonen1995]. The implementation is configured with a $4 \times 5$ grid of nodes, the Euclidean distance measure is used to determine the BMU and neighbors, a Bubble neighborhood function is used. Error rates are presented to the console, and the codebook vectors themselves are described before and after training. The learning process is incremental rather than batch, for simplicity.

An extension to this implementation would be to visualize the resulting network structure in the domain - shrinking from a mesh that covers the whole domain, down to a mesh that only covers the pre-defined shape within the domain.

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def initialize_vectors(domain, width, height)
  codebook_vectors = []
  width.times do |x|
    height.times do |y|
      codebook = {}
      codebook[:vector] = random_vector(domain)
      codebook[:coord] = [x,y]
      codebook_vectors << codebook
    end
  end
  return codebook_vectors
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def get_best_matching_unit(codebook_vectors, pattern)
  best, b_dist = nil, nil
  codebook_vectors.each do |codebook|
    dist = euclidean_distance(codebook[:vector], pattern)
    best,b_dist = codebook,dist if b_dist.nil? or dist<b_dist
  end
  return [best, b_dist]
end

def get_vectors_in_neighborhood(bmu, codebook_vectors, neigh_size)
  neighborhood = []
  codebook_vectors.each do |other|
    if euclidean_distance(bmu[:coord], other[:coord]) <= neigh_size
      neighborhood << other
    end
  end
  return neighborhood
end

def update_codebook_vector(codebook, pattern, lrate)
  codebook[:vector].each_with_index do |v,i|
    error = pattern[i]-codebook[:vector][i]
    codebook[:vector][i] += lrate * error
  end
end

def train_network(vectors, shape, iterations, l_rate, neighborhood_size)
  iterations.times do |iter|
    pattern = random_vector(shape)
    lrate = l_rate * (1.0-(iter.to_f/iterations.to_f))
    neigh_size = neighborhood_size * (1.0-(iter.to_f/iterations.to_f))
    bmu,dist = get_best_matching_unit(vectors, pattern)
    neighbors = get_vectors_in_neighborhood(bmu, vectors, neigh_size)
    neighbors.each do |node|
      update_codebook_vector(node, pattern, lrate)
    end
    puts ">training: neighbors=#{neighbors.size}, bmu_dist=#{dist}"
  end
end

def summarize_vectors(vectors)
  minmax = Array.new(vectors.first[:vector].size){[1,0]}
  vectors.each do |c|
    c[:vector].each_with_index do |v,i|
      minmax[i][0] = v if v<minmax[i][0]
      minmax[i][1] = v if v>minmax[i][1]
    end
  end
  s = ""
  minmax.each_with_index {|bounds,i| s << "#{i}=#{bounds.inspect} "}
  puts "Vector details: #{s}"
  return minmax
end

def test_network(codebook_vectors, shape, num_trials=100)
  error = 0.0
  num_trials.times do
    pattern = random_vector(shape)
    bmu,dist = get_best_matching_unit(codebook_vectors, pattern)
    error += dist
  end
  error /= num_trials.to_f
  puts "Finished, average error=#{error}"
  return error
end

def execute(domain, shape, iterations, l_rate, neigh_size, width, height)
  vectors = initialize_vectors(domain, width, height)
  summarize_vectors(vectors)
  train_network(vectors, shape, iterations, l_rate, neigh_size)
  test_network(vectors, shape)
  summarize_vectors(vectors)
  return vectors
end

if __FILE__ == $0
  # problem configuration
  domain = [[0.0,1.0],[0.0,1.0]]
  shape = [[0.3,0.6],[0.3,0.6]]
  # algorithm configuration
  iterations = 100
  l_rate = 0.3
  neigh_size = 5
  width, height = 4, 5
  # execute the algorithm
  execute(domain, shape, iterations, l_rate, neigh_size, width, height)
end

Self-Organizing Map in Ruby

Download: som.rb.

References

Primary Sources

The Self-Organizing Map was proposed by Kohonen in 1982 in a study that included the mathematical basis for the approach, summary of related physiology, and simulation on demonstration problem domains using one and two dimensional topological structures [Kohonen1982]. This work was tightly related two other papers published at close to the same time on topological maps and self-organization [Kohonen1981] [Kohonen1982b].

Learn More

Kohonen provides a detailed introduction and summary of the Self-Organizing Map in a journal article [Kohonen1990a]. Kohonen et al. provide a practical presentation of the algorithm and heuristics for configuration in the technical report written to accompany the released SOM-PAK implementation of the algorithm for academic research [Kohonen1996a]. The seminal book on the technique is "Self-Organizing Maps" by Kohonen, which includes chapters dedicated to the description of the basic approach, physiological interpretations of the algorithm, variations, and summaries of application areas [Kohonen1995].

Bibliography

[Kohonen1981]	T. Kohonen, "Automatic formation of topological maps of patterns in a self-organizing\n\tsystem", in Proceedings of 2nd Scandinavian Conf. on Image Analysis, 1981.
[Kohonen1982]	T. Kohonen, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 1982.
[Kohonen1982b]	T. Kohonen, "Clustering, Taxonomy, and Topological Maps of Patterns", in International Conference on Pattern Recognition, 1982.
[Kohonen1990a]	T. Kohonen, "The self-organizing map", Proceedings of the IEEE, 1990.
[Kohonen1995]	T. Kohonen, "Self-Organizing Maps", Springer, 1995.
[Kohonen1996a]	T. Kohonen and J. Hynninen and J. Kangas and J. Laaksonen, "SOM–PAK: The Self-Organizing Map Program Package", Technical Report A31, Helsinki University of Technology, Laboratory of Computer and Information\n\tScience, 1996.

Please Note: This content was automatically generated from the book content and may contain minor differences.