Roadmap of the Indexer:

Boxes denote programs. Names in brackets denote files, or more precisely the
configuration directives that set the file names. See doc/file-formats for the
formats of the files.

		  	       	       	 buckets
		  	      		    |
+-------------------------------------------+
|		  	      		    |
|		       	      	       +---------+
|                                      | Scanner |-------->[Parameters] (also touched by mklex)
|		       	      	       +---------+
|      	       	       	       	       	|||||||||
|    ##################################################################################################################
|    |	       	|     	       	|	   |                |                        |          |            |        |
| [Links] [Fingerprints] [Attributes] [LabelsByID]      [URLList]                [Merges] [Checksums] [Signatures] [LinkTexts]
|    |          |      	 [Notes]|      	   |   	            |                        |          |            |        |
|    | 	    +--------+ 	+-------|----------|----------------|------------------------|------+   +----+       +----+   |
|    |	    | fpsort | 	| +-----|----------|----------------|------+       +-------1>|      |        |            |   |
|    | 	    +--------+ 	| |    	|      	   |   	     +------+---+--|-----+ |   	     | 	    |  	     | 	       	  |   |
|    | 	        |      	| |    	|          |         |      |   |  |     | | +------>| +---------+ +-----------+  |   |
|    |	  	|      	| |   	|          | +------------+ | +------+   | | |       | | mergefp | | mergesums |  |   |
|    |	 [Fingerprints] | |   	|          | | sitefinder | | | back |---|-+ |  +---<+ +---------+ +-----------+  |   |
|    | 	       	|      	| |     |          | +------------+ | |linker|   | | |  |    |      |        |            |   |
|    +------+  	+-------+ |     |          |    |     |     | +------+   | +-|--|--2>|      |        | +------------+ |
|           |  	|      	  |    	|<---------|----+  [Sites]  |     | 2    |   |  |    |<-----+        | | mergesigns | |
|     	 +---------+   	  |    	|          |            +-------+ | |    |   |  |    |<--------------+ +------------+ |
|	 | mkgraph |   	  |    	|          | [Catalog]->| oook  |-|-|->--|---+  |    |                       |    |   |
|	 +---------+   	  |     |          |            +-------+ | |    |      |    |<----------------------|----+   |
|             |           |     |          |              | | |   | |    |      |    |                       |        |
|	      |	       	  |     |<---------|--------------+ | +---|-|-+  |      |    |            +----------|--------+
|        [LinkGraph]   	  | +-->|          |<---------------+     | | | +----------+ |            |          |
|      	      |	       	  | |  	+>---------|-------------------->-|-|-|-| keywords | | 	       	  |	 [Matches]
|             |           | |   |          |                      | | | +----------+ |            |
|	      |	      	  | |	|	   |<---------------------|-|-|----+  |	     |		  |
|	      |	       	  | |   |<-12------|----------------------+ | |       |      |            |
|	      +->---------+ |  	|          |<--2--------------------+ |<------+   +->|      +----------+
|             |             |   |          |             +--------+   |           |  +------| reftexts |
|	      |	      	    |  	|<---------|-------------| merger |---|-----------+  |      +----------+
|      	 +---------+   	    |   |          |             +--------+   |              |            |
|	 | weights |  	    |  	|          |<-------------------------|--------------|------------+
|      	 +---------+  	    |  	|          |  +-----------------+-----|--------------+
|      	    | |	       	    |  	+----------|--|---------------+	|     |
|      	    | +-------------+  	|          |  |        	      |	| [Keywords]
|	    |	      		|      	   |  |               |	|
|       [Weights]               |      	   |  |               | |
|	     	      		|     	   |  |               |	|
|	     	      		|     	   |  |               |	|
|      	       	       	       	|      	   |  |        	      |	|
|		      		|     	   |  |	      	      |	|
|		      		|     +-----------+   +----------+
|      	       	       	       	|     | labelsort |   |	feedback |
|		      		|     +-----------+   +----------+
|		      		|           |	      	   |
|      	       	       	   [Attributes]  [Labels]      [Feedback]
|		      	       	|    	    |
+------+------------------+     |           |
       |  +---------------|-----+           |
       |  |  +------------|--+--|-----------+
       |  |  | 	          |  |  |
     +---------+        +-----------+                  +-------+
     | 	mklex  |        |           |                  |       |-------->[StringMap]
     +---------+        |           |-->[StringIndex]->| ssort |-------->[StringHash]
       	  |    	        |           |                  |       |---+
       [LexRaw]	        |  chewer   |                  +-------+   |
          |        +--->|           |                              +---->[References]
  +------------+   |    |           |                  +-------+   |
  |  lexorder  |   |    |           |-->[WordIndex]--->|       |---+	      +------+
  +------------+   |    |           |                  | wsort |--[LexWords]->| lex  |-->[Stems]
    |        |     |    +-----------+              +-->|       |        +---->| sort |-->[Lexicon]
    |   [LexOrder] |      |   |   |                |   +-------+        |     +------+
[StemsOrder] | 	   |  [Cards] | [CardAttributes]   |   		       	|
    |	     +-----+	      |	       	       	   |   +-------+	|
    |	     | 	    	      +---[CardPrints]-------->| psort |------------>[CardPrints]
    |	     |	    		       	       	   |   +-------+ 	|
    |  	     +-------------------------------------+   	   		|
    +-------------------------------------------------------------------+
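In the diagram, [Checksums] feeds a mergesums stage whose output reaches [Merges]. The terminology below assumes that different documents always get different checksums, so one plausible job for such a stage is to find cards with equal checksums and map each of them onto a single representative card. The C sketch below illustrates only that idea. The record layout, the names `struct csum_rec` and `merge_by_checksum`, and the choice of the lowest card ID as the representative are all hypothetical and are not taken from the indexer sources.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a 128-bit content checksum paired with a card ID. */
struct csum_rec {
  uint8_t csum[16];   /* 128-bit checksum of the document contents */
  uint32_t card_id;   /* index card this checksum belongs to */
};

/* Sort by checksum, then by card ID, so that the first record of every
 * run of equal checksums carries the smallest card ID of that run. */
static int csum_cmp(const void *a, const void *b)
{
  const struct csum_rec *x = a, *y = b;
  int c = memcmp(x->csum, y->csum, sizeof(x->csum));
  if (c)
    return c;
  return (x->card_id > y->card_id) - (x->card_id < y->card_id);
}

/* Sorts recs in place and fills merge[id] for every card ID occurring in
 * recs. Cards with equal checksums are assumed to be identical documents
 * and are all mapped to the smallest card ID of their group. A card with
 * a unique checksum maps to itself. `merge` must be indexable by every
 * card ID present in recs. */
static void merge_by_checksum(struct csum_rec *recs, size_t n, uint32_t *merge)
{
  qsort(recs, n, sizeof(*recs), csum_cmp);
  size_t i = 0;
  while (i < n) {
    uint32_t rep = recs[i].card_id;   /* representative of this group */
    size_t j = i;
    while (j < n && !memcmp(recs[j].csum, recs[i].csum, sizeof(recs[i].csum))) {
      merge[recs[j].card_id] = rep;
      j++;
    }
    i = j;
  }
}
```

Sorting first means all duplicates become adjacent. A single linear pass can then assign representatives without any hash table, which suits data that is processed as sorted files.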

Terminology:

object		document as supplied by the gatherer
bucket		the place inside gatherer's database where the object lives
fingerprint	a 96-bit hash of a URL, used instead of the URL itself for convenient handling
checksum	a 128-bit hash of document contents (we assume different documents
		always get different hashes)
index card	a sequence of labels describing an object in the index
label		a piece of data attached to an object. It corresponds to an object
		attribute inside the gatherer (both carry one-letter identifiers
		from the same set; see doc/objects for a list). The label files
		we build contain only so-called "movable" labels, which correspond
		to a single occurrence of the document (i.e., a URL) rather than to
		the document itself.
attributes	a small structure containing basic information about the index card,
		intended to be accessed very quickly (we assume the attributes
		of all the cards fit in memory at once)
word		a text word together with its category
string		anything we index that is not a word (e.g., a URL), together with its category
lexicon		a list of all known words including some data about them
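The fixed sizes given above map naturally onto plain byte arrays. The sketch below shows one way to hold a 96-bit fingerprint and compare fingerprints so that they can be sorted, as the fpsort stage in the diagram presumably requires, and then searched. The names `struct fprint`, `fprint_from_digest`, `fprint_cmp` and `fprint_find` are hypothetical. Deriving the fingerprint by truncating a wider digest of the URL is an assumption made for illustration, and the hash function the indexer actually uses is not specified here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FP_BYTES 12   /* 96 bits; this layout is hypothetical */

struct fprint {
  uint8_t b[FP_BYTES];
};

/* Hypothetical: keep the first 12 bytes of a wider digest of the URL.
 * Computing the digest is left to the caller (assumed, not the real one).
 * Shorter digests are zero-padded. */
static struct fprint fprint_from_digest(const uint8_t *digest, size_t len)
{
  struct fprint fp;
  memset(fp.b, 0, FP_BYTES);
  memcpy(fp.b, digest, len < FP_BYTES ? len : FP_BYTES);
  return fp;
}

/* Bytewise comparison, usable with both qsort() and bsearch(). */
static int fprint_cmp(const void *a, const void *b)
{
  return memcmp(((const struct fprint *)a)->b,
                ((const struct fprint *)b)->b, FP_BYTES);
}

/* Binary search in an array already sorted with fprint_cmp.
 * Returns the index of the matching fingerprint, or -1 if it is absent. */
static long fprint_find(const struct fprint *sorted, size_t n,
                        const struct fprint *key)
{
  const struct fprint *hit = bsearch(key, sorted, n, sizeof(*sorted), fprint_cmp);
  return hit ? (long)(hit - sorted) : -1;
}
```

Because fingerprints are compared purely as bytes, sorting, merging sorted runs and binary search all agree on one order, regardless of the machine's endianness.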
