Sherlock Search Engine Interface
********************************

The protocol is very simple: connect to the TCP port defined in the configuration
file, send the query as a single line, and read the reply. The server closes the
connection once the reply is complete.
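
A minimal client sketch in Python, assuming only what is stated above. The
function names, the "\n" line terminator, the UTF-8 encoding and the timeout
are assumptions of this example, not part of the protocol description; the
host and port must be taken from your own configuration.

```python
import socket

def exchange(sock, query, encoding="utf-8"):
    """Send one query line on a connected socket and read the reply until EOF.

    Returns the reply as a list of lines. Blank lines are kept, because
    they act as separators in the reply format.
    """
    if "\n" in query or "\r" in query:
        raise ValueError("the query must fit on a single line")
    sock.sendall(query.encode(encoding) + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:              # peer closed the connection: reply is complete
            break
        chunks.append(data)
    lines = b"".join(chunks).decode(encoding, errors="replace").split("\n")
    if lines and lines[-1] == "":
        lines.pop()               # drop the artifact of the final newline
    return lines

def send_query(host, port, query, timeout=10.0):
    """Open a fresh connection for a single query (one query per connection)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, query)
```

Since the server closes the connection after each reply, every query needs a
new call to send_query.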

Query Language
~~~~~~~~~~~~~~
query = [selector] global-option* (expr | CONTROL "command")

selector =
   LIST [set]				list matches (only ID and Q)
 | SHOW [set]				select matches to show
 | STATS				show only statistics, no matches

global-option =
   DEBUG num				set debugging message mask (see below for list of flags)
 | CONTEXT n				ContextChars (per-query values of config switches; see cf/sherlock for description)
 | CONTEXT FULL				show as much context as possible
 | TITLELEN n				TitleChars
 | INTERVALS n				Intervals
 | SITEMAX n				SiteMax
 | PARTIAL n				PartialAnswer
 | APPROX n				AllowApprox
 | URLS n				ShowURLs
 | DB name [,...]			select databases to search in
 | SITE #id				search only in this site
 | EXPLAIN #oid				explain matching of a given document
 | <attr> (<|>|=|<=|>=|<>) value	matching of attributes (<> works only with set-matched attributes)
 | <attr> (=|<>) { set }		set matching of attributes (not all attributes support it; when in doubt, consult index.h and custom.h)
 | SORTBY [-] (<attr> | SITE | CARDID)	sort by given attribute on Q ties, reverse if "-"
 | SORTBY ... ONLY			sort only by given attribute, don't calculate Q
 | local-option				default value of a local option

local-option =
   ACCENTS n				AccentMode
 | MORPH n				Morphing
 | SPELL n				Spelling
 | SYN n				Synonyming
 | SYNEXP { set }			expand synonymic variants specified by their numbers
 | '/' weight				word weight

attr =
   <custom-attribute>			custom attributes as defined in custom.h
 | AGE					document age in seconds
 | FILETYPE				file type (values are either type names or numeric type IDs)
 | LANG					document language (values are either language codes or numeric language IDs)
 | AREA					index area (requires CONFIG_AREAS)

expr =
   molecule
 | expr AND expr			usual boolean operators
 | expr OR expr

molecule =
   NOT* ( expr ) local-option*
 | NOT* ANY				matches any document
 | atom
 | atom . atom . ...			simple search expression

atom =
   (NOT* | MAYBE) [typelist] "word-or-phrase" local-option*

set = range [, ...]
range = n | [n] '..' [n]

value =					attributes can have various values ...
   "string"				some are strings
 | number				some are numbers
 | { set }				some are sets of strings/numbers

Reply Format
~~~~~~~~~~~~
reply = status (header \n (card \n)+ ("+\n" footer)?)

status:
+code reply		Positive reply (always contains header)
-code reply		Negative reply (fatal error) (header needn't be present)
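
A sketch of parsing the status line in Python. It assumes a three-digit code
(as in the Error Codes table) separated from the reply text by a single space;
the function name is hypothetical.

```python
import re

_STATUS = re.compile(r"^([+-])(\d{3})(?: (.*))?$")

def parse_status(line):
    """Split a status line into (is_positive, code, text)."""
    m = _STATUS.match(line)
    if m is None:
        raise ValueError(f"not a status line: {line!r}")
    sign, code, text = m.groups()
    return sign == "+", int(code), text or ""

print(parse_status("-102 Parse error"))   # (False, 102, 'Parse error')
```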

header:
<db-header>		(once for each database that was searched)
Cage			Cached reply with specified age in seconds
Nnum			Total number of objects in the heap
Vversion		Search server version

db-header:
(D
Dname			Results for database <name> follow (this is always the first item)
Wword refs reflen matches stat key
			Found <matches> matching words and <refs> references with total size <reflen> KB
			for input word <word> with status <stat> (0=ok, else error code)
			(<key>,<word>) uniquely determines the word for purposes of merging
			word notes in the front-end.
P"phrase" refs		Found <refs> documents containing <phrase>
n"near" refs		Found <refs> documents containing near match for <near>
Tnum			Total number of matching objects
ttype1=num1 ...		Total number of matching objects for each file type (even counting those not matching the required type if CONFIG_COUNT_ALL_FILETYPES)
Sright wrong pts	Spelling checker: suggesting word <right> with <pts> similarity points instead of <wrong>
Ysyn var orig id	Synonym: <syn> is a synonym with id <id> for a variant <var> of query word <orig>
llemma orig		Found lemma <lemma> for word <orig>
N,I,L			Database parameters (see the "databases" control command below)
-code reply		Error occurred during search in this database, but partial answers are enabled
)

footer:
ttime			Query processing time in milliseconds
T...			Detailed timing information (debugging only; format unspecified)

card: see doc/objects
X and M are tagged with XML-like tags (see below)
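
The reply grammar above can be split into its parts with a few lines of
Python. This is a sketch under assumptions: the reply is already available as
a list of lines without trailing newlines, blank lines terminate the header
and each card, and a line consisting of a single "+" introduces the footer.
The function name is hypothetical, and a negative reply may carry no header
at all.

```python
def split_reply(lines):
    """Split reply lines into (status, header, cards, footer)."""
    status, body = lines[0], lines[1:]
    blocks, current, footer = [], [], []
    for i, line in enumerate(body):
        if line == "+":                       # "+\n" starts the footer
            footer = [l for l in body[i + 1:] if l]
            break
        if line == "":                        # blank line closes header or card
            blocks.append(current)
            current = []
        else:
            current.append(line)
    if current:                               # tolerate a missing final blank line
        blocks.append(current)
    header = blocks[0] if blocks else []
    cards = blocks[1:]
    return status, header, cards, footer
```

The header list still contains the "(D" ... ")" groups verbatim; splitting
them per database is left to the caller.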

Control commands
~~~~~~~~~~~~~~~~

databases		Show a list of databases

   reply = (<db> \n)+
   db:
	Dname		Database name
	Ncards		Number of cards
	Iobjects	Number of input objects (i.e., cards+dups+redirects+anything thrown away)
	Ltime		Time (time_t) of last modification
	Ssize		Database size in kilobytes
	Wcount		Number of words indexed
	Ccount		Number of complexes indexed
	Ucount		Number of strings (URLs and similar stuff) indexed
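
A sketch of parsing the reply to the "databases" command. It assumes that the
status line has already been removed, that each attribute is a single letter
followed immediately by its value, and that records are separated by blank
lines as the grammar above suggests. The function name and the dictionary key
names are hypothetical.

```python
# Attribute letter -> descriptive key (the key names are this example's choice).
FIELDS = {
    "D": "name", "N": "cards", "I": "objects", "L": "mtime",
    "S": "size_kb", "W": "words", "C": "complexes", "U": "strings",
}

def parse_databases(lines):
    """Turn the lines of a `databases` reply into a list of dicts."""
    dbs, current = [], {}
    for line in lines:
        if not line:                          # blank line ends one database record
            if current:
                dbs.append(current)
                current = {}
            continue
        key, value = line[0], line[1:]
        name = FIELDS.get(key)
        if name is None:                      # unknown attribute: skip it
            continue
        current[name] = value if key == "D" else int(value)
    if current:
        dbs.append(current)
    return dbs
```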

XML-like formatted text
~~~~~~~~~~~~~~~~~~~~~~~

<block char1=position char2=position context=length> ... </block>
			envelope around one interval
<text>, <emph>, <small>, <title>, <smallhead>, <bighead>
			text style
<alt>			alternative description of an image
<title>, <keyword>, <meta>
			title type
<best>word</best>	the best matched words
<found>word</found>	highlighted word
<break>			paragraph break
<ref id=number> ... </ref>
			this text is a hyperlink

To simplify the front-end parser, all meta-attributes (denoted by M)
are guaranteed to fit into one line.  Beware that such a line can be
on the order of a thousand characters long.  The text of the card
(denoted by X) is formatted into lines of approximately 80 characters.
Line breaks occur solely on spaces.
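
Because X text is broken only on spaces, a front-end can rejoin the lines with
single spaces before looking at the tags. The sketch below does that and then
pulls out the <found> and <best> words. It assumes the leading attribute
letter has already been stripped from each line, that tag names are lowercase,
and that <found> and <best> are not nested in each other; how literal "<"
characters in the text are escaped is not covered here. All names are
hypothetical.

```python
import re

TAG = re.compile(r"</?[a-z]+(?: [^<>]*)?>")          # any opening or closing tag
MARKED = re.compile(r"<(found|best)>(.*?)</\1>")      # highlighted spans

def plain_text(x_lines):
    """Rejoin wrapped X lines and drop all tags."""
    return TAG.sub("", " ".join(x_lines))

def highlighted_words(x_lines):
    """Return (tag, word) pairs for every <found> and <best> span, in order."""
    text = " ".join(x_lines)
    return [(m.group(1), TAG.sub("", m.group(2))) for m in MARKED.finditer(text)]

lines = [
    "<block char1=0 char2=40 context=40><text>The <found>quick</found> brown",
    "fox <best>jumps</best></text></block>",
]
print(plain_text(lines))          # The quick brown fox jumps
print(highlighted_words(lines))   # [('found', 'quick'), ('best', 'jumps')]
```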

Attributes generated by debug switches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.n	near-matcher info (2)
.s	simple query transform: original query (2)
.t	simple query transform: after word analysis (2)
.x	simple query transform: result (2)
.A	query after analysis pass (2)
.C	custom attribute (16)
.E	explanation (EXPLAIN)
.I	initial form of query (2)
.K	secondary sort key (16)
.M	morphological expansion (8)
.N	normalized query (2)
.O	optimistic version of query (2)
.P	phrase info (2)
.R	result note (16)
.S	query just before analysis (2)
.W	word info (2)
.U	magic word merges (8)
.X	various stages of phrase processing (8)
.Y	synonymic expansion (8)
.Z	spelling checker (8)
.<SP>	general message
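
A sketch of working with these attributes from Python. It reads the numbers in
parentheses above as the DEBUG mask bits that enable each attribute, and it
assumes that debug attributes arrive as reply lines starting with "." followed
by the one-character name; both readings are assumptions of this example, and
the function names are hypothetical.

```python
from collections import defaultdict

def debug_mask(*bits):
    """Combine individual flag bits into one DEBUG mask value."""
    mask = 0
    for bit in bits:
        mask |= bit
    return mask

def collect_debug(lines):
    """Group debug attribute lines by the character that follows the dot."""
    out = defaultdict(list)
    for line in lines:
        if len(line) >= 2 and line[0] == ".":
            out[line[1]].append(line[2:])
    return dict(out)

# Ask for query-analysis (2) and expansion (8) messages.
query = f'DEBUG {debug_mask(2, 8)} "word"'
print(query)   # DEBUG 10 "word"
```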

Error Codes
~~~~~~~~~~~
000	OK
0xx	Various shortcuts [sherlockd internal]
100	Refusing to talk to you
101	Request too long
102	Parse error
103	Too many words
104	All documents match
105	Too many word matches
106	Too many documents requested
107	Input timeout
108	Read error
109	Invalid command
110	Phrase too complex
111	Pattern too long
112	Invalid URL
113	Maximum wildcard zone size exceeded
114	Wildcard prefix too short
115	Word too long
116	Word not indexed
117	Too many references
118	Boolean expression too complex (after internal expansions)
119	Simple search expression contains only non-indexed words
200	All search servers are down [mux]
201	Multiplexer overloaded (MaxChildren reached) [mux]
202	Input timeout [mux]
203	SearchServer timeout [mux]
204	All search servers are overloaded [mux]
3xx	Used internally by sherlockd for non-fatal versions of 1xx
9xx	Reserved for internal use by front-ends
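
The code ranges above can be turned into a small classifier for front-ends.
This is a sketch: the function name and the category labels are this
example's own wording, derived only from the table above.

```python
def classify_code(code):
    """Map a numeric reply code to the range it belongs to in the table above."""
    if code == 0:
        return "ok"
    if 0 < code < 100:
        return "sherlockd internal shortcut"
    if 100 <= code < 200:
        return "search server error"
    if 200 <= code < 300:
        return "multiplexer error"
    if 300 <= code < 400:
        return "non-fatal variant of a 1xx error"
    if 900 <= code < 1000:
        return "reserved for front-ends"
    return "unassigned"

print(classify_code(203))   # multiplexer error
```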
