[crossfire] Protocol & compression.

Tue Mar 28 01:26:15 CST 2006

tchize wrote:
> Just a note about the suggested
> s->c compressed <data>
> c->s compressed <data> (yeah imho should go in both directions)

  I'm not sure if there is anything to really gain by the client compressing the 
data.  I can't really think of many times the client is actually sending enough 
data that compression will do any good whatsoever.  The one exception might be 
very long chat/shout/say messages, but even then, that seems fairly unlikely.

  Granted, it probably wouldn't be that hard to add, but it does add at least 
some minimal complication, and if it doesn't gain us anything, I'd rather avoid 
that.

> 
> If we assume data contains a compressed list of 1 or more commands, I
> think the question of what needs and what need not be compressed is
> not immediate, and we can make various attempts with various server /
> client versions. The fact of assuming any command can be compressed by
> algorithm X but that not all commands will be compressed, is enough to
> write this protocol add-on. Then it's a matter of tuning the
> compression triggers, but this can be done without breaking
> interaction with previous versions. So imho, the 'when' to compress
> should not be fixed in protocol, only the how is to be fixed.

  Correct.  That was my assumption - the server would figure out what it thinks 
is worthy of compression.  A server with gobs of bandwidth but not a lot of cpu 
time could decide that nothing is worth compressing.

> One of the most important questions is which algorithm do we use? You
> said you gave zlib a try, but which algorithm does it use? Does
> it involve a dictionary? If yes, do we plan to reset the dictionary
> content between each compress command or do we plan to keep it from
> beginning to end?

  I used the 'compress' function of zlib:
http://www.zlib.net/manual.html#compress

  Since each call to compress is self contained, the compressed data then 
includes all the dictionary or other info necessary.  Note that given crossfire 
will be working on multiple sockets, we can't use a library that holds any 
state.  And if there are new structures needed to hold state, this starts to 
increase the complication level some.
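
  As a rough sketch of what "self contained" means here (in Python only 
because its standard zlib module wraps the same library; the server itself 
is C, and the message text is made up for illustration):

```python
import zlib

# Two made-up server messages; the contents are illustrative only.
messages = [
    b"drawinfo 0 You see a door. " * 8,
    b"drawinfo 0 You see a door. " * 8,
]

# One-shot compression: each call produces a complete zlib stream,
# so nothing is shared between calls or between sockets.
blobs = [zlib.compress(m) for m in messages]

# Any blob can be decompressed on its own, in any order.
restored = [zlib.decompress(b) for b in reversed(blobs)]

# Identical input gives identical output, because no state carries over.
same_output = blobs[0] == blobs[1]
```

  The flip side is that the second message gains nothing from having 
already sent the first one; that only happens with a stateful stream.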

> 
> My opinion currently is
>  - assume client can receive any data in compressed mode, but that not
> all data is compressed (the sender has the choice for every command)
>  - assume the compression stream is something interleaved with the not
> compressed stream, identified by a specific marker (compressed)
>  - assume the compressed stream is continuous (same 'zlib/any chosen
> algorithm' session, useful considering the number of repeated text messages)

  This point is the real gotcha however - since the server can choose what data 
to compress, the question then becomes what portion of the commands are being 
compressed?

  If less than, say, 50%, then we're better off going with an explicit 'this 
data is compressed' prefix than a continual stream, as we'd then be spending 
more bandwidth on the transitions than we would on just prefixing.
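
  To make that trade-off concrete, here is a small counting sketch.  It 
assumes (my simplification, nothing measured) that a per-command prefix and 
a state transition marker cost about the same, so it only counts markers; 
the function names are hypothetical.

```python
from typing import Sequence

def prefix_markers(flags: Sequence[bool]) -> int:
    """Explicit prefix scheme: one marker per compressed command."""
    return sum(1 for f in flags if f)

def transition_markers(flags: Sequence[bool]) -> int:
    """Stream scheme: one marker each time the compressed state flips."""
    state = False  # the socket starts in the uncompressed state
    count = 0
    for f in flags:
        if f != state:
            count += 1
            state = f
    return count

# True = compressed command, False = uncompressed command.
scattered = [True, False, False, True, False, False]
clustered = [True, True, True, True, False, False]

scattered_cost = (prefix_markers(scattered), transition_markers(scattered))
clustered_cost = (prefix_markers(clustered), transition_markers(clustered))
```

  With the scattered pattern the stream scheme needs more markers than 
prefixing does; with long compressed runs it needs fewer.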

  Also note that the real question becomes where we do the compression.  It is 
trivially easy to modify Send_With_Handling to take another flag (compressible) 
and compress the data and send it along.  It is harder to do it as a stream 
method, especially if we want to turn off compression.

  This is because if you want to do it as a stream, the easiest approach is to 
do the compression just before writing to the actual socket.  But by that 
point, all we have in the input buffer is just a bunch of bytes - we have no 
idea if some should or should not be compressed.

  If we were to go with that approach, the socket code basically has to be 
rewritten.  The best method would be to basically have a list of SockList 
structs, and add to that a field for the compress byte.

  Then when we go to send the actual data, the server could parse all the 
SockLists, and based on the current state of the socket and the compress flag, 
figure out what to do.  For example, if the socket is in the non compressed 
state and the socklist says don't compress, just send it along.  If the states 
don't match, send a command to transition state.  Compress all socklists with 
the compress flag set as a single stream, etc.
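
  A rough sketch of that queue walk, in Python for brevity (the server is C). 
Packet, flush_queue and the "start"/"end" tags are all hypothetical stand-ins, 
not existing code or protocol commands.  To keep it short, packet framing is 
ignored and the socket is assumed to drop back to the uncompressed state 
after each compressed run.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Packet:
    """Stand-in for a SockList entry carrying the proposed compress flag."""
    data: bytes
    compress: bool

def flush_queue(queue: List[Packet]) -> List[Tuple[str, bytes]]:
    """Turn queued packets into (tag, payload) chunks for the socket."""
    out: List[Tuple[str, bytes]] = []
    i = 0
    while i < len(queue):
        if not queue[i].compress:
            # States already match: send the packet along untouched.
            out.append(("raw", queue[i].data))
            i += 1
            continue
        # Gather the run of adjacent compressible packets.
        j = i
        while j < len(queue) and queue[j].compress:
            j += 1
        run = b"".join(p.data for p in queue[i:j])
        # Transition in, send the whole run as one stream, transition out.
        out.append(("start", b""))
        out.append(("data", zlib.compress(run)))
        out.append(("end", b""))
        i = j
    return out

queue = [
    Packet(b"image 12 <png bytes>", False),
    Packet(b"smooth 1 2\n", True),
    Packet(b"smooth 3 4\n", True),
    Packet(b"map <coords>", False),
]
chunks = flush_queue(queue)
tags = [tag for tag, _ in chunks]
```

  Note that this keeps the packets in their original order; it does no 
regrouping of any kind.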

  But this starts to lead down a pretty slippery slope.  In that model, you 
almost want to start reorganizing the packets - for example, right now, with the 
map commands, you are going to often get images sent (non compressible) right 
next to smoothing information (compressible).  Ideally, you want to organize all 
the face data together as one non compressible block, and all the smooth 
information as a compressible block.  But re-arranging the order of commands 
starts to get tricky - the client is currently coded to assume (and the spec 
says so) that it will get image info before that image is referenced, so we 
can't re-order image commands to come after the map command.  Knowing what to 
reorder and what not to is where the can of worms is.

  As I said before, at current time, I'm much more interested in code 
cleanliness and simple code than getting things too complicated.  Especially 
because complicated code tends to take longer to write, and if it never gets 
done, there isn't much point.

  If we are going to do stream compression, I'd say we just compress everything 
we send to the client, and don't care about the cpu time and/or data that 
doesn't compress well.  That is the simpler approach, and could get done 
relatively easily.
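
  For what it's worth, a minimal sketch of the compress-everything stream, 
again in Python's zlib module rather than the server's C.  The class name and 
the sample text are made up; the point is only that the state lives in one 
object per client socket.

```python
import zlib

class CompressedSocketState:
    """Hypothetical per-connection compressor; one instance per socket."""

    def __init__(self) -> None:
        self._deflater = zlib.compressobj()

    def encode(self, packet: bytes) -> bytes:
        # Z_SYNC_FLUSH pushes out everything buffered so far, so the
        # client can decode this packet right away, while the history
        # window is kept for later packets.
        return (self._deflater.compress(packet)
                + self._deflater.flush(zlib.Z_SYNC_FLUSH))

# Server side: one compressor for this connection.
conn = CompressedSocketState()
text = b"drawinfo 0 The orc hits you.\n"
first = conn.encode(text)
second = conn.encode(text)

# Client side: one matching decompressor for the whole session.
inflater = zlib.decompressobj()
decoded = inflater.decompress(first) + inflater.decompress(second)
```

  A repeated message like the second one here comes out smaller than the 
first, since the stream can refer back to data it has already sent.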

>  - assume setup negotiates the algorithm (client says I support x,y,z
> then server sends ok, let's go for y algorithm, this is how http does it)

  Yes, but IMO, I don't see the need for more than one algorithm.  I'd say we 
just standardize on zlib.

  Sure, there may be other libraries which do marginally better.  But once 
again, is it worthwhile to have another compression method that might be 
marginally better (in terms of code complication, library support, etc)?  If 
there is some library that is clearly better, or one is written, we could easily 
enough add support for it at that point.  One could very well see things like:

gzstart
....
gzend

and

bz2start
....
bz2end

  And so on.  In many cases, the support for different methods is more for 
backwards compatibility (at the time, A was best, but now B is better).  We're 
not at that point, since we don't even have A.

  I'd also think that we could use the method that is done right now for the map 
commands.  Basically, it goes like:

C->S: setup map2cmd 1
S->C: setup map2cmd false
C->S: setup map1acmd 1
S->C: setup map1acmd false
C->S: setup map1cmd 1
S->C: setup map1cmd true

  (if the server sends true back for one of the earlier attempts, the client 
stops falling back at that point).  That logic would work fine for compression 
methods down the road (setup compression gz, if that doesn't work, setup 
compression bz2, setup compression whatever).
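
  A small sketch of that fallback loop, in Python for brevity.  The negotiate 
and fake_server names are hypothetical, and "setup compression" is only the 
option proposed in this mail, not something the server supports today.

```python
from typing import Callable, Optional, Sequence

def negotiate(options: Sequence[str],
              ask_server: Callable[[str], str]) -> Optional[str]:
    """Try methods in preference order; stop at the first one accepted."""
    for name in options:
        # Stands in for "C->S: setup compression <name>" and the reply.
        reply = ask_server(name)
        if reply != "false":
            return name
    return None  # nothing accepted: carry on uncompressed

def fake_server(method: str) -> str:
    """Made-up server that only knows the bz2 method."""
    return "true" if method == "bz2" else "false"

chosen = negotiate(["gz", "bz2", "whatever"], fake_server)
```

  Here the client is refused on gz, accepted on bz2, and never gets as far 
as asking for the third method.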