[crossfire] Protocol & compression.

Mark Wedel mwedel at sonic.net
Wed Mar 29 00:57:21 CST 2006


tchize wrote:

>>>
>>  This point is the real gotcha however - since the server can choose what data 
>> to compress, the question then becomes what portion of the commands are being 
>> compressed?
>> <snipped most>
>>
> I was unclear. I just meant the server would do something like:
> <command x> and its data
> <compressed> <data>  (--> decompresses to <command y> and its data)
> <command y> and its data

  Ok.  That is what I was thinking too.  What I had originally thought you were 
talking about was something like:

compress_start
command1 (compressed)
command2 (compressed)
...
compress_end

type of setup, which then leads to the question of how much of that data should 
be compressed.


>>  If we are going to do stream compression, I'd say we just compress everything 
>> we send to the client, and don't care about the cpu time and/or data that 
>> doesn't compress well.  That is the simpler approach, and could get done 
>> relatively easily.
>>  
>>
> I don't agree, we can just have a flag telling 'can compress' when we
> send a command, and the socket writer will decide if it encapsulates it
> in a compressed block. However, once again, if the client assumes data
> can be either compressed or uncompressed, we can implement selective
> compression later and for now compress everything.

  To me, there are really 3 ways to deal with the compression:

1) Have Send_With_Handling compress the packet, if so requested.

Pros: Very simple to do - just a few lines of code to add.

Cons: Each compress is only 1 command, so multiple small commands wouldn't be 
combined together and compressed to save more space.
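  Just to make point 1 concrete, it would be roughly along these lines 
(completely untested sketch - the function and its plain fd/data arguments are 
made up; the real Send_With_Handling works on the socket structure and adds the 
length header itself):

#include <unistd.h>
#include <zlib.h>

/* Untested sketch of point 1: compress a single packet right before it goes
 * out, and fall back to the raw data if compression doesn't actually help. */
static ssize_t send_packet(int fd, const unsigned char *data, size_t len,
                           int do_compress)
{
    if (do_compress) {
        unsigned char out[65536];
        uLongf outlen = sizeof(out) - 2;

        if (compress2(out + 2, &outlen, data, len, Z_DEFAULT_COMPRESSION) == Z_OK
            && outlen + 2 < len) {
            out[0] = 'z';    /* hypothetical 2 character method prefix, */
            out[1] = 'g';    /* along the lines of the zg idea further down */
            return write(fd, out, (size_t)outlen + 2);
        }
    }
    return write(fd, data, len);   /* didn't shrink, or not marked compressible */
}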

2) Middle approach - have Send_With_Handling queue all data that can be 
compressed.  The instant it gets called with data that shouldn't be compressed, 
it compresses everything it has queued up and sends that, then sends the 
uncompressed command.  There would also need to be a separate flush_queue() that 
is called at the end of each tick to also flush this queued data.

Pros: Lets us combine various small packets into a single larger block to 
compress, thus getting better results (think a bunch of drawinfos).

Cons: Adds some level of complication, but not a huge amount (we need another 
buffer to store the data to compress - this could be made a little simpler by 
having logic such that if the buffer would overflow, we compress what is there 
and then put the new block in the now empty buffer).  Also, if there are a lot 
of compressible packets interleaved with non-compressible ones 
(image/smooth/image/smooth), you are back to having small blocks to compress.
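  The queue part of point 2 could look something like this (again untested, and 
the names, buffer size and the send_compressed_packet() helper are all made up 
for illustration):

#include <string.h>
#include <zlib.h>

#define CQUEUE_SIZE 65536

static unsigned char cqueue[CQUEUE_SIZE];
static size_t cqueue_len = 0;

/* Wrap the compressed block in whatever packet framing we settle on and
 * write it out - not shown here. */
static void send_compressed_packet(const unsigned char *data, size_t len);

/* Called when a non-compressible packet shows up, and at the end of each tick. */
static void flush_compress_queue(void)
{
    static unsigned char out[CQUEUE_SIZE + CQUEUE_SIZE / 100 + 64]; /* zlib worst case */
    uLongf outlen = sizeof(out);

    if (cqueue_len == 0)
        return;
    if (compress2(out, &outlen, cqueue, cqueue_len, Z_DEFAULT_COMPRESSION) == Z_OK)
        send_compressed_packet(out, outlen);
    cqueue_len = 0;
}

/* Called from Send_With_Handling for packets flagged as compressible. */
static void queue_compressible(const unsigned char *data, size_t len)
{
    if (cqueue_len + len > CQUEUE_SIZE)
        flush_compress_queue();        /* buffer would overflow - flush first */
    memcpy(cqueue + cqueue_len, data, len);
    cqueue_len += len;
}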

3) Compress everything sent.  This should be done at a lower level (when we 
actually write to the socket).

Pros: Everything is compressed - interleaved data isn't a problem.

Cons: Still harder to do - if we compress a block of data, but can only write 
half of it to the socket, we have to put the other half in a 'this data is 
compressed, send it next' buffer (the current logic just moves the pointer in 
the ring buffer).  We may also end up compressing more data than we need to - 
there isn't really a convenient way to turn compression on/off.
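  For completeness, the write side of point 3 would be roughly the following 
(untested sketch; it assumes a z_stream was set up with deflateInit() when the 
socket was opened, and it glosses over the partial write problem just mentioned):

#include <unistd.h>
#include <zlib.h>

/* Untested sketch of point 3: everything written to the socket goes through
 * one long-lived deflate stream.  Z_SYNC_FLUSH so the client can decode
 * whatever it has received so far. */
static void write_deflated(int fd, z_stream *zs, unsigned char *data, size_t len)
{
    unsigned char out[4096];

    zs->next_in = data;
    zs->avail_in = (uInt)len;
    do {
        zs->next_out = out;
        zs->avail_out = sizeof(out);
        deflate(zs, Z_SYNC_FLUSH);
        /* If write() only takes part of this, the rest has to be saved in an
         * 'already compressed' buffer - that is the complication above. */
        write(fd, out, sizeof(out) - zs->avail_out);
    } while (zs->avail_out == 0);
}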

  Now my personal thought is that point 1 is really easy to do, and doesn't in 
any way prevent point 2 from happening or make it harder to do.  I'd personally 
start with the easiest solution and then move to the more complicated solutions 
after we see how that does.

  For example, if, hypothetically, it gets us 90% of the compression we could get 
with method #3, we may say 'that's good enough'.  If it only gets us 50%, then 
clearly we need to do more work.


> Don't agree; we might be ok with zlib now, but people might give a few
> other algorithms a try, and in 6 months someone comes up with an algorithm
> that gets 20% better compression with 5% less cpu overhead. That day we
> don't want an awful hack in the client/server protocol to handle this new
> algorithm. As you said, it's better to have clean code, and that also
> means a clean protocol :)

  But I don't consider using the setup commands to figure out compression a bad 
hack.  That code is all there already.

  The only real question would be whether just a general 'compress' prefix should 
be used no matter what the compression method, or whether each method should have 
its own prefix (gz, bz2, lzop, whatever).  I personally lean to the second - the 
client then knows what compression method it will be getting.  It also allows us 
to mix compression methods on the same socket.  Suppose the client supports every 
compression method the server does.  It could be that, through experimentation, 
we know the method gz works best on maps, bz2 best on text, lzop best on 
something else.  So for best results, the code could perhaps optimize for that 
(though as I type this in, that starts to get pretty complicated).


> 
> As currently all commands are words and data is either binary or text, I
> would suggest using a very small command for the compress header (so we
> don't lose all the gain of compression :/). A simple character like # or
> @ or & should be enough.
> 
> That would end up as:
> S -> C # <compressed data>

  Yes, a 1 or 2 character method would be best.  Perhaps use Z, since z seems to 
be the standard letter for compression.  Then, it could be something like:

zg - zlib (gzip)
zb - bzip2
zo - lzop, etc.


> 
> I think it's good to have this
> C -> S: setup compress zlib,bzip2,rle
> S -> C: setup compress zlib

  Having a comma separated list in the setup command is different from the 
current semantics.

  For that setup, I think you could do:

C->S: setup compress zlib compress bzip2 compress rle

  And since the server processes those in that order, it would basically use that 
as a preference list (the right most one being the one to use).  Hmmm.  I'll have 
to see about doing that with the map commands.
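
  The server side selection could then be something as simple as this (untested 
sketch - the pair array and the function are made up; the real setup handling 
walks its name/value pairs differently, and the supported methods here are just 
an example):

#include <string.h>

/* Untested sketch: walk the setup name/value pairs in order and remember the
 * last 'compress' value we support, so the right most supported method wins. */
static const char *pick_compression(const char *pairs[][2], int npairs)
{
    const char *chosen = NULL;
    int i;

    for (i = 0; i < npairs; i++) {
        if (strcmp(pairs[i][0], "compress") != 0)
            continue;
        if (strcmp(pairs[i][1], "zlib") == 0 || strcmp(pairs[i][1], "bzip2") == 0)
            chosen = pairs[i][1];
    }
    return chosen;      /* NULL means no method in common */
}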



