[crossfire] redoing sound
Mark Wedel
mwedel at sonic.net
Mon Jun 18 02:01:01 CDT 2012
This is mostly a capture of what was discussed on IRC, written down so there
is a record of it.
The basic problem is that the sound logic is mostly hard coded into the
server, so adding a new sound is not a trivial matter - this attempts to fix
that. Note that this does not cover 100% of sound cases, but hopefully covers 90%.
The other 10% are probably getting into fairly specific situations which are
hard to generalize, and could be done with plugin scripts.
== Server/Arch sound definition ==
Sounds would be specified in the archetype/object, in the format of:
sound <event type> <chance> <volume> <range> <name1>...<name n>
Multiple sound lines could be specified for different events. If the same event
is specified again, the new line overrides the old one (this would typically be
used in cases where an object is overriding the archetype value).
<event type> would match the events in plugin.h. This keeps things simpler,
as the different events are already defined, and also makes it easier to code as
one knows where to look for the different events. Only the events that
correspond to objects could be specified - not sure what, if anything, to do for
global events. Note that a special event type like 'continuous' probably needs
to be added for objects that constantly make sound, like fountains.
<chance> is the chance to play a sound when the event is triggered, in
percentage. If an object sets this to 0, that is basically saying to clear the
event. This can mostly be used to control the amount of sound - as an example,
orcs are often used in large numbers, so sounds associated with them may have a
20% chance value, on the basis that if there are a bunch of orcs, you get a few
sounds played, not 20 sounds played.
<volume> is the relative volume to play the sound at, with 100 being normal volume.
Where I see this being useful is where the same sound is being used for
similar events. For example, the small/medium/large fireball may all use the
same sound, small being volume 75, medium 100, large 125. Max value here would
be 200.
<range> is how many spaces the sound can be heard for. For simplicity, walls
will be ignored, so it is just a simple 'is this object within X spaces'. But
this is also used by the client to determine how loud the object should be, for
example, an object with a 20 sound range should be louder than an object with a
5 sound range even if both objects are 5 spaces away.
<name> is the name of one or more sound files (excluding the suffix). Which one
is played is determined randomly, with equal chance. This is mostly used to add
variety.
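As a rough sketch (not actual server code), the line format above could be parsed as follows. The SoundDef structure and the EVENT_ATTACK event name are illustrative inventions; only the field order and the limits (chance 0-100, volume capped at 200, chance 0 clearing the event) come from the description above.

```python
from dataclasses import dataclass

@dataclass
class SoundDef:
    event: str    # event type, matching the events in plugin.h
    chance: int   # percent chance to play when the event triggers
    volume: int   # relative volume, 100 = normal, max 200
    range: int    # how many spaces away the sound can be heard
    names: list   # one or more sound file names (no suffix)

def parse_sound_line(line):
    """Parse 'sound <event> <chance> <volume> <range> <name1>...<nameN>'."""
    fields = line.split()
    if fields[0] != "sound" or len(fields) < 6:
        raise ValueError("malformed sound line: %r" % line)
    chance, volume, rng = int(fields[2]), int(fields[3]), int(fields[4])
    if not (0 <= chance <= 100):
        raise ValueError("chance must be 0-100")
    if not (0 <= volume <= 200):
        raise ValueError("volume must be 0-200")
    # chance == 0 means "clear the event inherited from the archetype"
    return SoundDef(fields[1], chance, volume, rng, fields[5:])

# e.g. an orc with a 20% attack-sound chance and two sound variants:
d = parse_sound_line("sound EVENT_ATTACK 20 100 10 orc_grunt1 orc_grunt2")
```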
I can quickly think of a few cases this does not cover, but that would probably
be easy to do in scripts - the ability to weight the sounds differently, so that
99% of the time the first sound is played for an event, and 1% of the time the
second sound. Also, being very specific about when sounds are played, e.g., play
a sound every fifth time this object is applied, rather than 20% of the time.
Specific combinations might be another -
when this object is used against this monster, this special sound is used. All
of these are specialized enough that having general logic is overkill, but not
that hard to do in scripts, and is likely uncommon enough that doing it in
scripts would not be a big performance problem.
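The weighted case mentioned above is a one-liner's worth of logic in a script. This is a generic sketch, not tied to the actual plugin API; the sound names and weights are made up:

```python
import random

def pick_weighted_sound(weighted_sounds, rng=random.random):
    """Pick a sound name from a list of (name, weight) pairs.

    Weights should sum to 1.0. rng is injectable for testing; by
    default it is random.random.
    """
    roll = rng()
    cumulative = 0.0
    for name, weight in weighted_sounds:
        cumulative += weight
        if roll < cumulative:
            return name
    return weighted_sounds[-1][0]   # guard against float rounding

# 99% of the time the first sound, 1% of the time the second:
sound = pick_weighted_sound([("door_creak", 0.99), ("door_slam", 0.01)])
```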
== Protocol to Client ==
The sound2 command is currently defined in the protocol as:
S->C: sound2 <x><y><dir><volume><type><action><name>
x/y are offset from player, direction is direction the sound is moving/facing,
volume is just that, type is major sound type, action is the action, and name is
the name of the sound.
I would suggest this for replacement:
S->C: sound3 <x><y><tag><dir><volume><range><name>
<tag> is added so that the client can keep track of whether the sound associated
with an object is changing (for example, the sound associated with a monster).
It might also be used for sound elimination - for example, if that object
attacked just last tick and that sound is still playing, you may not want to
play it again - but this is really up to the client to determine.
Range is needed, as the client has to know how far the sound can be heard in
order to determine the volume. The volume attribute is passed through as defined
in the archetype (it is up to the client to determine the actual value based on
offset and max range). The type and action go away - in sound2, the action would
be something like 'turn handle' and the name would be the object name, and the
client would then figure out the appropriate sound to play. Since in this
revised system we know exactly what sound to play (it being specified in the
object) and each action has a sound associated with it, having both of those
seems unnecessary.
Note that since all object defined sounds correspond to an object (by
definition), these are all played only on the local map.
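A client-side sketch of turning the sound3 fields into an actual playback volume. The proposal only says the client derives the value from offset and max range; the linear falloff and chessboard distance used here are assumptions, not part of the spec:

```python
def effective_volume(volume, sound_range, dx, dy):
    """Volume the client should play a sound3 at.

    volume: relative volume from the archetype (100 = normal).
    sound_range: how many spaces the sound carries.
    dx, dy: offset of the sound source from the player.

    Linear falloff with distance is an assumption - the proposal
    leaves the exact curve up to the client.
    """
    distance = max(abs(dx), abs(dy))   # spaces, chessboard distance
    if distance >= sound_range:
        return 0
    return volume * (sound_range - distance) // sound_range

# An object with range 20 is louder at 5 spaces than one with range 5:
effective_volume(100, 20, 5, 0)   # -> 75
effective_volume(100, 5, 5, 0)    # -> 0, at the edge of its range
```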
== Client Sound Retrieval ==
Ideally, most sounds the client needs should be included in the client bundle or
one that goes with it. But if a server adds a sound, the client has to know how
to get it.
Through a requestinfo made by the client, the server will return a URL (perhaps
several) where the sound can be retrieved from. I'm not quite sure of the exact
format of this - ideally, there should be some primary/secondary method.
What I mean by that is that if I am running my own server at home with limited
bandwidth, I might want to say that a global repository (which has a fast
connection and all the base sounds) is where the client should go first, but if
the sound isn't found there (because I have added a custom sound), then try this
URL which is on my home system.
But it also seems likely that a setup might be several fast but basic servers,
and several slow but complete servers for sounds, so it might be nice to be able
to note that somehow.
The one thing I would note is that if one has those 2 classes of servers, it
should be assumed that all servers in a class have the same data. What I mean
by that is that if the sound file isn't found on the first fast but incomplete
server, there is no reason to try the second, third, etc. fast but incomplete
servers - you should just move directly to the slow but complete ones. Likewise,
if the first slow but complete server does not have the file, it should be
assumed that none of them have it (note that if the client gets an error
connecting to the server, or a bandwidth-exceeded error or something, then
trying the next one may make sense).
So perhaps the requestinfo has something like:
basic_sound_urls=http://host1/... http://host2/...
complete_sound_url=http://myhost3/... http://myhost4/...
and the client will just pick one basic URL and one complete URL at random to use.
I don't really want to make this complicated for retrieval of files, but I
also want to try to cover what I think is a fairly likely scenario.
Note that the URL should probably include a %s to note where to put the sound
file, as I could see some URL like:
http://myhost3/crossfire/svn/sounds/%s&fetchfile
or the like.
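Putting the two-class fallback and the %s substitution together, the client-side retrieval might look like the sketch below. The fetch callback and its None-for-missing convention are assumptions made so the logic can be shown without a real network:

```python
import random

def fetch_sound(name, basic_urls, complete_urls, fetch):
    """Try one class of servers, then the other, per the scheme above.

    fetch(url) is assumed to return the file bytes, None if the file is
    missing on that server, or raise OSError on a transient error
    (connection refused, bandwidth exceeded). Each URL template contains
    a %s where the sound file name goes.
    """
    for urls in (basic_urls, complete_urls):
        for url in random.sample(urls, len(urls)):
            try:
                data = fetch(url % name)
            except OSError:
                continue   # transient error: try the next server in this class
            if data is not None:
                return data
            # File genuinely missing: assume the whole class lacks it,
            # move on to the next class.
            break
    return None
```

The `break` on a genuine miss is what encodes the "all servers in a class have the same data" assumption, while the `continue` on OSError still retries siblings after connection or bandwidth failures.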
For the first phase, sound files will be in wav format. It is up to the client
to put the .wav suffix on the name when requesting the file. If other formats
are added in the future (.ogg, .mp3, .whatever), the client would just request
its preferred format, perhaps using some fallback mechanism if that first format
is not available.
The client will use URL attributes (hash, file size, last modification time)
to try and detect if new versions of the sound files are available. This needs
further investigation to see what is easy to do.
== Other Thoughts ==
These are perhaps something for phase 2 or maybe never.
Global Sounds: There are several global events defined which you probably want
to be able to tie to sounds. Tying in hard coded values is easy enough, but
ideally you want these set in objects like everything else. So maybe have an
archetype whose sole purpose is to hold the sound information for those events.
That said, looking at the global events, I'm not sure how many I would want
sounds tied to (or at least not played universally). Login/logout might be
worthwhile, but things like mapload/mapunload make little sense to emit sounds
for (that is server work which in theory should be invisible to the player).
Continuous Sounds: Continuous sounds would probably get played via the map
protocol - that is convenient in that one does not have to track whether the
sound has been sent to the client, or have the server send it repeatedly for no
good reason - the same logic that is used for images can be used for sounds.
The one limitation is that I think this would limit things to one continuous
sound per space, but I don't see that as much of an issue.
Sound Merging: As the code stands now, each time a sound is generated on a tick,
based on the random chance, that sound is sent to the client, with a limit on
the number of sounds sent. However, in a more ideal world, if there are 20 of
the same sound being generated to the east, rather than send 20 sound3 protocol
messages, the server should catch that and send just 1 much louder sound3 for
that sound. But adding this increases the complexity quite a bit, and unless
too many sounds becomes an issue, there doesn't seem much point in working on it.
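For what it's worth, the merging itself need not be complicated. The sketch below groups one tick's queued sounds by name and rough direction; grouping by the sign of the offset and boosting volume by 10 per extra source (capped at 200) are invented details - the proposal only says "send just 1 much louder sound3":

```python
from collections import defaultdict

def merge_tick_sounds(sounds):
    """Merge identical sounds queued in one tick into single, louder ones.

    sounds: list of (name, x, y, volume) tuples queued this tick.
    Returns a smaller list with one entry per (name, rough direction).
    """
    groups = defaultdict(list)
    for name, x, y, volume in sounds:
        # Rough direction: sign of each offset component.
        direction = ((x > 0) - (x < 0), (y > 0) - (y < 0))
        groups[(name, direction)].append((x, y, volume))
    merged = []
    for (name, _), members in groups.items():
        x, y, volume = members[0]
        # Invented boost: +10 volume per extra source, capped at 200.
        boosted = min(200, volume + 10 * (len(members) - 1))
        merged.append((name, x, y, boosted))
    return merged
```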
Client Sound Management: I could see the client having a file like
<sound name> <file to play> <volume> <text description>
This would allow the client to specify alternative sound files to play. The
volume would allow the player to turn off certain sounds (volume 0). The text
descriptions could be used for players who do not want sound but still want the
information it conveys (you hear a dragon breathing in the distance) - these
could get displayed in the text window. The clients could ship with some
skeleton version, or perhaps update it as they get the information, and provide
some interface for the player to change it. To me, this is also fairly
specialized and of limited value, hence it is put in the 'perhaps never' area.
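If it ever were done, parsing that file is straightforward. This sketch assumes whitespace-separated fields with the free-text description last, plus blank lines and # comments ignored - none of which is specified above:

```python
def parse_sound_config(text):
    """Parse lines of '<sound name> <file to play> <volume> <text description>'.

    Returns {sound_name: (file, volume, description)}. Volume 0 means the
    player has turned that sound off; the description can be shown in the
    text window for players who play without sound.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into at most 4 fields so the description keeps its spaces.
        name, filename, volume, description = line.split(None, 3)
        table[name] = (filename, int(volume), description)
    return table

config = parse_sound_config(
    "dragon_breath dragon1.wav 0 You hear a dragon breathing in the distance\n"
)
```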