[crossfire] Code cleanlieness and server stability

Sun Feb 26 22:17:08 CST 2006

Alex Schultz wrote:
> Hi,
> 
> Lately, I was reminded a bit of how there often seems to be many bugs in 
> CVS and server stability is in some ways rather poor. I was thinking a 
> bit, and a large part of it is, runaway null pointers, and developers 
> not thinking of all the areas that they might be affecting when making 
> changes to some part. I was thinking that it would be a good idea to 
> brainstorm some ideas on how these issues could be reduced. Here are 
> some ideas that came to mind for me, feel free to suggest your own of 
> course.

  Actually, the server has been relatively stable (in terms of crashes) 
recently, IMO.

  The null pointer crashes, while annoying, are also relatively easy to fix and 
find.  I'll include the more difficult bugs at the end.

> 
> -Encourage use of doxygen generated code documentation to verify what 
> things you may be affecting: I.e. 
> http://crossfire.real-time.com/code/server/index.html (would need a more 
> uptodate version than that to be of much help though)

  Yes - that would seem to be easy to do.  I'm not that familar with doxygen, 
but I'd also think that it should be possible to use makefiles to generate the 
data?  If so, keeping that information up to date should end up being pretty easy.

> 
> -Sometimes when a bug gets fixed, the person who caused it might not 
> know, and therefore the developer is more likely to do similar things 
> again without knowing it. It might not be very difficult to make a small 
> script that can check with cvs commit last changed a given line of code 
> or function in the code, and one could easily notify the person who made 
> the mistake of what they did wrong. This might be a bit of an 
> over-engineered thing, but right now, just brainstorming ideas.

  I'd think such a script could be tricky and/or error prone.

  For example, whitespace (or other minor formatting changes) could show foo as 
the last person that modified it, but that isn't the real source of the bug. 
The other case, which is also likely to confuse scripts, is when someone changes 
code near the bad line, but not the line itself.

  Also, the notification to this previous person would have to be carefully done 
so as not to look like a blame game.  There is also the problem that some code 
may no longer have an active developer.

  One easy thing I would suggest is that all code developers be on the 
crossfire-CVS mailing list so they see the commits, and thus can keep an eye out 
for what things have been changed/fixed, and whether that was their code.

  However, this still doesn't really fix the problem itself, which is good 
coding practices.

> 
> -Clean up the code structure: IMHO, the first step would be moving 
> functions around between files and making that organization a bit 
> clearer. Also, though plugins might have some merits, they are not the 
> way to go for achieving cleaner code. Greater separation of parts would 
> be good, but plugins will just have a bit of overhead and similar 
> separation can be achieved just moving around which code is in what 
> files. After the code is moved around between files, I think the next 
> step to do would be removing redundant code which there seems to be a 
> bit of (i.e. in many places, there are many separate implementation of 
> very similar functions that should be implemented as a single function, 
> with perhaps some extra arguments to very slight variants).

  To some extent, I agree, but this also seems to be going back to the code 
cleanup discussion.

  One thing I'd add to such a code cleanup is that there should be the 
willingness to break things (that can be fixed on the map level) for cleaner code.

  A lot of functions are overly complicated because code was put in place to fix 
map bug X, Y, and Z.

  As for other common bugs that could be cleaned up:

1) buffer overruns - snprintf, strncpy, etc, should always be used.  Buffer 
overruns are often very hard to find, because they blow away the stack.  And 
often, while the code seems safe (sprintf(buf,"abc %s", foo) because foo will 
never be longer enough be cause an overflow of buf), things change such that foo 
now can become long enough to cause that overflow.

2) Presumption that valid result will be found in time.  The recent hangs 
(infinite loops) on metalforge where caused by the movement code, but also 
because the random map code presumes that if tries enough, it will eventually 
find an unblocked space to put the item.  There are/were perhaps other cases - 
I'll eventually generate a treasure, etc.  All such loops should probably have 
some count on the number of tries, and time out at some point (and print out an 
error message).

3) Functions that stores results in a static string and return that.  Aside from 
not being thread safe,non threaded code still run into issues, like:

  cp1 = foo(abc);
  some code;
  cp2 = foo(def);
  now so something with cp1, which isn't what we expect

  The real fix here is to modify all such functions to take as a paremeter the 
buffer to store the data into.  If a null is passed as such a buffer, the 
function could do a malloc/strdup for us (although, this may not be a good 
solution - creates more code issues (up there, they did a free() on the return. 
  Do I need to do that)

4) It may be worthwhile to revisit the entire shared string functions - it saves 
some cpu and memory, but also makes things a lot harder to track down if some 
code things it can modify op->name for example.

  In addition, it isn't 100% consistently used - most code uses it, yet there is 
other code that just uses strdup (for different variables mind you), but it 
could help simplify code if the strdup method was just used everything.

  Certainly, this current mix causes problems when some code does op->name = 
strdup().