Thomas Slade

Provincial: Code Review

Provincial: Code Review

In previous posts, I’ve discussed my ongoing work on the First World War mod for Hearts of Iron IV, which has involved totally overhauling the base game’s map for a more detailed map that zooms in on Europe.

Map building is a time-consuming process, with thousands of provinces needing to be drawn up and updated as the mod develops. HoI IV ships with some map-editing utilities (the ‘Nudge’ editor), but these features are often buggy and sometimes lacking. For this reason, I decided pretty early to start working on my own toolset, the Provincial toolset. This is a collection of python scripts that make use of the numpy and skimage libraries, which are perfect for image-manipulation, as well as python’s more standard read/write libraries for file output.

In this post, I’m going to take a close look at one of my recent changes to this toolset, revising my scripting standards as I go.

HoI IV Modding for the Unfamiliar

First, a quick primer for those who are unfamiliar with HoI IV modding. Hearts of Iron is a grand strategy ‘map painter’ sort of game. The map is actually a sort of node-graph, where each node is a province which a unit can occupy. Provinces in HoI IV are quite granular: there are several-thousand in total, with as many as a few hundred in areas like France or Germany to allow for the encirclements shifting fronts seen in the second world war. Provinces are defined using a bitmap, where each province has a unique colour. It looks like this:


Provinces have many associated properties: they belong to a state, which is the larger geographic unit of the game. They each have a continent, a terrain type, a bool indicatinf if the province is coastal, and - of course - a unique RGB value. These definitions are stored in various places around the content base, but one critical file is the definitions.csv file.

Definitions.png

This contains the terrain, coastal status, continent, unique color, and ID of the province. Each province has its own line, so this file has thousands of entries. Without editing tools, a modder needs to edit this by hand, which is obviously an extremely difficult process: matching a province with its RGB text value takes 5 minutes all by itself.

Again, HoI IV does have a tool to help with this, but half way through my modding, this tool stopped working for me. Eventually I decided that enough was enough: all of the data in definition.csv can, with a little work, be obtained from the province and terrain bitmaps I already have on hand. Writing a new python script to handle this process sounded simple enough, so I decided to bite the bullet.

GenerateDefinitions.py

The addition to my Provincial library was a file called generatedefinitions.py. This file, along with the rest of the toolset, is available for view (and use!) on my GitHub, but I’ll be looking at the particulars of its script in this post anyway.

Here’s what generatedefinitions, in a nutshell, does.

GenerateDefinitions.png


How Slapdash is Too Slapdash?

When creating this library, I started by making one thing very clear to myself: this is a side project to my main objective, which was to create a really cool HoI IV mod. This is a toolset that needs to be used by me, and only me.

What this means is that my script only needed to be so structured. My coding standards are usually very strict: I’ve had too many projects hampered by technical debt and out-of-control, undocumented daisy chains of functions, so I decided long ago that clean code was important to me. However, creating a truly team-friendly, properly decoupled feature takes extra time that I didn’t really have. As a result, my Provincial library is certainly lacking in its use of OOP principles (this is part of why I’m making this post). However, adhering to functional, Don’t Repeat Yourself (DRY programming) programming, and properly commenting your work takes far less time that planning the architecture for a sustained application, so my scripts are nonetheless well-documented (I planned, from an early stage, to possibly release Provnicial for use by other modders, so ample comments are always a good thing).

Analysing the Script

The script itself is quite short, but references other scripts in the library. It’s best to view the entire script on GitHub.

To elaborate on the structure of the script:

// Header
This includes my accredation and the library the script belongs to, plus a brief blurb of the script’s purpose and some quid-pro-quos for anyone reading it (where it might go wrong, some critical usage requirements).

As my code review will demonstrate, comment documentation has become a very natural process for me. I’m of the opinion that every function, class, and even member variable deserves an explanatory comment (preferably an XML comment), especially on team projects. Comments, of course, provide context for an unfamiliar reader, but they have a second, maybe more critical role: a comment can help define the intended purpose of a script or method. In the early phases of a project, a class’s role may dramatically change from edit-to-edit. In these cases, I argue that it’s very useful to be upfront about the original definition of that class, its responsibilities, and by extension what should and should not be included. The purpose of the class may indeed change, for sure. But when that happens I prefer it to be a conscious, deliberate decision taken by the development team, demanding an update to that blurb comment and possibly to the class’s name.

My point being that this little, non-compiled piece of text at the top of each of my methods is probably more important than the code itself.

// Includes
These are python’s imports from other libraries, allowing me to use specialised utilities such as file reading/writing, and of course skimage and numpy, which are the driver behind image manipulation in this script.

I’ll start the self-criticisms off early here and say that, in hindsight, importing particular functions of a library rather than referring to that function within the library’s scope (skimage.segmentation.floodfill, for example), was probably a mistake. Many other python scripters advise that a method always be referred to in the scope of its library, to avoid naming ambiguities and make the owner of that functionality clear to the reader. I opted not to, simply to save myself time and verbosity.

// Function definitions
Python requires that functions be defined before their invocations, so these are usually at the top of the file. More on that later.

// Main Program
The actual invocation of functions and manipulation of data. Again, this needs to come after function definitions, which is a layout I’m never happy about (the initial flow of a script should be immediately visible at the top of its file, in my opinion, since 99% of the time it’s what a programmer will be looking for). But in hindsight I probably have only myself to blame.

The main program defines some global members (again, this is yucky but I’ll talk about it later), then operates on them using the functions in this script and its imported scripts (which includes a utilities file for the Provincial library). The result is a string representing the ammended definitions data, ready for output.

// Output and Logging
Finally, the resulting data is printed for the user and (if the appropriate setting has been flagged) written to the existing directory of definitions.csv. And just like that, you have your HoI IV provinces ready to use, saving you who knows how many hours.

One last thing I take care to do is include plenty of logs, and two ‘debug maps’ which show the result of the operation for the user to interpret. This is extremely useful for identifying faults and bugs which would otherwise remain invisible until HoI IV crashes on me or something. With Python’s string.format() method - allowing for easy string insertions (e.g. “Hello {}”.format(“World”) prints Hello World) - makes it very simple to get ample data out into your log. Some of the data I print includes the percentage of terrain types assigned, the percentage of coastal provinces, the number of unique provinces found …

Do’s and Don’ts

Let’s talk about some of the smarter, and dumber, characteristics of this script, acknowledging that it was a time-limited endeavor but that there’s always a little room for improvement.

Do: Comment A Lot

Not much else to add here. Comments are the lifeblood of a maintainable codebase. I’ve met programmers who insist that next-to-no comments is perfectly acceptable, but I firmly disagree. As mentioned earlier, I’m of the opinion that every class, method, and member deserves its own comment, as well as any otherwise unclear operations or hashed in fixes. This paves the way for online documentation tools, and keeps the team on the same page as to a given script’s purpose. For a tools programmer, documentation is doubly important, as it gives code-fluent designers the chance to examine the program themselves should your hands be tied, perhaps finding the issue given their greater familiarity with whatever bug is troubling them.

Do: Log A Lot

Logs.png

When creating tools, your end user is constantly at the mercy of your own oversight. Nobody every uses something the way you intend them to, but usability touches certainly help. Something as simple as logging the status of the program can assure the user that things are happening, while detailed data outputs can catch errors early before they enter the content base.

Do: Create Generalised Functions

One programming pretty naturally to me is DRY (Don’t Repeat Yourself). I simply hate repeating code.

The aim with DRY programming is, among other things, to create reuseable functions that perform universally applicable operations. In other words, if I find myself doing a similar sort of operation across the codebase, I should package that operation into a function of its own and make it available to whatever needs it. Sometimes this means elevating it to a publicly available utility function.

There are numerous cases of this in Provincial which can be seen in the library’s utility file, but one example used in this script would be the find_bounds method.

Usage of find_bounds in generatedefinitions,py

Usage of find_bounds in generatedefinitions,py

The definition of find_bounds, a relatively simple method. It uses numpy’s methods to find the upper-leftmost and lower-rightmost corner of a shape’s bounding box.

The definition of find_bounds, a relatively simple method. It uses numpy’s methods to find the upper-leftmost and lower-rightmost corner of a shape’s bounding box.

find_bounds is an operation required across several scripts in Provincial, often to help crop an image down to the area I’m interested in, dramatically reducing the time needed to iterate over pixels.

Restructuring a program to adhere to DRY isn’t always a no-brainer. Sometimes, two routines may be similar enough to warrant combining, but doing so might change the structure of the code that depends on them (for example, maybe in one class I’ve been getting the top-right bound, and in another I’ve been getting the top-left). These refactoring costs need to be considered, but DRY programming has some enormous benefits while its antithises, I would argue, has one enormous danger.

  • Creating a function for a common utility naturally saves future programmers some typing time (provided you put the method somewhere sensible, so that team members are able to find it).

  • Creating a function can define authority over a piece of logic. What this means is that one given operation has exactly one representative in the codebase. One requirement, and one solution.

  • Not adhering to DRY will result in multiple definitions for similar or identical procedures, and changes to one definition will not be reflected in others. The result can be chaotic: optimisations a programmer thought they had implemented not being reflected in some behaviours, inconsistent and confusing function chains, general headaches over what should be simple problems.

The bottom line is that functions - being packages of behaviour - should behave as we expect them to. A great codebase becomes a lexical arsenal, where many basic problems have long since been solved and can be deployed by new programmers to handle more complex issues.

Don’t: Define Ambiguous Names

This is a very simple don’t, and I’m sure many programmers are rolling their eyes. But it leads on to a more general issue with my Python scripting.

Let’s look at an example from the generatedefinitions.py script:

DuplicateNames.png

We have a case where, in the main program flow, a variable called terrain_map_flattened is being defined. It’s an appropriate variable name: it does indeed represent a flattened terrain map.

Further up the script, we have a method which is invoked later in the main program. This method is designed to read the very same flattened array. Because it’s exactly the same item, serving the same purpose, I (carelessly) gave it exactly the same name. This is a slipup occuring several times across my toolset. We have a variable defined in global scope, and a more local representative of that variable in some script’s arguments.

C++ programmers might expect some sort of compilation error for this. But no, in fact, Python allows global scopes to be overwritten by local ones. So, in the get_terrain method, terrain_map_flattened refers to the local argument, and not the global variable. Changing the local variable will not impact the global one (I can do that, if I want to, using the ‘global’ keyword within that method’s scope).

Why is this dangerous? Because:

  • If I ever changed the argument name in the method, but forgot to change one of its uses, my script would not throw a runtime warning me of my mistake. Instead, it would start operating on the global variable instead of the local one. Bugs would ensue without an obvious cause.

  • The usage is ambiguous and confusing enough to warrant this entire explanation. Code should be as logical and self-explanatory as possible.

Okay. So what should I name it instead?

BadName.png

This really won’t do. A variable’s name should, of course, refer to what that variable is. Slapping qualifiers on a variable to avoid collisions is certainly justified if we’re dealing with two different entities, but in this case, the global and local copy of the variable are one and the same. This highlights a more general problem in my scripts: as a result of avoiding an Object Oriented or Main Program approach, my names are becoming ambiguous.

This is technically a byproduct of the fact that, when I began using Python, I didn’t know how to define Python classes and didn’t really want to spend time finding out. This is a great example of technical debt spiralling out of control from a project’s foundation, if nothing else.

Summary

To round things off, I think it’s evident that my library has outgrown its origins as a ‘quick fix’ and would, in the future, require some refactors to its overall structure. I’m partial to a purely class-based application, in which every procedure is an object, if only to avoid the use of globals and ambiguous names. I would avoid packaging functions into the global scope of a given pythn script, instead requiring all non-utility functions to belong to a class.

Company of Heroes 2 Map: Pinsk Wetlands Postmortem

Company of Heroes 2 Map: Pinsk Wetlands Postmortem

Total War Unit Modding: The Other Three Kingdoms

Total War Unit Modding: The Other Three Kingdoms