David R. Brooks
The question of how to manipulate bitmap files came up when my daughter was working on a science fair project involving taking a photo with a digital camera pointed up through a tree canopy. What she needed to know was what percentage of the image was sky and what percentage was tree canopy -- leaves or branches. I'm sure there are photo editing programs that will answer this question, but it seemed as though it shouldn't be too difficult to write my own program specifically for this task. Basically, the plan is to manipulate the colors in the image, perhaps even by creating a two-color black and white image, and then determine how much is clear sky. I often use the freeware IrfanView photo editing software, which allows you to do these kinds of things. By adjusting contrast and brightness of the original image, you can exercise some control over how the software divides the image into "sky" and "not sky."
The first step is to find out how .bmp files are organized. An online search will produce many sources of information. Bitmap files are separated into three or four sections, as shown in the table below.
Section | Description |
---|---|
Header | Basic file information, 14 bytes |
Image Information Header | Information about the image, 40 bytes |
Color Information (optional) | Information about how the image encodes colors, a variable number of bytes if it's present |
Image data | The actual image, a variable number of bytes |
The next step is to try to interpret a real .bmp file. I first created a small file by selecting a small part of my desktop that contains an icon for QuickC, an old Microsoft MS-DOS C programming environment that I have used for many years. Here's the image:
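Although this sort of puts the cart before the programming horse, here are the results from running my program, written in QuickC. The third line shows the 14 bytes of the file header. These 14 bytes contain 5 values: a 2-byte signature (the characters "BM"), the 4-byte file size, two 2-byte reserved values, and the 4-byte offset from the start of the file to the image data.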
In principle, it is possible to read these integer values directly by choosing appropriately declared data types that will automatically select the appropriate number of bytes. However, this may give unpredictable results with different C compilers, so I chose a more simple-minded, although probably more tedious solution: read the header one character at a time (characters occupy a single byte) and then use explicit typecasting to force C to interpret each character as an integer. For this to work, the C declaration must be unsigned char rather than char. Here's the code:
The two important values in the header are the file size, in bytes, and the offset from the beginning of the file to the start of the image itself, in bytes. The 14 bytes are stored in a character array. The file size starts at element 2 (in C, the first element in an array is 0, not 1) and is stored from low byte to high byte, left to right. I have assumed that even large .bmp file sizes will need no more than 3 bytes. I also declared many of the integer values as type long because the int data type may not be able to represent large file sizes. For the offset value, two characters should always be enough; this value starts at character 10.
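The byte-by-byte decoding described above can be sketched as follows. This is a minimal sketch, not the original QuickC listing: the function names `bmp_le_value`, `bmp_file_size`, and `bmp_image_offset` are made up for illustration. Unlike the 3-byte and 2-byte shortcuts mentioned above, it combines all 4 bytes that the BMP format reserves for each of these two fields. The values are built as unsigned long because int may be only 16 bits wide on older compilers.

```c
/* Combine `count` bytes starting at buf[start], stored low byte first,
   into a single value. Assumes count is between 1 and 4. */
unsigned long bmp_le_value(const unsigned char *buf, int start, int count)
{
    unsigned long value = 0;
    int i;

    /* Walk from the high byte down so each step shifts in one more byte. */
    for (i = count - 1; i >= 0; i--) {
        value = (value << 8) | (unsigned long)buf[start + i];
    }
    return value;
}

/* File size: 4 bytes starting at element 2 of the 14-byte header. */
unsigned long bmp_file_size(const unsigned char *header)
{
    return bmp_le_value(header, 2, 4);
}

/* Offset to the image data: 4 bytes starting at element 10. */
unsigned long bmp_image_offset(const unsigned char *header)
{
    return bmp_le_value(header, 10, 4);
}
```

For a header whose elements 2 through 5 hold 0x5E, 0x3C, 0x00, 0x00, `bmp_file_size` returns 0x3C5E, or 15454. If the array were declared as plain char on a compiler where char is signed, a byte such as 0xC8 would be read as a negative number and corrupt the combined value, which is why unsigned char matters here.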
Now, let's find out how the image is organized by reading the 40-byte image information header. It contains:
Several of the values are simply absent from this file, and I do not know why IrfanView did not put them there when it created the file. I do not know what "important colors" means. In any event, the apparently missing values aren't important for our purposes. The useful values for this image are the width and height, 73 and 70 pixels, and the image size, 73×70 = 5110 pixels. The first byte in the file is at an offset of 0, so an image offset of 54 means that the image starts at the 55th byte in the file. The image is stored line by line, and each pixel requires 3 bytes. Each line is also padded out to a multiple of 4 bytes; here the 73×3 = 219 bytes of pixel data need one extra byte per line, which behaves like an end-of-line mark. So, the image part of the file occupies 5110×3+70 = 15400 bytes. Adding the 54 bytes for the two header records gives the file size, 15400+54 = 15454 bytes.
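The one extra byte per line in this arithmetic is consistent with the BMP rule that each stored row is rounded up to a multiple of 4 bytes. The sketch below generalizes the calculation to other widths. It assumes an uncompressed image, and the names `bmp_row_bytes` and `bmp_expected_file_size` are hypothetical helpers, not part of the original program.

```c
/* Bytes occupied by one stored row: the pixel data for `width` pixels,
   rounded up to the next multiple of 4 bytes (32 bits). */
unsigned long bmp_row_bytes(unsigned long width, unsigned int bits_per_pixel)
{
    unsigned long data_bits = width * (unsigned long)bits_per_pixel;

    return ((data_bits + 31UL) / 32UL) * 4UL;
}

/* Expected size of an uncompressed file: everything before the image
   (image_offset bytes) plus one padded row per image line. */
unsigned long bmp_expected_file_size(unsigned long width,
                                     unsigned long height,
                                     unsigned int bits_per_pixel,
                                     unsigned long image_offset)
{
    return image_offset + bmp_row_bytes(width, bits_per_pixel) * height;
}
```

For the 73 by 70 image at 24 bits per pixel, `bmp_row_bytes` gives 220 bytes per line and `bmp_expected_file_size` with an offset of 54 gives 15454, matching the hand calculation. An image 72 pixels wide would need no padding at all, since 72×3 = 216 is already a multiple of 4.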