
Windows High-Level Abstraction Under the CRT

Iconoclast

I wanted to do some experiments on how functions on Windows do things differently: benchmark the <stdio.h> version against the NT kernel version, and step through what each one does. So I wanted to take some notes here on that.

Normally, when writing code that has to be portable between operating systems, I stick to portable APIs (for example, avoiding OS-specific functions like CreateFile in favor of fopen). The thing about Windows is that not only does it implement the stdio functions on top of the NT kernel functions (fopen on top of CreateFile, for instance), it also implements some kernel functions on top of other kernel functions, and some CRT functions on top of others. On Linux, a simple memset can take advantage of SSE2 alignment and set 16 bytes at once, but on Windows, a memset may call other CRT functions to write each byte, which can sometimes call NT kernel functions, which in turn call loops of other functions. It's a mess.

The supposedly ultra-simple WINAPI function Sleep(DWORD), on my version of Windows, really just jumps to the lower-level SleepEx(DWORD, BOOL) in kernelbase.dll, which fills in a LARGE_INTEGER delay interval and passes it to NtDelayExecution. For reasons like this, portability (not just across operating systems, but across versions of Windows) trades off against directness: the fewer higher-level libraries you depend on, the less portable your code tends to be. (OpenGL is another thing that should be a low-level API, but as of Windows 7, opengl32.dll depends on GLU, DirectDraw and GDI.) I'm not even going to get into Microsoft making things worse with multiple MSVCR*.dll runtime versions.

Anyway, here's what I've come up with from a practical point of view: how much does it really matter? For the sake of some arbitrary obsession with performance, what difference does it really make to use something like fputs("Hello, world!\n", stdout); instead of WriteFile from kernel32.dll? Stepping through fputs in x64 on Windows 7, I see that it repeatedly locks and unlocks the file stream, but at least they didn't implement fputs as a loop of fputc calls and make it terribly worse on purpose. The full disassembly is a little long to post, so here is the test program, followed by the speed results:
Code:
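#include <stdio.h>
#include <time.h>

/* STDOUT_FILENO comes from POSIX <unistd.h>, which MSVC does not provide. */
#ifndef STDOUT_FILENO
#define STDOUT_FILENO 1
#endif

/*
 * Defined elsewhere in one of two ways:
 * the WINAPI WriteFile method if _WIN32, or the stdio fputs method.
 */
extern int write_text(const char* text, unsigned int file);

int main(void)
{
    clock_t t1, t2;
    register unsigned int i;

    t1 = clock();
    for (i = 0; i < 32768; i++)
        write_text("Hello, world!\n", STDOUT_FILENO);
    t2 = clock();

    /* clock_t has no printf specifier of its own, so cast to long. */
    printf("diff:  %ld\n", (long)(t2 - t1));
    return 0;
}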

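With the lazy help of a simple single-threaded wall-clock timer (clock(), which in the MSVC CRT measures elapsed wall-clock time in milliseconds, since CLOCKS_PER_SEC is 1000 there), how long does write_text take for 32768 iterations?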

If it calls the standard C I/O function fputs:
Code:
* 10593
* 10178
* 10727
* 10478
* 10546
average:  10.5044 seconds

If it calls WriteFile from kernel32.dll (Windows x64):
Code:
*  9602
*  9492
*  9612
*  9457
*  9822
average:   9.5970 seconds

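So over 32768 calls, WriteFile took about 0.9 seconds (roughly 9%) less than fputs. That said, this might not be the best example or a good demonstration. I'll keep looking for others now and then, and I'd be happy to hear from anyone who knows more examples of this kind of thing.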
 