A few days ago I was trying to track down a performance issue in a project that loads and parses JSON. I wrote a couple of benchmarks in different languages to see if it was a language issue, a library issue or something else entirely.
One thing that stood out was my BlitzMax test was parsing quickly, but it took an age to load the initial data.
Naturally I had to drop everything to try and figure out what was happening.
The file to be loaded was around 900KB, and the following function was used to load it into memory:
Function LoadFileAsString:String(url:Object) Local streamIn:TStream = ReadFile(url) Local contents:String = "" While Not(file.Eof()) contents:+ file.ReadLine() + "~n" Wend Return contents End Function
Reading the file byte-by-byte (using
ReadByte) is far slower than using
ReadLine. This approach works well enough on small files (< 20kB), but gets very slow as they get larger.
I figured loading everything into a bank and converting that to a string would be quicker. After all, a string is just a collection of bytes with a null terminator, and a BlitzMax bank is a collection of bytes.
The first approach was to read the bank using
LoadBank and then building a string using
PeekByte for each character.
The results were disappointing.
After reading the documentation, I found a few functions that looked to do exactly what I needed.
String type comes with
FromCString, a function that can convert a C-style string into a BlitzMax one. C-style strings are passed around in BlitzMax as a
Byte Ptr type.
Even better, BlitzMax banks can use
LockBank to return a pointer to their contents as a
The new code looked something like this:
Function LoadFileAsString:String(url:Object) ' Create a bank and load the file contents. Local bank:TBank = LoadBank(url) ' Ensure bank finishes with a 0 byte. ' This prevents the string from having junk bytes at the end. Local size:Int = BankSize(bank) ResizeBank(bank, size + 1) PokeByte(bank, size, 0) ' Get bank contents and convert to a string Local buffer:Byte Ptr = LockBank(bank) Local content:String = String.FromCString(buffer) ' Cleanup. UnlockBank(bank) bank = null Return content End Function
One important addition is adding a null terminator to the end of the loaded content. Because
FromCString expects a null byte terminator, it will continue to read from memory until it finds one. This can lead to junk data at the end of the string.
The following benchmark is for a 900KB text file.
|Method||Average Time (ms)|
|Method 1 (ReadLine)||933.18|
|Method 2 (LoadBank)||2.8|
Sometimes yak shaving comes in handy.