Are Strings really that Evil?

One of the recurring themes on the Arduino forums is the use of Strings (with a capital ‘S’) versus strings (with a little ‘s’). The former refers to a class that encapsulates string handling; the latter refers to nul-terminated char arrays.

A typical thread starts with someone who wants to use Strings and has run into a problem, followed by many replies telling them not to because Strings are ‘bad’.

So what makes them ‘bad’ and is this really a problem?

Strings and strings

C and C++ don’t have a native text or string type. However, the languages have a convention that a string is an array of characters terminated with the nul ASCII character, represented by ‘\0’. So a string with the contents “abc” has four characters: ‘a’, ‘b’, ‘c’, and the terminating nul (‘\0’) character. The containing array must therefore have at least 4 elements.
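The storage rule above can be checked with a few lines of standard C++. This is a minimal sketch rather than Arduino-specific code; the helper name storageNeeded() is my own and is only there for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Array elements needed to store a C string: its characters plus the nul.
// storageNeeded() is a hypothetical helper, not a library function.
std::size_t storageNeeded(const char *s)
{
  assert(s != nullptr);
  return std::strlen(s) + 1;
}

// The compiler sizes this array from the literal, including the nul.
const char abc[] = "abc";
static_assert(sizeof(abc) == 4, "three characters plus the terminating nul");
```

strlen() counts only the visible characters, so the array always needs one element more than strlen() reports.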

The standard library has a set of well-known string functions (prefixed by ‘str’) used to manipulate these character arrays. Examples include strcpy() to copy a string into an array, strlen() to get the length of the string and strcat() to concatenate (join) two strings together. Every character in the string can also be manipulated directly as an element of the array.

This scheme is flexible, if a bit primitive, and is straightforward when you know how it works. It has a low overhead but the programmer needs to manage every aspect of the string, especially the size of the storage array. A common bug is to overrun a char array (buffer) when concatenating 2 strings.
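One way to avoid the overrun is to check the space remaining before joining the two strings. The sketch below is an illustration under my own assumptions: safeAppend() is a hypothetical helper, and it assumes dst already holds a nul-terminated string inside a buffer of dstSize elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Appends src to dst only if the result, including the nul, fits in dstSize.
// Returns false and leaves dst untouched when there is not enough room.
// safeAppend() is a hypothetical helper, not a standard function.
bool safeAppend(char *dst, std::size_t dstSize, const char *src)
{
  assert(dst != nullptr && src != nullptr);
  std::size_t used = std::strlen(dst);   // characters already in dst
  std::size_t extra = std::strlen(src);  // characters to be added
  if (used + extra + 1 > dstSize)        // +1 for the terminating nul
    return false;
  std::memcpy(dst + used, src, extra + 1);  // copies the nul as well
  return true;
}
```

With a char buf[8] holding "abc", appending "def" succeeds, while a further append of "gh" is refused because the result would need 9 elements.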

The Arduino String class, written in C++, encapsulates the standard string storage and related operations and provides more user-friendly syntax (such as ‘+’ to join strings). The class also handles the allocation and management of heap memory (see this previous article) to dynamically create the arrays for the characters making up the string.

At its most basic level, a String object holds a pointer to the allocated memory containing the character data, together with the length of the string.
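To make the pointer-plus-length idea concrete, here is a minimal sketch in standard C++. It is not the Arduino String implementation; MiniString, makeMini() and freeMini() are hypothetical names used purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical illustration only; not the Arduino String implementation.
struct MiniString
{
  char *data;          // heap block holding the characters plus the nul
  std::size_t length;  // number of characters, excluding the nul
};

// Builds a MiniString from a C string; data is nullptr if malloc() fails.
MiniString makeMini(const char *text)
{
  assert(text != nullptr);
  MiniString s;
  s.length = std::strlen(text);
  s.data = static_cast<char *>(std::malloc(s.length + 1));
  if (s.data != nullptr)
    std::memcpy(s.data, text, s.length + 1);
  else
    s.length = 0;  // allocation failed: the object holds no data
  return s;
}

// Releases the heap block and leaves the object empty.
void freeMini(MiniString &s)
{
  std::free(s.data);
  s.data = nullptr;
  s.length = 0;
}
```

Every change in length means another trip to the heap allocator, which is where the trouble described in the next section comes from.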

Testing for failure

In a low memory environment, allocating and freeing memory can cause heap fragmentation if used indiscriminately, so an object like a String that transacts many allocate/free operations has the potential to cause trouble.

To test how bad this could get, I wrote a simple test sketch to capture and display heap memory data while using String on an Arduino Uno (2 kB RAM):

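#include <MemoryFree.h>

 // Print one CSV record for the serial plotter:
 // free memory, free list size, string length, loop counter, invocation marker
 void reportMemory(int invcount, int cntr, int nLen)
 {
   Serial.print(freeMemory());
   Serial.print(F(","));
   Serial.print(freeListSize());
   Serial.print(F(","));
   Serial.print(nLen);
   Serial.print(F(","));
   Serial.print(cntr);
   Serial.print(F(","));
   Serial.print(invcount);
   Serial.println();
 }

 void useString(int nSize)
 {
   static int invcount = 0;   // stepped by 50 per call so each run shows on the plot
   String s = "";

   for (int i=0; i<nSize; i++)
   {
     // Note: '0' + (i % 10) has type int, so String appends its decimal
     // digits (e.g. "48"), two characters per iteration rather than one.
     s = s + ('0' + (i % 10));
     if (s)   // false once an allocation has failed and the String is invalid
       reportMemory(invcount, i, s.length());
     else
       reportMemory(invcount, 1, 0);
   }
   invcount += 50;
 }

 void setup(void) {  Serial.begin(9600); }
 void loop(void)  {  useString(500);     }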

The useString() function creates a new String and loops, appending to it on every iteration. This resembles receiving characters from a serial port before processing them.

The data output to the serial monitor is plotted using the built-in IDE serial plotter. The resulting chart is shown below.

The purple line is the static invcount counter in useString(). This counter records each time the function is run and on the chart shows a step change when that happens – point (2) on the chart.

The blue line is the free heap memory available. It starts at around 2kB and drops linearly as memory is allocated by String. The red line is the amount of memory that is in the free list: the free memory held in the middle of the heap that could be reallocated.

The green line is the current length of the string and the orange line is the loop counter inside useString(). These might be expected to coincide when useString() is invoked, but they diverge immediately. A likely cause is that the expression '0' + (i % 10) has type int, so String appends its decimal digits (for example “48”) rather than a single character, and the string grows by two characters per iteration.

An interesting event happens at position (1). At this point the String collapses to contain no data because the memory allocation failed (the ‘if (s)’ test in the code). This happens even though the free list holds more than enough memory in total, presumably because that memory is fragmented and no single contiguous block is large enough for the request. String simply fails and discards all its data, which re-establishes the heap, and it all starts again. At this point the string was barely 300 bytes long and the free list held about 1200 bytes.

A similar memory cleanup occurs at position (2), when the function exits and the local String variable in useString() is destroyed and its memory freed.
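The gap between total free memory and usable free memory can be shown with a small model in standard C++. This is a simplified sketch under my own assumptions: the free list is represented as a plain array of block sizes, allocator bookkeeping overhead is ignored, and FreeStats, summarise() and canAllocate() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Summary of a free list: total free bytes and the largest single block.
struct FreeStats
{
  std::size_t total;
  std::size_t largest;
};

// Walks an array of free-block sizes (a stand-in for the heap's free list).
FreeStats summarise(const std::size_t *blocks, std::size_t count)
{
  assert(blocks != nullptr || count == 0);
  FreeStats stats = {0, 0};
  for (std::size_t i = 0; i < count; i++)
  {
    stats.total += blocks[i];
    if (blocks[i] > stats.largest)
      stats.largest = blocks[i];
  }
  return stats;
}

// A request needs one contiguous block, so only the largest block matters.
bool canAllocate(const FreeStats &stats, std::size_t request)
{
  return request <= stats.largest;
}
```

As a made-up example, four free blocks of 300, 250, 350 and 300 bytes total 1200 bytes, yet a request for 400 bytes cannot be met because the largest single block is only 350 bytes. The block sizes are invented for illustration and are not measurements from the test sketch.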

Are Strings Evil?

Strings may not be 100% evil, but the example illustrates that String is not really suited to low memory environments. It is a ‘resource hog’ and can fail without warning. Except for the simplest of cases, the use of Strings is not recommended.
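For comparison, the same kind of character-by-character accumulation can be done with a fixed-size char array that never touches the heap. The sketch below is written in standard C++ and is only one possible approach; LineBuffer, clearBuffer(), appendChar() and the capacity of 32 are my own hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed capacity; 32 is an arbitrary choice for illustration.
const std::size_t kCapacity = 32;

struct LineBuffer
{
  char text[kCapacity];  // kept nul-terminated at all times
  std::size_t length;    // characters stored, excluding the nul
};

// Resets the buffer to an empty string.
void clearBuffer(LineBuffer &b)
{
  b.length = 0;
  b.text[0] = '\0';
}

// Appends one character; returns false and changes nothing when full.
bool appendChar(LineBuffer &b, char c)
{
  assert(b.length < kCapacity);
  if (b.length + 1 >= kCapacity)  // keep one element for the nul
    return false;
  b.text[b.length++] = c;
  b.text[b.length] = '\0';
  return true;
}
```

The memory cost is fixed and known at compile time, and a full buffer is reported to the caller through the return value rather than by silently discarding the data.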

2 thoughts on “Are Strings really that Evil?”

  1. John Smith

    Excellent analysis. Certainly applicable to devices such as the Arduino Nano (etc.) with very limited memory (1-2 kB). On platforms such as the ESP32, the programmer can become a bit more lazy as it has over 200 kB of memory to play with (all relative to how much data the code will receive, of course).

    Also, it would be interesting to see how using the Arduino String class’s ‘reserve()’ function to allocate memory up-front could help, especially if one has a good idea of how much data is likely to be received.

    My understanding of the String class is that when you concatenate, it creates a completely new String object of n (size of existing string) + x (x being the length of the String/characters to append), and then deletes the old String object. So essentially, for a moment it’ll use at least double the memory of the string being concatenated/appended. I guess that would mean, even if you have 2 kB of memory, you wouldn’t be able to append even a single character to a 1 kB string.


    1. I would guess that is the only way they can do it with allocated memory – allocate a new block large enough to take the sum of the two strings before freeing up the old ‘String’ and replacing with a pointer to the new.
