
Mofafen Blog

String Optimizations in C++

Published on April 01, 2025

This article aims to provide a clear understanding of how we can optimize the use of strings in C++ to improve performance. It does not discuss when to optimize, as that is a separate topic.

What is the problem with strings?

std::string is not a primitive type. It has behaviors that make it expensive to use, regardless of the implementation. The main issue with strings is that they are dynamically allocated but behave as values in expressions, leading to excessive copying.

Why are strings dynamically allocated?

std::string needs to grow dynamically to accommodate its content, unlike fixed-size character arrays (char[]). To implement this flexibility, it uses dynamic memory allocation.

Small internal buffer inside the string object

A string object contains a small internal buffer of fixed size, stored inside the object itself (so it lives on the stack only when the string object does, for example as a local variable). If the content fits in this buffer, the string does not need to allocate memory. Once the content exceeds the buffer size, the string allocates memory on the heap instead of using this local storage. This mechanism is known as Small String Optimization (SSO).
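
One way to observe SSO is to check whether data() points into the string object itself. The sketch below does this with a hypothetical helper, uses_sso_buffer. It relies on implementation details of the standard library, so treat the result as an observation about your toolchain rather than a guarantee.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true if the characters are stored inside the
// std::string object (SSO buffer) rather than in a separate heap block.
// Implementation-dependent: this observes a detail, it is not a standard API.
bool uses_sso_buffer(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    const char* chars = s.data();
    // std::less / std::less_equal give a total order on pointers,
    // even for pointers into unrelated objects.
    std::less_equal<const char*> le;
    std::less<const char*> lt;
    return le(obj_begin, chars) && lt(chars, obj_end);
}
```

On libstdc++, libc++ and MSVC's library, a short string such as "hi" is stored inline, while a 100-character string is not.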

Size of the SSO buffer

The size of the local buffer is implementation-dependent. Typical values on 64-bit platforms are:

| Standard library | SSO capacity (characters, excluding the null terminator) |
| ---- | ---- |
| GCC (libstdc++) | 15 |
| Clang (libc++) | 22 |
| MSVC | 15 |

To check the size of the SSO buffer, we can add characters to a string one at a time and detect when its capacity first changes, for example on Compiler Explorer:

#include <cstddef>
#include <iostream>
#include <string>

int main() {
    std::string s;
    std::size_t capacity = s.capacity(); // Initial capacity = SSO buffer size
    for (int i = 1; i < 100; ++i) {
        s += 'a'; // Add characters one by one
        if (s.capacity() != capacity) { // First capacity increase: heap allocation
            std::cout << "SSO buffer size: " << i - 1 << " characters\n";
            break;
        }
    }
    return 0;
}

Strings involve excessive copying

Value behavior

Assigning one string to another behaves as if each string variable has a private copy of its content.

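#include <iostream>
#include <string>

int main() {
    std::string str1, str2;
    str1 = "Hello";
    str2 = str1;    // str2 gets its own copy of the content
    str2[4] = '!';  // Modifies only str2
    // Will print "Hello", not "Hell!"
    std::cout << str1 << std::endl;
}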

Since strings also support mutating operations, they must behave as if they have private copies of their content. This results in copying when:

  • Strings are passed to constructors
  • Strings are assigned
  • Strings are passed by value as function arguments

Copy-On-Write (COW)

Before C++11, some implementations (notably GCC's libstdc++) used COW to reduce memory usage and copying. Two strings could share the same dynamically allocated storage until one of them performed a mutating operation, at which point a new allocation and a deep copy were triggered.
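
The COW mechanism can be sketched with a toy wrapper around std::shared_ptr<std::string>: copies share one buffer, and the first mutating access makes a private copy. CowString is a hypothetical class for illustration only; it is not thread-safe and is not how any particular standard library implemented COW.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string (illustration only, not thread-safe).
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Read-only access never copies.
    const char* c_str() const { return buf_->c_str(); }

    // How many CowString objects currently share this buffer.
    std::size_t use_count() const { return static_cast<std::size_t>(buf_.use_count()); }

    // Mutating access: if the buffer is shared, detach first (the deferred deep copy).
    char& at(std::size_t i) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::string> buf_; // shared buffer plus reference count
};
```

Copying a CowString only copies the shared_ptr, so both objects point at the same characters until one of them calls at(). In this toy, a reference obtained from at() and kept across a later copy would write into a buffer that the copy also sees; real COW implementations needed extra bookkeeping for such cases, which is part of why the C++11 rules on reference invalidation are hard to meet with COW.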

Since C++11, COW implementations of std::string are effectively forbidden. The standard restricts when operations may invalidate references, pointers and iterators into a string (for example, calling the non-const operator[] must not invalidate them), which a COW string cannot guarantee. C++17 adds another incompatible requirement: the non-const data() member returns a mutable pointer (char*).

While COW saves memory, it is hard to make safe in multithreaded programs, because two strings that look independent may share one buffer. It also adds overhead, since the reference count must be updated with atomic operations.

Consequently, modern implementations use deep copies instead of COW.

#include <string>

int main() {
    // Old possible behavior with COW (pre-C++11 implementations)
    {
        std::string str1 = "Hello";
        std::string str2 = str1; // No copy: shared buffer, reference count = 2
        str2[0] = 'h';           // Non-const access triggers the deep copy
    }
    // Modern C++ (C++11 and later)
    {
        std::string str1 = "Hello";
        std::string str2 = str1; // Actual copy, no reference counting
        str2[0] = 'h';           // Already independent
    }
}
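
The data() requirement can be seen directly: since C++17, the non-const data() member returns char*, so a caller can write straight into the string's buffer. The sketch below checks this at compile time and uses the pointer in a hypothetical helper, capitalize_first; it assumes a C++17 compiler.

```cpp
#include <cctype>
#include <string>
#include <type_traits>
#include <utility>

// Since C++17, non-const std::string::data() returns char*, not const char*.
static_assert(std::is_same_v<decltype(std::declval<std::string&>().data()), char*>,
              "requires C++17 or later");

// Hypothetical helper: upper-cases the first character through the raw pointer.
void capitalize_first(std::string& s) {
    if (!s.empty()) {
        char* p = s.data(); // writable pointer into s's own buffer
        p[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(p[0])));
    }
}
```

For example, after std::string b = a; capitalize_first(b);, only b changes, because b owns its buffer. With a shared COW buffer, that raw pointer would have pointed into storage also used by a.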

How to optimize strings?

Now that we have seen the issues with strings in C++ and how a string is laid out, we can look at ways to optimize their use in our code.

Prefer mutating operations

Mutating operations do not rely on value semantics, so they avoid unnecessary copies. Using mutating operations instead of copy-based ones is an efficient way to improve performance.

For example, instead of building a new temporary string with operator+, we can append in place with operator+=, which reuses the existing buffer and only allocates when its capacity is exceeded.

Example using value semantics (string concatenation operator)

#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

string remove_symbols(std::string str) {
    string res;
    for (std::size_t i = 0; i < str.length(); ++i) {
        if (str[i] != ',' && str[i] != ';') {
            // String concatenation is expensive: res + str[i]
            // builds a newly created temporary string, which is
            // then assigned back to res
            res = res + str[i];
        }
    }
    return res;
}

int main() {
    std::string original = "Hello, World;";
    auto res = remove_symbols(original);
    cout << res << endl;
    return 0;
}

Improved version using mutating operations

#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

string remove_symbols2(std::string str) {
    string res;
    for (std::size_t i = 0; i < str.length(); ++i) {
        if (str[i] != ',' && str[i] != ';') {
            res += str[i]; // Appends in place: no temporary string in the loop
        }
    }
    return res;
}

int main() {
    std::string original = "Hello, World;";
    auto res = remove_symbols2(original);
    cout << "String : " << res << endl;
    return 0;
}

Benchmarking both versions on Quick Bench shows a drastic performance improvement for the mutating version, even with the -O3 optimization flag:

(Figure: benchmark comparison of string concatenation and mutating operations)
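
Going one step further, the symbols can be removed in place, without building a second string at all. A sketch, assuming it is acceptable to modify the caller's string, uses the erase-remove idiom; the function name remove_symbols_in_place is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes ',' and ';' from s in place.
// std::remove_if moves the kept characters to the front, and erase() trims
// the leftover tail; both work inside the existing buffer.
void remove_symbols_in_place(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return c == ',' || c == ';'; }),
            s.end());
}
```

The trade-off is that the caller's original content is lost, so this only fits when the unfiltered string is no longer needed.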

Use reserve()

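Using reserve() reduces the number of reallocations. When content is added to a string and the capacity of its buffer is exceeded, the string allocates a bigger buffer (typically growing by a constant factor, such as doubling) and copies all the existing content into it.

To avoid these reallocations, we can call reserve() up front with the expected final size, so that the buffer is allocated once. The following example uses std::vector, whose reserve() works the same way.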

Not using reserve()

#include <vector>

int main() {
    std::vector<int> vec;
    for (int i = 0; i < 1000000; ++i) {
        vec.push_back(i); // Will cause multiple reallocations
    }
}

Using reserve()

#include <vector>

int main() {
    std::vector<int> vec;
    vec.reserve(1000000); // Pre-allocate memory for 1 million elements
    for (int i = 0; i < 1000000; ++i) {
        vec.push_back(i); // No reallocation: capacity is already sufficient
    }
}

Comparison

Benchmark results (Quick Bench):

(Figure: performance comparison with and without reserve())
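
The vector example carries over to std::string directly. The hypothetical helper count_capacity_changes below appends n characters one at a time and counts how often the capacity changes, each change being a reallocation plus a copy of the content. With reserve(n) called first, the count is zero; without it, the count depends on the library's growth strategy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters and counts capacity changes
// (each one means a new buffer was allocated and the content copied over).
std::size_t count_capacity_changes(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(n); // allocate once, up front
    }
    std::size_t changes = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
        if (s.capacity() != capacity) {
            ++changes;
            capacity = s.capacity();
        }
    }
    return changes;
}
```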

Use const std::string& instead of std::string for function arguments

Passing by reference avoids copying the string, which can bring a significant performance improvement, especially for large strings.
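
The copy made by pass-by-value can also be observed without timing anything, by comparing data() pointers. Both helpers below are hypothetical: a by-value parameter is a separate string with its own buffer, while a const reference is the caller's string itself.

```cpp
#include <string>

// Hypothetical helper: s is a fresh copy, so its buffer differs from the caller's.
bool by_value_sees_caller_buffer(std::string s, const char* caller_data) {
    return s.data() == caller_data;
}

// Hypothetical helper: s refers to the caller's string, so the buffer is the same.
bool by_ref_sees_caller_buffer(const std::string& s, const char* caller_data) {
    return s.data() == caller_data;
}
```

Calling both with a 10,000-character string, the by-value version reports a different buffer (a full copy was made for the call), while the const-reference version reports the caller's own buffer.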

In the following code, we force a copy inside both functions. Otherwise, when a function only reads the string, the compiler might inline it and optimize the difference away.

By Value

#include <string>
#include <benchmark/benchmark.h>

// Pass by value: one copy for the argument, one inside
void pass_by_value(std::string str) {
    std::string copy = str; // Forces a copy
    benchmark::DoNotOptimize(copy);
}

By Reference

#include <string>
#include <benchmark/benchmark.h>

// Pass by reference: only the copy inside
void pass_by_reference(const std::string& str) {
    std::string copy = str; // Forces a copy inside
    benchmark::DoNotOptimize(copy); // Prevent optimization
}

Comparison

We use the following code for the benchmark (run on Quick Bench):

// Benchmark for pass by const reference
static void BM_pass_by_reference(benchmark::State& state) {
    std::string original(10000, 'x'); // 10,000 characters
    for (auto _ : state) {
        benchmark::DoNotOptimize(original);
        pass_by_reference(original);
    }
}
// Benchmark for pass by value
static void BM_pass_by_value(benchmark::State& state) {
    std::string original(10000, 'x');
    for (auto _ : state) {
        benchmark::DoNotOptimize(original);
        pass_by_value(original);
    }
}
// Register both benchmarks
BENCHMARK(BM_pass_by_reference);
BENCHMARK(BM_pass_by_value);
// Outside Quick Bench, add BENCHMARK_MAIN(); or link against benchmark_main

The results show a significant performance increase when passing by reference.

(Figure: performance comparison of pass by value and pass by const reference)

Other possible optimizations (not covered)

We have covered some important optimizations for strings, but this list is not exhaustive. Other things worth checking include:

  • Avoiding copies in returned values
  • Using char* or char[] instead of std::string
  • Using iterators instead of indexing in loops, to eliminate extra dereferences
  • Using a better string library (Boost)
  • Using a custom implementation (such as Folly's fbstring)
  • Using std::stringstream to avoid value semantics
  • Using std::string_view
  • Avoiding conversions between C-style strings and std::string
  • Using better algorithms

Conclusion

Optimizing string usage in C++ is crucial for improving the performance of applications that heavily rely on string manipulation. We’ve explored the main issues related to std::string, such as dynamic memory allocation, excessive copying, and the implications of Small String Optimization (SSO) and Copy-On-Write (COW) in different C++ standards. Through examples, we highlighted how mutating operations, proper memory allocation via reserve(), and passing strings by reference rather than by value can greatly enhance performance.

By adhering to best practices like avoiding unnecessary copies, leveraging mutating operations, and using const std::string& in function arguments, C++ developers can significantly reduce memory overhead and improve the speed of their applications. While these optimizations are essential for writing efficient code, they are not exhaustive. Other techniques, such as using string_view, char* arrays, or even third-party libraries like Boost, can provide additional performance improvements based on specific use cases.

String optimization, as discussed in this article, is a powerful way to achieve higher efficiency, especially in performance-critical applications.
