Published on April 01, 2025
This article aims to provide a clear understanding of how we can optimize the use of strings in C++ to improve performance. It does not discuss when to optimize, as that is a separate topic.
std::string is not a primitive type. It has behaviors that make it expensive to use, regardless of the implementation. The main issue with strings is that they are dynamically allocated but behave as values in expressions, leading to excessive copying.
std::string needs to grow dynamically to accommodate its content, unlike fixed-size character arrays (char[]). To implement this flexibility, it uses dynamic memory allocation.
A string object contains a small internal buffer of fixed size, stored inside the string object itself (so it lives on the stack when the string is a local variable). If the content fits in this buffer, the string does not need to allocate memory. Once the content exceeds the buffer size, the string allocates memory on the heap instead of using the internal storage. This mechanism is known as Small String Optimization (SSO).
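As a rough illustration of where the characters live, the helper below checks whether a string's data() pointer falls inside the string object itself. This is a heuristic sketch, not a standard facility: `uses_inline_buffer` is a hypothetical name, and it assumes the implementation keeps its SSO buffer inside the object, as libstdc++, libc++ and MSVC's standard library do.

```cpp
#include <cstdint>
#include <string>

// Heuristic, implementation-specific check (NOT guaranteed by the standard):
// returns true if the characters are stored inside the std::string object
// itself (the SSO buffer) rather than in a separate heap allocation.
bool uses_inline_buffer(const std::string& s) {
    // Compare addresses as integers to avoid relational comparisons
    // between pointers into unrelated objects.
    auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    auto object_end = object_begin + sizeof(s);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= object_begin && chars < object_end;
}
```

With these libraries, a short string such as "short" should report true and a 100-character string should report false; the exact threshold is the implementation's SSO capacity.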
The size of the local buffer is implementation-dependent. Typical values on 64-bit platforms are:
| Compiler (standard library) | SSO capacity (characters, excluding the null terminator) |
| ---- | ---- |
| GCC (libstdc++) | 15 |
| Clang (libc++) | 22 |
| MSVC | 15 |
To check the size of the SSO buffer, we can add characters to a string one at a time and watch for the first change in its capacity:
#include <iostream>
#include <string>
int main() {
    std::string s;
    std::size_t capacity = s.capacity();  // Initial capacity: the SSO buffer size
    for (int i = 1; i < 100; ++i) {
        s += 'a';  // Add characters one by one
        if (s.capacity() != capacity) {  // First capacity increase: heap allocation
            std::cout << "SSO buffer size: " << i - 1 << " characters\n";
            break;
        }
    }
    return 0;
}
Assigning one string to another behaves as if each string variable has a private copy of its content.
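#include <iostream>
#include <string>
int main() {
    std::string str1, str2;
    str1 = "Hello";
    str2 = str1;    // str2 receives its own copy of the characters
    str2[4] = '!';  // Modifies str2 only
    // Will print "Hello", not "Hell!"
    std::cout << str1 << std::endl;
}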
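Since strings also support mutating operations, they must behave as if they have private copies of their content. This results in copying whenever a string is assigned to another string or passed to a function by value, and in temporary strings whenever a new value is built in an expression such as a + b.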
Before C++11, some standard library implementations (notably GCC's libstdc++) used copy-on-write (COW) to reduce copying. Two strings could share the same dynamically allocated storage until one of them performed a mutating operation, at which point a new allocation and copy were triggered.
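The sharing mechanism can be illustrated with a toy class. This sketch is not how any real library implemented COW: the hypothetical `CowString` simply wraps a `std::shared_ptr<std::string>`, so copies share one buffer and only the mutating `set()` performs the deferred deep copy. It is deliberately not thread-safe.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string (illustration only, NOT thread-safe):
// copies share one buffer until a mutating operation needs its own.
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Copying only increments the reference count held by shared_ptr.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Read-only access never copies.
    char get(std::size_t i) const { return (*buf_)[i]; }
    const std::string& str() const { return *buf_; }

    // Mutating access: detach from the shared buffer first if needed.
    void set(std::size_t i, char c) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the deferred deep copy
        }
        (*buf_)[i] = c;
    }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

The sketch exposes `set()` rather than a non-const `operator[]`: a COW string that hands out a `char&` has to unshare on every non-const access, even when the caller only reads, which is one of the practical costs of the technique.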
COW saves memory, but it is hard to make safe in multithreaded programs: two strings that look independent may share one buffer, so the reference count has to be maintained with atomic operations, which adds overhead. Since C++11, the standard effectively forbids COW for std::string: it restricts when operations may invalidate references, pointers and iterators into a string (for example, non-const operator[] must not), and a COW implementation cannot meet those rules. C++17 also added a non-const data() that returns a mutable pointer (char*), which is likewise incompatible with sharing. As a result, copying a std::string now always performs a deep copy.
#include <string>
// Old possible behavior with COW (pre-C++11 libraries)
void cow_behavior() {
    std::string str1 = "Hello";
    std::string str2 = str1;  // No copy: buffer shared, reference count incremented
    str2[0] = 'h';            // Mutating operation triggers the deep copy
}
// Modern C++ (C++11 and later)
void modern_behavior() {
    std::string str1 = "Hello";
    std::string str2 = str1;  // Actual copy, no reference counting
    str2[0] = 'h';            // Already independent
}
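To observe the deep copy on your own toolchain, the sketch below copies a string and compares the two data() pointers. `copy_shares_buffer` is a hypothetical helper; with any C++11-conforming standard library it should return false.

```cpp
#include <string>

// Returns true if copying `s` yields a string that points at the same
// character buffer. Under C++11 and later rules every copy owns its own
// characters, so this is expected to return false.
bool copy_shares_buffer(const std::string& s) {
    std::string copy = s;  // copy construction
    return copy.data() == s.data();
}
```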
Now that we have seen the main issues with strings in C++ and how a string is organized internally, we can look at ways to optimize their use in our code.
Mutating operations such as +=, append() or push_back() modify a string in place instead of producing a new value, so they avoid the temporary strings that value semantics creates. Replacing copy-based expressions such as s = s + x with their mutating counterparts is an easy way to improve performance. A mutating operation still allocates when the capacity is exceeded, but it no longer allocates and copies a temporary string on every call.
#include <iostream>
#include <string>
std::string remove_symbols(std::string str) {
    std::string res;
    for (std::size_t i = 0; i < str.length(); ++i) {
        if (str[i] != ',' && str[i] != ';') {
            // String concatenation is expensive here: res + str[i]
            // builds a new temporary string holding a copy of res
            // plus one character, which is then assigned back to res
            res = res + str[i];
        }
    }
    return res;
}
int main() {
    std::string original = "Hello, World;";
    auto res = remove_symbols(original);
    std::cout << res << std::endl;
    return 0;
}
#include <iostream>
#include <string>
std::string remove_symbols2(std::string str) {
    std::string res;
    for (std::size_t i = 0; i < str.length(); ++i) {
        if (str[i] != ',' && str[i] != ';') {
            res += str[i];  // Appends in place: no temporary string per iteration
        }
    }
    return res;
}
int main() {
    std::string original = "Hello, World;";
    auto res = remove_symbols2(original);
    std::cout << res << std::endl;
    return 0;
}
Running both versions in a benchmark on Quick Bench shows a drastic performance improvement for the second version, even with the -O3 optimization flag.
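Going one step further, when the caller does not need the original string, the filtering can be done entirely in place with the standard erase-remove idiom (std::remove_if followed by erase()). This is a sketch of a different technique from the examples above: `remove_symbols_in_place` is a hypothetical name, and it changes the interface by taking the string by non-const reference and modifying it.

```cpp
#include <algorithm>
#include <string>

// Removes ',' and ';' directly from `str`, without building a second string.
// std::remove_if shifts the kept characters to the front and returns the new
// logical end; erase() then trims the leftover tail. Shrinking a string never
// needs more capacity, so no allocation takes place.
void remove_symbols_in_place(std::string& str) {
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](char c) { return c == ',' || c == ';'; }),
              str.end());
}
```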
Use reserve()
Calling reserve() reduces the number of reallocations. When content is appended to a string and the capacity of its buffer is exceeded, the string allocates a bigger buffer (growing the capacity by a constant factor, typically 1.5x or 2x depending on the implementation), copies all of the existing content into it, and frees the old buffer.
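The reallocations can be observed by watching capacity() while appending. The sketch below is illustrative only: `count_reallocations` is a hypothetical helper, and it assumes that each change in capacity() corresponds to one reallocation and copy of the content.

```cpp
#include <cstddef>
#include <string>

// Appends `n` characters one at a time and counts how many times the
// capacity changed during the appends. If `reserve_first` is true, the
// whole capacity is requested up front with reserve().
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first) {
        s.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'a';
        if (s.capacity() != capacity) {  // buffer was replaced by a bigger one
            ++reallocations;
            capacity = s.capacity();
        }
    }
    return reallocations;
}
```

Without reserve(), the count is small but nonzero, because capacity grows geometrically; with reserve(n) it should be zero.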
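To avoid these reallocations, call reserve() with the expected final size before appending, so the buffer is allocated only once. std::vector works the same way. Without reserve():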
#include <vector>
void fill_without_reserve() {
    std::vector<int> vec;
    for (int i = 0; i < 1000000; ++i) {
        vec.push_back(i);  // Will cause multiple reallocations
    }
}
With reserve():
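#include <vector>
void fill_with_reserve() {
    std::vector<int> vec;
    vec.reserve(1000000);  // Pre-allocate memory for 1 million elements
    for (int i = 0; i < 1000000; ++i) {
        vec.push_back(i);  // No reallocation: capacity is already large enough
    }
}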
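The same idea applies directly to the remove_symbols2 example. Since removing characters can only make the result shorter, the input size is a safe upper bound to reserve. This is a sketch: `remove_symbols_reserved` is a hypothetical name, and it also takes its argument by const reference, a technique covered in the next section.

```cpp
#include <string>

// Same filtering as remove_symbols2, but the result's buffer is reserved
// up front. The output is never longer than the input, so str.size() is
// enough capacity and the loop never reallocates.
std::string remove_symbols_reserved(const std::string& str) {
    std::string res;
    res.reserve(str.size());
    for (char c : str) {
        if (c != ',' && c != ';') {
            res += c;
        }
    }
    return res;
}
```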
Use const string& instead of string in function arguments
Passing a string by const reference avoids copying it, which can significantly improve performance, especially for large strings that do not fit in the SSO buffer.
In the following code, we force a copy inside the functions. This is done because when the function only reads the string and doesn't copy it inside, the compiler might inline the function and optimize the difference away.
#include <string>
#include <benchmark/benchmark.h>
// Pass by value: an lvalue argument is copied at the call site, then copied again inside
void pass_by_value(std::string str) {
    std::string copy = str;  // Forces a copy
    benchmark::DoNotOptimize(copy);
}
// Pass by reference: only the copy inside the function is made
void pass_by_reference(const std::string& str) {
    std::string copy = str;  // Forces a copy inside
    benchmark::DoNotOptimize(copy);  // Prevent optimization
}
We use this code to benchmark both versions (run on Quick Bench):
// Benchmark for pass by const reference
static void BM_pass_by_reference(benchmark::State& state) {
    std::string original(10000, 'x');  // 10,000 characters
    for (auto _ : state) {
        benchmark::DoNotOptimize(original);
        pass_by_reference(original);
    }
}
// Benchmark for pass by value
static void BM_pass_by_value(benchmark::State& state) {
    std::string original(10000, 'x');
    for (auto _ : state) {
        benchmark::DoNotOptimize(original);
        pass_by_value(original);
    }
}
// Register both benchmarks (outside Quick Bench, also add BENCHMARK_MAIN();)
BENCHMARK(BM_pass_by_reference);
BENCHMARK(BM_pass_by_value);
The results show a significant performance increase when passing by reference.
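Independently of timing, the extra copy can be made visible by comparing buffer addresses. In this sketch, the hypothetical functions `same_buffer_by_value` and `same_buffer_by_reference` receive the caller's string together with its original data() pointer and report whether their parameter still points at that buffer.

```cpp
#include <string>

// `original` is the caller's str.data(), captured before the call.
bool same_buffer_by_value(std::string str, const char* original) {
    return str.data() == original;  // false for an lvalue argument: str is a copy
}

bool same_buffer_by_reference(const std::string& str, const char* original) {
    return str.data() == original;  // true: str is the caller's own object
}
```

For a string too long for SSO, the by-value version sees a freshly allocated copy, while the by-reference version sees the caller's own buffer.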
We have covered some important string optimizations, but this list is not exhaustive. Other techniques worth checking are:
- Avoid copying returned values
- Use char* or char[] instead of string
- Use iterators instead of indexing to eliminate pointer dereferences
- Use a better string library (Boost)
- Use a custom implementation (fbstring)
- Use stringstream to avoid value semantics
- Use string_view
- Avoid conversions between C-style strings and std::string
- Use better algorithms
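As an example of one item from this list, C++17's std::string_view can replace const std::string& for read-only parameters: it also accepts string literals and substrings without constructing a std::string. The sketch below uses a hypothetical `count_symbols` function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// std::string_view is a non-owning (pointer, length) pair, so it is cheap to
// pass by value and binds to std::string, string literals and substrings
// without allocating or copying characters.
std::size_t count_symbols(std::string_view text) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == ',' || c == ';') {
            ++count;
        }
    }
    return count;
}
```

The trade-off is lifetime: a string_view does not own its characters, so it must not outlive the string it refers to.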
Optimizing string usage in C++ is crucial for improving the performance of applications that rely heavily on string manipulation. We’ve explored the main issues related to std::string, such as dynamic memory allocation, excessive copying, and the implications of Small String Optimization (SSO) and Copy-On-Write (COW) in different C++ standards. Through examples, we highlighted how mutating operations, pre-allocating memory with reserve(), and passing strings by reference rather than by value can greatly enhance performance.
By adhering to best practices like avoiding unnecessary copies, using mutating operations, and taking const std::string& in function arguments, C++ developers can significantly reduce memory overhead and improve the speed of their applications. These optimizations are essential for writing efficient code, but they are not exhaustive. Other techniques, such as using string_view, character arrays, or third-party libraries like Boost, can provide additional performance improvements depending on the use case.
String optimization, as discussed in this article, is a powerful way to achieve higher efficiency, especially in performance-critical applications.
Guntheroth, Kurt. Optimized C++: Proven Techniques for Heightened Performance. O'Reilly Media, 2016.
Chen, Raymond. Inside STL: The string. The Old New Thing, Microsoft DevBlogs. https://devblogs.microsoft.com/oldnewthing/20230803-00/?p=108532
Understanding Small String Optimization (SSO) in std::string. cppdepend.com. https://cppdepend.com/blog/understanding-small-string-optimization-sso-in-stdstring
Dicanio, Giovanni. The C++ Small String Optimization. giodicanio.com. https://giodicanio.com/2023/04/26/cpp-small-string-optimization
Turner, Jason. C++ Weekly - Ep 430 - How Short String Optimizations Work. YouTube. https://www.youtube.com/watch?v=CIB_khrNPSU