To appreciate substrings, their importance in programming, and their real-life applications, we first need to understand what C Strings are and what the built-in string library offers. Let's dive into the world of letters by starting with the concept of the C String.
C Strings
C Strings are sequences of characters stored in consecutive memory locations and terminated by a null terminator, which is the null character ('\0'). Let's take a look at how C Strings are used in a program.
A C String can appear in a program in three forms:
- It can just be any hard-coded string, i.e., a string literal
- It can also be a user-defined array of characters
- It can be a pointer to a character
String Literal
A string literal is any sequence of characters written between double quotes; string literals are also called string constants. A classic example is the text written in a cout statement, as shown below.
// This program illustrates an example of a string literal.
#include <iostream>
using namespace std;

int main()
{
    cout << "Hello";
}
How the Compiler Handles a C String
When the compiler comes across a string literal like the one in the example, it creates an array of 6 characters and stores the word Hello in consecutive memory locations, as shown in fig. 1 below. The compiler automatically adds the null character at the end. Finally, the compiler passes a pointer to char, i.e., the address of the first character, to cout.
H | e | l | l | o | \0 |
The program below demonstrates how the string literal is regarded by the compiler:
// This program demonstrates that string literals
// can be treated as pointers to char.
#include <iostream>
using namespace std;

int main()
{
    // Define variables that are pointers to const char.
    // (Standard C++ does not allow a string literal to be
    // assigned to a plain char *.)
    const char *x, *y;

    // Assign string literals to the pointers.
    x = "Hello ";
    y = "World";

    // Print the pointers as C-strings!
    cout << x << y << endl;
}
A string literal is an array of constant characters that converts to a pointer to its first character, so it can be assigned to a pointer to char (const char * in standard C++). Now, the x and y pointers hold the addresses of the string literals.
User-defined Character Arrays
Unlike string literals, which are hard-coded in the program, user-defined character arrays usually hold characters read from the keyboard or a file. Here the compiler does not add space for the null terminator automatically, so the programmer must allocate one additional element for it.
An example of a user-defined character array is shown below:
char city[10];
If your city name has at most 9 characters, you need an array of 10 characters to leave room for the null character.
A C String can be initialized in different ways as demonstrated below:
char city[30] = "Chicago, Los Angeles";
char country[ ] = "United States";
When initializing an array with a string literal as shown above, the array size becomes optional. If the size is not specified, the compiler sets the array's size to one more than the number of characters in the initializing string literal to allow space for the null terminator. If the size is specified, it must be large enough to hold the literal and its null terminator.
The program below illustrates the process of printing a character array one character at a time.
// This program iterates through a character array, displaying
// each element until a null terminator is encountered.
#include <iostream>
using namespace std;

int main()
{
    const int SIZE = 30;   // Maximum length for string
    char line[SIZE];       // Array of char of size 30

    // Read a string into the character array.
    cout << "Enter a sentence of no more than "
         << SIZE-1 << " characters:\n";
    cin.getline(line, SIZE);

    cout << "The sentence you entered is:\n";
    // Loop through the array printing each character.
    for(int i = 0; line[i] != '\0'; i++)
    {
        cout << line[i];
    }
}
Pointer to Char
Another way to work with C Strings is through a pointer to a character. This method allocates a character array and then stores the array's address in a pointer to char.
The code below demonstrates an example of a pointer to a character.
// This program shows how a pointer variable can point to different strings.
#include <iostream>
using namespace std;

int main()
{
    char city[20] = "New York";
    const char *p;          // const so it can also point to a string literal

    p = city;               // Point to an existing C-string
    cout << p << endl;      // Print
    p = "Chicago";          // Point to another C-string
    cout << p << endl;
}
The advantage of this method is highlighted in the above program: a single pointer can be used to point to different C Strings!
A Common Mistake
When using pointers, it may happen that the pointer does not point to any C String at all, as shown in the example below.
char *pname;
cout << "Enter your name: ";
cin >> pname;
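The code above is problematic because the program tries to store the string in the memory pointed to by pname, but pname is not initialized. It does not point to any allocated storage, so the input is written to an unpredictable address, which is undefined behavior and may corrupt memory or crash the program.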
C++ Standard Library: String
The standard library string class offers the same functionality as C Strings but is safer and easier to use in many ways: it manages its own memory and provides a wide range of member functions. Programs using the string class must include the string header file as shown.
#include <string>
We will discuss some of the important library functions that are often used in programs. Here s is an object of the string class, and str is the string passed as an argument.
Function | Description |
s.assign(str) | This function replaces the contents of s with str. Here str can be a string object or a C String. |
s.at(x) | This function returns the character present at position x. |
s.capacity() | This function returns the number of characters that fit in the memory currently allocated for the string. |
s.clear() | This function deletes all the characters stored in the string. |
s.compare(str) | This function compares the string object s with the string str passed to the function and returns a negative value, zero, or a positive value, like the strcmp function. |
s.size() | This function returns the length of the string. |
s.erase(x,n) | This function deletes n characters starting at position x. |
s.insert(x,str) | This function inserts the string str starting at position x. |
s.substr(x,n) | This function returns a copy of the substring that starts at position x and is n characters long. |
s.find(str) | This function returns the position of the first occurrence of str in s. If str is not found, the special value string::npos is returned. |
The functions in the above table are only some of the library functions. The string standard library is vast and also has a series of overloaded functions.
Substring Function
A substring is a portion of a larger string. The function that extracts a substring from a string is substr, which is called as follows:
size_t x = 2; // position from where the function will extract the substring
size_t n = 8; // length of the substring that will be extracted
s.substr(x, n);
Here s is a string object. The functionality of the substr function is described in the table above. The code below demonstrates how the substr function works.
// This program demonstrates the use of substr function
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str = "Hello World";
    cout << "Substring: " << str.substr(6, 5);
    return 0;
}
Application of Substrings
Extracting a substring might seem trivial and not that useful in small programs; however, it can be useful in fields like bioinformatics. For example, certain amino acid sequences have special properties, so the substr function may be used to pull those amino acid strings out of a longer sequence.
Another application is the prediction of RNA secondary structure. The Nussinov algorithm works through the substrings of a larger RNA sequence in the dataset and predicts the base pairs of a nested structure. Substrings and subsequences are also common in dynamic programming.
Conclusion
That's all about the usage of strings and substrings in C++. As you can see, the substr function is useful for extracting data from a dataset for further processing. What do you think of these functions? Did you find a specific usage that you want to share? Do let us know in the comments below.