In this tutorial, we will learn how to extract dates and times from raw strings. Extracting information such as dates is useful when parsing raw text such as logs. You can write the code in any editor or IDE. Here, I am using VS Code, the open-source, cross-platform IDE from Microsoft, which runs on Windows, Linux, and macOS.
Using the Python re and datetime Modules
Extracting a date and time from a string can be done in many ways. One of the most popular is to use Python's re and datetime modules. The re module is used to work with regular expressions, which are patterns that can be used to find text of a given shape inside a string. The datetime module is used to work with dates and times. Both modules are part of the Python standard library, so we do not need to install anything to use them.
To extract a date and time from a string, we use the search() function of the re module together with the strptime() method of the datetime class. The strptime() method accepts two parameters. The first is the string that we want to convert to a datetime object. The second is the format of the date string. For example, for the date string 2021-9-20, the format is %Y-%m-%d, where %Y represents the four-digit year, %m the month, and %d the day of the month. The datetime module supports many more format specifiers for dates and times. You can find the full list in the official documentation of the datetime module.
The following code shows the conversion of a date string into a datetime object using the strptime() method of the datetime class.
# Import the datetime class from the datetime module
from datetime import datetime

# Create a string containing a date
string_date = '09-19-2021'

# Display the type of the date string, which is str
print(type(string_date))

# Convert the string into a datetime object and keep only the date part
date_object = datetime.strptime(string_date, '%m-%d-%Y').date()

# Display the converted date
print(date_object)

# Display the type of the result, which is datetime.date because of .date()
print(type(date_object))
In the above code, we first imported the datetime class from the datetime module. Next, we created a date string in the format month-day-year. After that, we used the strptime() method of the datetime class to convert the date string into a datetime object, and called date() on the result to keep only the date part. Finally, we displayed the converted date and its type.
Running the above code produces the output shown in the image below.
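The same method also handles times. As a small sketch, the example below parses a string that contains both a date and a time. The sample timestamp and the variable names are illustrative assumptions, not values taken from a real log.

```python
from datetime import datetime

# Hypothetical timestamp in the form year-month-day hour:minute:second
stamp = "2021-09-20 14:35:07"

# %H is the hour on a 24-hour clock, %M the minute, and %S the second
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")

print(parsed.date())  # 2021-09-20
print(parsed.time())  # 14:35:07
```

Keeping the full datetime object and calling date() or time() only when needed avoids throwing away the time part too early.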
The above code works well when the string contains only the date or time. In real data, however, the date we want to extract is usually surrounded by other text. In such cases, we can use the re.search() function to find the date inside the text string first. The following code block shows a simple illustration.
# Import the re module to work with regular expressions
import re

# Import the datetime class from the datetime module
from datetime import datetime

# The raw text in which the date is present
text_string = "Python is released in 20-02-1991"

# Display the raw text
print(text_string)

# Search the raw text for a date in the form dd-mm-yyyy
searched_string = re.search(r'\d{2}-\d{2}-\d{4}', text_string)

# Display the match object returned by search()
print(searched_string)

# Convert the matched text into a date object
final_date = datetime.strptime(searched_string.group(), '%d-%m-%Y').date()

# Display the date object
print(final_date)

# Display the type of the final converted date
print(type(final_date))
In the above code, we first imported the required modules. Next, we created a text string that contains a date. After that, we used the search() function of the re module to look for the date pattern in the string. It returns a match object, and calling group() on that object gives the matched date string. We then used the strptime() method of the datetime class to convert the matched string into a datetime object and called date() to keep only the date part. Finally, we displayed the date object in the terminal using the print() function.
Running the above code produces the output shown in the image below.
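Log text often contains more than one date, and some lines contain none. The sketch below, which assumes a made-up log line and the same dd-mm-yyyy pattern as above, uses re.findall() to collect every date and shows that search() returns None when nothing matches.

```python
import re
from datetime import datetime

# Hypothetical log text with two dates in day-month-year form
log_text = "Job started on 05-03-2022 and finished on 07-03-2022."

# Same pattern as above: two digits, two digits, four digits
date_pattern = re.compile(r"\d{2}-\d{2}-\d{4}")

# findall() returns every non-overlapping match as a list of strings
found_dates = [
    datetime.strptime(match, "%d-%m-%Y").date()
    for match in date_pattern.findall(log_text)
]
print(found_dates)  # [datetime.date(2022, 3, 5), datetime.date(2022, 3, 7)]

# search() returns None when nothing matches, so check before calling group()
no_match = date_pattern.search("No date in this line")
if no_match is None:
    print("No date found")
```

Checking for None before calling group() avoids an AttributeError on lines that contain no date.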
Using the dateutil Module
The dateutil library can also be used to extract dates from a text string. Unlike re and datetime, dateutil is not part of the Python standard library. It is a third-party package that can be installed with pip install python-dateutil. To parse dates from a text string, we use the parse() function of dateutil.parser. It takes the text string as its required argument and accepts several optional keyword arguments. Passing fuzzy=True lets the parser skip words that are not part of the date. The dayfirst=True/False argument tells the parser whether, in an ambiguous date such as 10-1-2021, the first number is the day (True) or the month (False, the default).
The code below shows a practical demonstration of extracting dates using the parse() function.
# Import the parse function
from dateutil.parser import parse

# Create a string containing a date
text_1 = "Python is released in 20-02-1991"

# Create two more strings
text_2 = "This is 10-1-2021"
text_3 = "This is 1-10-2021"

# Parse the date using the parse function
date_1 = parse(text_1, fuzzy=True)

# Use the dayfirst keyword argument to say that the first
# number in the date is the day, not the month
date_2 = parse(text_2, fuzzy=True, dayfirst=True)
date_3 = parse(text_3, fuzzy=True, dayfirst=True)

# Display the parsed dates
print(date_1)
print(date_2)
print(date_3)

# Print the type of the parsed date
print(type(date_1))
In the above code, we first imported the parse() function from dateutil.parser. Next, we created some strings containing dates. Then, we used parse() to extract the dates from the strings. While parsing the last two strings, we passed the dayfirst keyword argument, which tells the parser that the first number in the date string is the day rather than the month. Finally, we used the print() function to display all the dates in the console.
Running the above code produces the output shown in the image below.
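To see the effect of dayfirst in isolation, the short sketch below parses the same ambiguous string twice, once with the default setting and once with dayfirst=True. The variable names are illustrative, and the example assumes the python-dateutil package is installed.

```python
from dateutil.parser import parse

# Ambiguous date: it could mean 10 January or 1 October
text = "This is 10-1-2021"

# Default behaviour (dayfirst=False) reads the first number as the month
month_first = parse(text, fuzzy=True)

# dayfirst=True reads the first number as the day
day_first = parse(text, fuzzy=True, dayfirst=True)

print(month_first)  # 2021-10-01 00:00:00
print(day_first)    # 2021-01-10 00:00:00
```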
The parse() function can recognize many different date formats in a string. See the following example for an illustration.
# Import the parse function
from dateutil.parser import parse

# Create strings containing dates in different formats
text_1 = "Python is released in 20-02-1991"
text_2 = "Python is released in 20/02/1991"
text_3 = "Python is released in 20 Feb 1991"
text_4 = "Python is released in Feb 20 1991"
text_5 = "Python is released in 02-20-1991"

# Parse the dates from the above strings
date_1 = parse(text_1, fuzzy=True)
date_2 = parse(text_2, fuzzy=True)
date_3 = parse(text_3, fuzzy=True)
date_4 = parse(text_4, fuzzy=True)
date_5 = parse(text_5, fuzzy=True)

# Display the parsed dates
print(date_1)
print(date_2)
print(date_3)
print(date_4)
print(date_5)
In the above code, we first imported the required function. Next, we created some strings containing dates in different formats. Then, we passed each string to the parse() function to extract its date. Finally, we displayed the parsed dates in the console using the print() function.
Running the above code produces the output shown in the image below.
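When a string contains no recognizable date, parse() raises an error instead of returning a value. The sketch below wraps the call in a small helper, try_parse, which is a hypothetical name introduced here for illustration. It catches ValueError (which, as far as I know, also covers the ParserError raised by recent dateutil versions) and OverflowError, and returns None in that case.

```python
from dateutil.parser import parse


def try_parse(text):
    """Return the parsed datetime, or None if no date is found (hypothetical helper)."""
    try:
        return parse(text, fuzzy=True)
    except (ValueError, OverflowError):
        # Raised when the text holds no usable date or the value is too large
        return None


print(try_parse("Python is released in 20 Feb 1991"))  # 1991-02-20 00:00:00
print(try_parse("no date here"))                       # None
```

A helper like this is convenient when scanning many log lines, because one line without a date does not stop the whole run.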
The parse() function can also parse dates in which not all three components (day, month, and year) are given. It fills in each missing component from today's date. See the code below for a demonstration.
# Import the parse function
from dateutil.parser import parse

# Create strings containing incomplete dates
text_1 = "The date is 14 Dec"
text_2 = "The date is 10 2021"
text_3 = "The date is 14"

# Parse the dates from the above strings
date_1 = parse(text_1, fuzzy=True)
date_2 = parse(text_2, fuzzy=True)
date_3 = parse(text_3, fuzzy=True)

# Display the parsed dates
print(date_1)
print(date_2)
print(date_3)
In the above code, we first imported the required function. Next, we created some strings containing incomplete dates. Then, we used the parse() function to parse dates from these strings. Each component that is missing from a date is replaced by the corresponding component of today's date. Finally, we displayed the parsed dates using the print() function.
Running the above code produces the output shown in the image below.
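Because the missing components come from today's date, the output of the example above changes from day to day. If a stable result is needed, parse() accepts a default argument that supplies the fallback values instead. The sketch below assumes an arbitrary fallback of 1 January 2000, chosen only for illustration.

```python
from datetime import datetime
from dateutil.parser import parse

# Fixed fallback values, chosen here only for illustration
fallback = datetime(2000, 1, 1)

# Only the day and month are present, so the year comes from fallback
partial = parse("The date is 14 Dec", fuzzy=True, default=fallback)

print(partial)  # 2000-12-14 00:00:00
```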
Conclusion
In this tutorial, we learned two ways to extract dates from text. The first uses the re and datetime modules. The second uses dateutil.parser. You may also want to see our tutorial on how to print without a new line in Python.