Perl – Extracting Date from a String using Regex

Last Updated : 14 Dec, 2020

Perl scripts often read CSV (Comma Separated Values) files to extract data. Sometimes the date appears in the file name, as in sample 2014-02-12T11:10:10.csv, and sometimes in a column inside the file. These dates can follow any pattern, such as YYYY-MM-DDThh:mm:ss or dd/mm/yyyy hh.mm.ss, so a script has to be flexible enough to handle several date formats in a string. Regular expressions are the natural tool for extracting them. A regular expression is a string of characters that defines the pattern or patterns you are looking for. The basic way to apply one is with the pattern binding operators: =~ is true when the string matches the pattern, and !~ is true when it does not.
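As a minimal sketch of the two binding operators, the snippet below sorts a couple of file names into those that contain a yyyy-mm-dd date and those that do not. The file names and the pattern \d{4}-\d{2}-\d{2} are illustrative assumptions, not taken from a real data set.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file names, used only for illustration.
my @names = ('sample2014-02-12T11:10:10.csv', 'readme.txt');

my (@dated, @undated);
for my $name (@names) {
    # =~ is true when the pattern matches the string.
    push @dated, $name if $name =~ /\d{4}-\d{2}-\d{2}/;
    # !~ is the negation: true when the pattern does not match.
    push @undated, $name if $name !~ /\d{4}-\d{2}-\d{2}/;
}

print "with date:    @dated\n";
print "without date: @undated\n";
```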

Perl also has several modules for handling dates and times, such as Date::Parse and Time::Piece, both of which provide flexible functions for more complex requirements. Date::Parse is not part of the standard distribution and must be installed separately from CPAN; Time::Piece has been bundled with Perl since version 5.10.
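For comparison with the regex approach used in the rest of this article, here is a rough sketch of parsing a timestamp with Time::Piece, assuming the module is available on your system and that the timestamp has already been isolated from the surrounding text. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Assumed input: a timestamp already separated from the rest of the string.
my $stamp = '2018-03-21T12:10:10';

# strptime parses the string according to the format and returns a Time::Piece object.
my $t = Time::Piece->strptime($stamp, '%Y-%m-%dT%H:%M:%S');

# mon is the month number (1-12) and mday is the day of the month.
printf "year:%d month:%d day:%d hour:%d\n", $t->year, $t->mon, $t->mday, $t->hour;
```

A module like this also validates the values (for example, it rejects month 13), which a plain digit pattern does not.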

For common date formats, a specific regular expression is often enough and avoids installing a new library. Let’s go through some examples of parsing a date from a string in Perl.

Before looking at the examples, here are the metacharacters and escapes used in the patterns below:

^ matches the beginning of the string
$ matches the end of the string
* matches 0 or more occurrences of the preceding expression
+ matches 1 or more occurrences of the preceding expression
? matches 0 or 1 occurrence of the preceding expression
. matches any single character except a newline
\d matches a digit
\s matches a whitespace character

Here are some brief examples.

/^$/ # matches an empty string (start and end are adjacent)
/(\d\s){3}/ # three digits, each followed by a whitespace character, e.g. "6 7 8 "
/(a.)+/ # one or more repetitions of the letter a followed by any single character, e.g. "abac"
/^\d+/ # string starts with one or more digits
/\d+$/ # string ends with one or more digits
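The patterns above can be tried out with a small script such as the following sketch. The sample strings are chosen for illustration only, and the third pattern is anchored with ^ and $ so that it must cover the whole string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry records whether the sample string matches the pattern (1) or not (0).
my %result = (
    empty        => ('' =~ /^$/)               ? 1 : 0,
    three_pairs  => ('6 7 8 ' =~ /(\d\s){3}/)  ? 1 : 0,
    a_pairs      => ('abacad' =~ /^(a.)+$/)    ? 1 : 0,
    leading_num  => ('42abc' =~ /^\d+/)        ? 1 : 0,
    trailing_num => ('abc42' =~ /\d+$/)        ? 1 : 0,
    no_lead_num  => ('abc42' =~ /^\d+/)        ? 1 : 0,   # digits are not at the start
);

print "$_ = $result{$_}\n" for sort keys %result;
```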

No separate module is required for regular expressions; they are built into every version of Perl, so all you need is a working Perl installation. The examples below extract dates in different formats from a string using Perl regex.

Example 1:

This example extracts a date in the format yyyy-mm-ddThh:mm:ss from a string. The string is sample2018-03-21T12:10:10.csv, and we want the year, month, day, hour, minute and second in separate variables so that the rest of the script can use them.

Here, the regex \d\d\d\d requires the date in the string to begin with four digits (the year). If the string contains no such pattern, the match fails and the variables remain undefined; printing them then produces "Use of uninitialized value" warnings when warnings are enabled.

What does \d?\d mean? This pattern allows the month, day, hours, minutes and seconds to be written with either 1 digit or 2 digits.

For example:

2013-9-21T11:3:30

2014-12-3T9:1:10

In \d?\d the ? makes the \d to its left optional, so both one-digit and two-digit values match.

Perl




#!/usr/bin/perl
use strict;
use warnings;

# In list context the match returns the six captured groups
my $str = "sample2018-03-21T12:10:10.csv";
my ($year, $month, $day, $hour, $min, $sec) =
     $str =~ /(\d\d\d\d)-(\d?\d)-(\d?\d)T(\d?\d):(\d?\d):(\d?\d)/;
print "year : $year  month:$month  day:$day - hour:$hour  minute:$min  seconds:$sec\n";


Output:

year : 2018  month:03  day:21 - hour:12  minute:10  seconds:10
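One way to avoid undefined variables when a string has no date is to check the result of the match before using it. The sketch below wraps the same pattern in a helper, parse_stamp (a hypothetical name, not a library function), that returns an empty list on failure and zero-pads one-digit fields. It assumes the input uses the yyyy-mm-ddThh:mm:ss layout from this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_stamp is a hypothetical helper for this sketch.
# It returns six zero-padded fields, or an empty list if no date is found.
sub parse_stamp {
    my ($text) = @_;
    my @parts = $text =~ /(\d\d\d\d)-(\d?\d)-(\d?\d)T(\d?\d):(\d?\d):(\d?\d)/;
    return unless @parts;                       # match failed
    return map { sprintf '%02d', $_ } @parts;   # pad 9 to 09; the year is unchanged
}

for my $name ('log2013-9-21T11:3:30.csv', 'no-date-here.csv') {
    # A list assignment in a condition is true only if the right side returned values.
    if (my ($year, $month, $day, $hour, $min, $sec) = parse_stamp($name)) {
        printf "%s -> %s-%s-%s %s:%s:%s\n", $name, $year, $month, $day, $hour, $min, $sec;
    }
    else {
        printf "%s -> no date found\n", $name;
    }
}
```

The pattern only checks the shape of the date, so a value such as month 13 would still be accepted; range checks would have to be added separately.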

Example 2:

In this example, we extract a date in the format dd/mm/yyyy hh.mm.ss from a string. The date can be part of a file name or part of the content. The string is test_28/04/2017 11.00.00, where the date starts with the two-digit day 28 followed by a forward slash /.

Here, (\d?\d) matches a one- or two-digit number, which must be followed by /. The pattern is written with m{...} delimiters so that the forward slashes do not need to be escaped. The backslash \ in front of the . makes it match only a literal dot and not every character, as an unescaped dot would. Because the day comes first in the string, the first capture is assigned to the day variable and the second to the month variable.

Perl




#!/usr/bin/perl
use strict;
use warnings;

my $str1 = "test_28/04/2017 11.00.00";
my ($day1, $month1, $year1, $hour1, $min1, $sec1) =
     $str1 =~ m{(\d?\d)/(\d?\d)/(\d\d\d\d) (\d\d)\.(\d\d)\.(\d\d)};
print "year:$year1  month:$month1  day:$day1 - hour:$hour1  minute:$min1  seconds:$sec1\n";


Output:

year:2017  month:04  day:28 - hour:11  minute:00  seconds:00
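With positional captures, the meaning of each field depends entirely on the order of the variables on the left-hand side, which makes it easy to swap day and month by accident. A possible alternative, sketched here under the assumption of Perl 5.10 or later and a day-first (dd/mm/yyyy) input, is to use named captures so that each group is labeled inside the pattern itself. The group names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'test_28/04/2017 11.00.00';

my %date;
if ($str =~ m{(?<day>\d?\d)/(?<month>\d?\d)/(?<year>\d\d\d\d) (?<hour>\d\d)\.(?<min>\d\d)\.(?<sec>\d\d)}) {
    # %+ holds the named captures of the last successful match;
    # copy it before another match overwrites it.
    %date = %+;
}

print join(' ', map { "$_=$date{$_}" } qw(year month day hour min sec)), "\n";
```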

Example 3:

Here is one more date pattern, {Day}, dd {mon} yyyy hh:mm:ss, as in Tue, 11 Feb 2014 11:01:57 +0100 (CET). CSV files sometimes have a date column in this format, which is awkward to process directly, so we extract the day, month and year and use them as required.

Here, .+?(\d+) skips the characters that come before the day digits 11 (the lazy .+? matches as few characters as possible), \s(.+?) matches a space followed by the month name Feb, and \s(\d+) matches another space followed by the digits of the year 2014. The captured values are stored in the day, month and year variables for use later in the script.

Perl




#!/usr/bin/perl
use strict;
use warnings;

my $string = 'Date: Tue, 11 Feb 2014 11:01:57 +0100 (CET)';
my ($day3, $month3, $year3) = $string =~ /Date:.+?(\d+)\s(.+?)\s(\d+)/;
print "Day:$day3 month:$month3 year:$year3\n";


Output:

Day:11 month:Feb year:2014
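If the month is needed as a number rather than a name, a lookup table can convert it. The following sketch assumes three-letter English month abbreviations and a four-digit year, and uses a slightly stricter pattern than the example above; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup table from English month abbreviations to month numbers.
my @abbr = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %month_num;
@month_num{@abbr} = (1 .. 12);

my $string = 'Date: Tue, 11 Feb 2014 11:01:57 +0100 (CET)';
my $iso = '';
if (my ($day, $mon, $year) = $string =~ /Date:.+?(\d+)\s([A-Za-z]{3})\s(\d{4})/) {
    # Build the date only when the abbreviation is a known month.
    $iso = sprintf('%04d-%02d-%02d', $year, $month_num{$mon}, $day)
        if exists $month_num{$mon};
}

print "ISO date: $iso\n";
```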

