Linux/UNIX - Using grep With Regular Expressions

Author

Mandy Doward

Date

January 11, 2017

Length

5 mins

Searching for Lines Containing Patterns 

There will be many occasions when you are trying to locate a specific set of lines in a file, such as a log file, or when you are trying to filter the output of a Linux or Unix command down to just the lines relevant to your needs.

The grep command is perfect in these situations, and we explore some of its capabilities here.

grep – Global Regular Expression Print

Linux and UNIX systems offer three variants of the grep command:

  • grep

  • egrep

  • fgrep

grep supports basic regular expression characters, egrep supports some of the more advanced (extended) regular expression characters, and fgrep searches for fixed strings without treating any character as special.

The basic characters supported by grep are:

  • [...], [^...], ^, $, ., *, \

Here is a brief description of these special characters:

  • A list of characters enclosed by [ and ] matches any single character in that list (if the first character is the caret ^ then it matches any character not in the list)

  • The caret ^ at the start of a pattern matches the empty string at the beginning of a line

  • The dollar sign $ at the end of a pattern matches the empty string at the end of a line

  • The period . matches any single character

  • The asterisk * matches zero or more occurrences of the previous character

  • The backslash \ is an escape character that removes the special meaning of the character that follows it
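
As a quick illustration of the characters described above, here is a minimal sketch that tries each one against a small made-up word list. The words variable and its contents are assumptions for demonstration only.

```shell
# Hypothetical word list, one word per line
words='cat
cot
cut
coat
ct
a.b'

printf '%s\n' "$words" | grep 'c[ao]t'    # character list: cat, cot
printf '%s\n' "$words" | grep 'c[^ao]t'   # negated list: cut
printf '%s\n' "$words" | grep '^co'       # start of line: cot, coat
printf '%s\n' "$words" | grep 'at$'       # end of line: cat, coat
printf '%s\n' "$words" | grep 'c.t'       # any single character: cat, cot, cut
printf '%s\n' "$words" | grep 'co*t'      # zero or more o: cot, ct
printf '%s\n' "$words" | grep 'a\.b'      # escaped (literal) dot: a.b
```

Single quotes around each pattern stop the shell from interpreting the special characters before grep sees them.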

Search for a pattern anywhere in a line

The following example matches all lines in the ps -ef output that have sh anywhere in them:
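
A minimal sketch of such a search is shown here. To keep the result repeatable, it runs against a few made-up lines (the ps_sample variable is an assumption standing in for real ps -ef output); the live command is given in the comment.

```shell
# Hypothetical lines standing in for ps -ef output
ps_sample='root         1     0  0 09:00 ?        00:00:02 /sbin/init
root       912     1  0 09:00 ?        00:00:00 /usr/sbin/sshd -D
ptr       2301  2290  0 09:15 pts/0    00:00:00 -bash
ptr       2410  2301  0 09:20 pts/0    00:00:00 vi notes.txt'

# On a live system: ps -ef | grep 'sh'
printf '%s\n' "$ps_sample" | grep 'sh'
```

With this sample the sshd and -bash lines are printed, because sh appears somewhere in each of them.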

Search for a pattern at the beginning of a line

The following example matches all lines in the ps -ef output that start with the string ptr:
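
A sketch of the anchored search, again using made-up lines in place of real ps -ef output (ps_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ps -ef output
ps_sample='root       912     1  0 09:00 ?        00:00:00 /usr/sbin/sshd -D
ptr       2301  2290  0 09:15 pts/0    00:00:00 -bash
ptr       2410  2301  0 09:20 pts/0    00:00:00 vi notes.txt
root      2500     1  0 09:30 ?        00:00:00 su - ptr'

# On a live system: ps -ef | grep '^ptr'
printf '%s\n' "$ps_sample" | grep '^ptr'
```

The last sample line contains ptr but does not start with it, so the caret keeps it out of the results.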

Search for a pattern at the end of a line

The following example matches all lines in the ps -ef output that end in bash:
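
A sketch of the end-of-line search, using made-up lines in place of real ps -ef output (ps_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ps -ef output
ps_sample='ptr       2301  2290  0 09:15 pts/0    00:00:00 -bash
ptr       2455  2301  0 09:22 pts/0    00:00:00 bash /home/ptr/backup.sh
root       912     1  0 09:00 ?        00:00:00 /usr/sbin/sshd -D'

# On a live system: ps -ef | grep 'bash$'
printf '%s\n' "$ps_sample" | grep 'bash$'
```

Only the first sample line is printed. The second line contains bash, but not at the end of the line.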

Search for a pattern containing a range of characters

The following example matches all lines that contain a number in the range 1 to 6, followed by any single character, followed by a “d”.

We can see that the first 6 matching lines match on the number at the end of the modification time, followed by a space and the d from the first letter of the file/directory name.
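
The same effect can be reproduced on a smaller scale. The sketch below assumes the pattern was '[1-6].d' applied to a long listing; the ls_sample lines are made up for illustration.

```shell
# Hypothetical lines standing in for ls -l output
ls_sample='drwxr-xr-x 2 ptr ptr 4096 Jan 11 10:21 docs
-rw-r--r-- 1 ptr ptr  220 Jan 11 10:35 data.txt
-rw-r--r-- 1 ptr ptr   18 Jan 11 10:42 file1.dat
-rw-r--r-- 1 ptr ptr   73 Jan 11 10:47 notes.txt'

# On a live system: ls -l | grep '[1-6].d'
printf '%s\n' "$ls_sample" | grep '[1-6].d'
```

The docs and data.txt lines match on the last digit of the time, a space, and the leading d of the name. The file1.dat line matches on 1.d, because the unescaped dot stands for any character.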

Search for a pattern containing a dot

If we want to match just the lines that contain a number followed by “.d”, then we need to escape the dot “.” with a backslash.
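
A sketch of the escaped version, assuming the same 1 to 6 range as before and a made-up listing (ls_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ls -l output
ls_sample='drwxr-xr-x 2 ptr ptr 4096 Jan 11 10:21 docs
-rw-r--r-- 1 ptr ptr  220 Jan 11 10:35 data.txt
-rw-r--r-- 1 ptr ptr   18 Jan 11 10:42 file1.dat'

# On a live system: ls -l | grep '[1-6]\.d'
printf '%s\n' "$ls_sample" | grep '[1-6]\.d'
```

Now only the file1.dat line is printed, because the dot must be a literal dot.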

Search for a pattern in a specific “field”

In the following scenario we would like to match all long listing entries for files in /etc that have a size beginning with a 2.  The files in /etc/ that matched this requirement at the time of carrying out this challenge were as follows:

The first command we put together is:
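
A sketch of that first attempt, with the pattern taken from the description that follows. The etc_sample lines (file names and sizes) are made up for illustration and are not the real /etc listing.

```shell
# Hypothetical lines standing in for ls -l /etc output
etc_sample='-rw-r--r--  1 root root   2388 Jan  5 10:12 aliases
-rw-r--r--  1 root root     21 Jan  5 10:12 hostname
-rw-r--r--  1 root root    202 Jan  5 10:12 hosts
-rw-r--r--  1 root root   2872 Jan  5 10:12 profile
-rw-r--r--  1 root root    512 Jan  5 10:12 motd'

# On a live system: ls -l /etc | grep 'root   2'
printf '%s\n' "$etc_sample" | grep 'root   2'
```

In this sample only the two four-digit sizes are printed, even though four of the sizes begin with a 2.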

This matches only two of the lines we are after. The pattern “root   2” has exactly 3 spaces between the string root and 2. The challenge we have here is that we need the string root to indicate which number in the line we are trying to match (otherwise it would potentially match a 2 anywhere in the line and not just the size column), but we then have a varying number of spaces between the string root and the 2. Some have 3, some have 4, some have 5, and so on.

This is a job for the asterisk *. The asterisk is effectively a padding character, as it repeats the previous character zero or more times. The following example will match the string root followed by 0 or more spaces and then a 2:
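
A sketch of the asterisk version, using a made-up listing (etc_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ls -l /etc output
etc_sample='-rw-r--r--  1 root root   2388 Jan  5 10:12 aliases
-rw-r--r--  1 root root     21 Jan  5 10:12 hostname
-rw-r--r--  1 root root    202 Jan  5 10:12 hosts
-rw-r--r--  1 root root   2872 Jan  5 10:12 profile
-rw-r--r--  1 root root    512 Jan  5 10:12 motd'

# On a live system: ls -l /etc | grep 'root *2'
printf '%s\n' "$etc_sample" | grep 'root *2'
```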

Now we get all of the files we wanted to match.

Now we add a new file to /etc that is called root2. Running the same command as above will result in this file being matched too:
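
The unwanted match can be sketched with a made-up listing that includes a file named root2 (etc_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ls -l /etc output, including the new file
etc_sample='-rw-r--r--  1 root root   2388 Jan  5 10:12 aliases
-rw-r--r--  1 root root    512 Jan  5 10:12 motd
-rw-r--r--  1 root root      0 Jan 11 09:30 root2'

# On a live system: ls -l /etc | grep 'root *2'
printf '%s\n' "$etc_sample" | grep 'root *2'
```

The root2 line is printed even though its size is 0, because the file name itself is root followed by zero spaces and a 2.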

This is because the asterisk (*) represents zero or more of the previous character. To ensure that we get at least one space before the 2 we must add an extra space (space, space, asterisk):
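
A sketch of the corrected pattern, using a made-up listing that includes the root2 file (etc_sample is an assumption for illustration):

```shell
# Hypothetical lines standing in for ls -l /etc output
etc_sample='-rw-r--r--  1 root root   2388 Jan  5 10:12 aliases
-rw-r--r--  1 root root     21 Jan  5 10:12 hostname
-rw-r--r--  1 root root    512 Jan  5 10:12 motd
-rw-r--r--  1 root root      0 Jan 11 09:30 root2'

# Two spaces then an asterisk: one mandatory space, then zero or more extra spaces
# On a live system: ls -l /etc | grep 'root  *2'
printf '%s\n' "$etc_sample" | grep 'root  *2'
```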

Now we will get the correct lines.

This command line could be improved further to cater for other directories where there may be varying owners of files:
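
The pattern below is reconstructed from the description that accompanies it, so treat it as a sketch rather than the exact original command. The ls_sample lines are made up, and LC_ALL=C is a precaution added here so that [a-z] and [A-Z] behave as plain ASCII ranges in every locale.

```shell
# Hypothetical lines standing in for ls -l output with varying owners and groups
ls_sample='-rw-r--r-- 1 ptr   users   2388 Jan  5 10:12 report.txt
-rw-r--r-- 1 mandy staff     21 Feb  2 08:40 todo
-rw-r--r-- 1 root  root     512 Mar  9 17:03 motd
drwxr-xr-x 2 root  root    4096 Jan 11 09:30 conf.d'

# On a live system: ls -l | LC_ALL=C grep '[a-z]  *2[0-9]* [A-Z]'
printf '%s\n' "$ls_sample" | LC_ALL=C grep '[a-z]  *2[0-9]* [A-Z]'
```

With this sample the report.txt and todo lines are printed. The directory line has a link count of 2, but that 2 is followed by a lowercase owner name, so it is not matched.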

The above pattern looks for lines that contain a lowercase letter (from the end of the group owner column), followed by one or more spaces, followed by a 2 and then zero or more digits (a size of one or more digits beginning with a 2), followed by one space (the column separator between the size column and the modification time column), and finally followed by an uppercase letter to ensure it is the modification time column rather than the owner column that is matched.

Command Line Options for grep

The grep command offers a lot of options. Here are a few of them in use:

The following example shows a list of filenames for files in the directory /etc that contain the pattern centos1:
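
Listing file names rather than matching lines is done with the -l option. The sketch below builds a throwaway directory in place of /etc so that it can be run safely; the file names and contents are made up for illustration.

```shell
# Hypothetical directory standing in for /etc
demo_dir=$(mktemp -d)
printf 'centos1.example.com\n' > "$demo_dir/hostname"
printf '127.0.0.1 localhost\n192.0.2.10 centos1\n' > "$demo_dir/hosts"
printf 'nameserver 192.0.2.1\n' > "$demo_dir/resolv.conf"

# On a live system: grep -l centos1 /etc/*
files=$(grep -l 'centos1' "$demo_dir"/*)
printf '%s\n' "$files"

# Remove the throwaway directory
rm -r "$demo_dir"
```

Only the hostname and hosts paths are printed, since resolv.conf has no line containing centos1.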

The following example shows matching lines from the set output that contain the string name in any case:
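
Case-insensitive matching is done with the -i option. The sketch below uses a few made-up variable assignments in place of real set output (set_sample is an assumption for illustration).

```shell
# Hypothetical lines standing in for set output
set_sample='HOSTNAME=centos1
LOGNAME=ptr
PATH=/usr/bin:/bin
USERNAME=ptr'

# On a live system: set | grep -i name
printf '%s\n' "$set_sample" | grep -i 'name'
```

The HOSTNAME, LOGNAME and USERNAME lines are printed. Without -i, the lowercase pattern name would match none of the lines in this sample.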

The following example shows all who output lines that do not contain the pattern root:
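
Inverting the match is done with the -v option. The sketch below uses made-up lines in place of real who output (who_sample is an assumption for illustration).

```shell
# Hypothetical lines standing in for who output
who_sample='root     tty1         2017-01-11 09:02
ptr      pts/0        2017-01-11 09:15 (192.0.2.20)
mandy    pts/1        2017-01-11 09:40 (192.0.2.21)'

# On a live system: who | grep -v root
printf '%s\n' "$who_sample" | grep -v 'root'
```

Note that -v drops any line containing root anywhere, not just in the user name column.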

The following example shows how many lines in each file in and below the /etc/lvm directory contain the pattern centos1:
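
Counting matching lines per file is done with -c, and searching a directory and everything below it is done with -r. The sketch below assumes those two options were combined, and builds a throwaway directory tree in place of /etc/lvm; the file names and contents are made up for illustration.

```shell
# Hypothetical directory tree standing in for /etc/lvm
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/lvm/backup"
printf 'creation_host = "centos1"\nvolume group centos1\n' > "$demo_dir/lvm/backup/centos"
printf 'config {\n}\n' > "$demo_dir/lvm/lvm.conf"

# On a live system: grep -rc centos1 /etc/lvm
counts=$(grep -rc 'centos1' "$demo_dir/lvm")
printf '%s\n' "$counts"

# Remove the throwaway directory
rm -r "$demo_dir"
```

Each output line has the form file:count, and files with no matching lines are still listed with a count of 0.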

Boost Your Linux/Unix System Administrator Toolbox

grep is a hugely powerful tool that a Linux or UNIX system administrator cannot live without. egrep extends this to provide even more potential.

We will take a look at egrep and fgrep in some later articles.

If you have any questions, do email us at info@ptr.co.uk, and if you would like to learn more about Linux and Unix, take a look at our Linux and UNIX Training Courses.

Mandy Doward

Managing Director

PTR’s owner and Managing Director is a Microsoft MCSE certified Business Intelligence (BI) Consultant, with over 30 years of experience working with data analytics and BI.
