Character classes
A character class, also known as a character set or bracket expression, is a regular expression consisting of a group of characters enclosed in square brackets. It matches any one of the enclosed characters, regardless of the order in which they are listed inside the brackets. For example, [abcdefgh] matches any one of the characters a, b, c, d, e, f, g, or h.
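In the following example, we print the information of those employees whose names begin with any of the characters enclosed within the brackets. Note that a space inside the brackets is itself a member of the class, so the characters are listed without padding:
$ awk '/^[ABCDEFGHIJ]/{ print }' emp.dat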
The output on execution of the given code is as follows:
Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
John Kapur 9911556789 john@gmail.com M hr 2200
Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
Ginny Singh 9857123466 ginny@yahoo.com F hr 2250
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
We can specify ranges of characters in abbreviated form by using a hyphen. The character immediately to the left of the hyphen defines the beginning of the range and the character immediately to the right defines the end. Thus, the preceding example can be rewritten using a hyphen as follows:
$ awk '/^[A-J]/{ print }' emp.dat
The output on execution of the preceding code is as follows:
Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
Eva Chabra 8827232115 eva@gmail.com F lgs 2100
Amit Sharma 9911887766 amit@yahoo.com M lgs 2350
Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
Ana Khanna 9856422312 anak@hotmail.com F Ops 2700
Hari Singh 8827255666 hari@yahoo.com M Ops 2350
John Kapur 9911556789 john@gmail.com M hr 2200
Billy Chabra 9911664321 bily@yahoo.com M lgs 1900
Ginny Singh 9857123466 ginny@yahoo.com F hr 2250
Emily Kaur 8826175812 emily@gmail.com F Ops 2100
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
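To check that a range and its spelled-out list select the same records without relying on emp.dat, the following minimal sketch uses a small made-up sample. The names are taken from the output above, the remaining fields are left out for brevity, and the variable names are only illustrative:

```shell
# Hypothetical three-line sample; only the first letter of each line matters here.
sample=$(printf '%s\n' 'Amit Sharma' 'Jack Singh' 'Victor Sharma')

# Class written as an explicit list of characters.
listed=$(printf '%s\n' "$sample" | awk '/^[ABCDEFGHIJ]/')

# The same class written as a range.
ranged=$(printf '%s\n' "$sample" | awk '/^[A-J]/')

printf '%s\n' "$ranged"
```

Both variables should hold the same two lines, because V lies outside the range A to J.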
We can print the information of employees whose names begin with 'Ja' followed by any two lowercase letters, as follows:
$ awk '/^Ja[a-z][a-z]/{ print }' emp.dat
The output on execution of this code is as follows:
Jack Singh 9857532312 jack@gmail.com M hr 2000
Jane Kaur 9837432312 jane@gmail.com F hr 1800
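A bracket expression matches wherever it occurs in the record unless an anchor pins it down, so /Ja[a-z][a-z]/ and /^Ja[a-z][a-z]/ can select different records. The sketch below illustrates this with a made-up two-line sample; 'Amit Jain' is a hypothetical record, not part of emp.dat:

```shell
# Hypothetical sample: "Ja" starts the first line but sits mid-line in the second.
sample=$(printf '%s\n' 'Jack Singh' 'Amit Jain')

# Unanchored: "Ja" plus two lowercase letters anywhere in the record.
anywhere=$(printf '%s\n' "$sample" | awk '/Ja[a-z][a-z]/')

# Anchored: the same sequence only at the start of the record.
at_start=$(printf '%s\n' "$sample" | awk '/^Ja[a-z][a-z]/')

printf 'anywhere:\n%s\nat start:\n%s\n' "$anywhere" "$at_start"
```

The unanchored pattern selects both lines, while the anchored one selects only the record that starts with 'Jack'.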
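Similarly, we can print the information of employees whose salary is either 2300 or 2500. The pattern is tested against the whole record, so it would also select a record in which these digits occur in another field, such as the phone number: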
$ awk '/2[35]00/{ print }' emp.dat
The output on execution of the preceding code is as follows:
Julie Kapur 8826234556 julie@yahoo.com F Ops 2500
Victor Sharma 8826567898 vics@hotmail.com M Ops 2500
Sam khanna 8856345512 sam@hotmail.com F lgs 2300
Amy Sharma 9857536898 amys@hotmail.com F Ops 2500
Vina Singh 8811776612 vina@yahoo.com F lgs 2300
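If the match should apply to the salary alone, the character class can be tested against a single field instead of the whole record. The following sketch assumes the salary is the last field ($NF), as it is in the output above. The second sample record is made up so that its phone number contains the digits 2300:

```shell
# First record is from the output above; the second is a hypothetical record
# whose phone number happens to contain "2300".
sample=$(printf '%s\n' \
  'Julie Kapur 8826234556 julie@yahoo.com F Ops 2500' \
  'Test User 9823005512 test@example.com M hr 1800')

# Whole-record match: also selects the record with "2300" in the phone number.
whole_record=$(printf '%s\n' "$sample" | awk '/2[35]00/')

# Field match: the last field must be exactly 2300 or 2500.
salary_only=$(printf '%s\n' "$sample" | awk '$NF ~ /^2[35]00$/')

printf '%s\n' "$salary_only"
```

Anchoring the pattern with ^ and $ also keeps a longer number that merely contains these digits from being selected.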
A hyphen denotes a range only when it has a character on both its left and its right. Otherwise it stands for itself, so the character class [-] matches a literal -:
$ echo -e "-\n+\na\nb" | awk '/[-]/'
The output on execution of this code is as follows:
-
We can also put a hyphen at the beginning or end of a range-specified character class to match the hyphen itself, as shown here:
$ echo -e "-\n+\na\nb" | awk '/[a-z-]/'
The hyphen can equally be placed at the beginning of the class:
$ echo -e "-\n+\na\nb" | awk '/[-a-z]/'
Both commands produce the following output:
-
a
b
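The position of the hyphen decides whether it is a literal or a range operator. As a small sketch on the same four input lines (the variable names are only illustrative), compare a class that ends with a hyphen with one that has the hyphen between two characters:

```shell
# Hyphen last: the class holds the three characters a, b, and -.
literal=$(printf '%s\n' '-' '+' 'a' 'b' | awk '/[ab-]/')

# Hyphen between a and b: the class is the range from a to b, with no hyphen in it.
range=$(printf '%s\n' '-' '+' 'a' 'b' | awk '/[a-b]/')

printf 'literal hyphen:\n%s\nrange:\n%s\n' "$literal" "$range"
```

The first pattern selects -, a, and b. The second selects only a and b.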
Inside a bracket expression, only '\', ']', '-', and '^' have a special meaning. To match one of them literally, we put a '\' in front of it.
Like anchors, which lose their special meaning when they are not placed at the appropriate position in a regular expression, the hyphen, '-', and the closing bracket, ']', are special only in certain positions inside a bracket expression. The following example escapes each of these characters with a backslash so that they are matched literally:
$ echo -e "-\n+\na\nb\n]\n\\" | awk '/[\^ab\-\]\\]/'
The output on execution of this code is as follows:
-
a
b
]
\
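The same characters can also be matched without backslashes by placing them where they have no special meaning: ']' first, '^' anywhere but first, and '-' last. The sketch below assumes an awk that follows the POSIX bracket-expression rules, such as gawk. The backslash is left out here because awk still requires it to be doubled:

```shell
# "]" comes first, "^" is not first, and "-" comes last, so none needs a backslash.
matched=$(printf '%s\n' '-' '+' 'a' 'b' ']' '^' | awk '/[]ab^-]/')

printf '%s\n' "$matched"
```

Every input line except + is selected.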
A summary of the character classes is as follows:
Pattern | Matches
[f-k] | Any single character from f to k; same as [fghijk]
[0-9] | Any single digit; same as [0123456789]
[-] | A hyphen
[0-9-] | Any digit or a hyphen
[-0-9] | Any digit or a hyphen
[]0-9] | Any digit or a ]
[0-9]] | Any digit followed by a ]
[0-9\-\]] | Any digit, a hyphen, or a ]
[0-9\\] | Any digit or a backslash, "\"
[\^0-9] | Any digit or a caret, "^"
[a-z] | Any lowercase letter
[A-Z] | Any uppercase letter
[a-zA-Z] | Any letter
[a-zA-Z0-9] | Any alphanumeric character
[5-9G-Lr-z] | Any single character among [56789GHIJKLrstuvwxyz]
[a-zA-Z][0-9] | A letter followed by a digit
[a-zA-Z-]+ | One or more letters or hyphens
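A few of these patterns are easy to confuse, so the following sketch tries them on made-up input. It contrasts [0-9]] with []0-9], and uses awk's match() function, which sets RSTART and RLENGTH, to extract the text matched by [a-zA-Z-]+. The input strings and variable names are only illustrative:

```shell
# "[0-9]]" is a digit class followed by a literal "]": it needs two characters.
two_chars=$(printf '%s\n' '7' ']' '7]' | awk '/^[0-9]]$/')

# "[]0-9]" is a single class containing "]" and the digits: it needs one character.
one_char=$(printf '%s\n' '7' ']' '7]' | awk '/^[]0-9]$/')

# "[a-zA-Z-]+" matches a run of letters and hyphens; print the leftmost such run.
word=$(echo '42 well-known 7' |
  awk 'match($0, /[a-zA-Z-]+/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$two_chars" "$one_char" "$word"
```

Here the first pattern selects only the line 7], the second selects the lines 7 and ], and the extracted run is well-known.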