User-Defined Sets

You can specify a character class, or set, by enclosing a list of characters in square brackets ('[ ]'). Such a set matches exactly one character from the list. Within the list, the dash character ('-') specifies a range, as in the serial number example. A range is simply shorthand for enumerating all the characters it covers. Because ranges are interpreted using the Unicode values of the characters, and a range cannot cross Unicode pages, we suggest limiting ranges to the digits and the 26 lowercase and uppercase letters of the English alphabet. To include the '-' or ']' character itself in a set, either put '\' in front of it to take away its special meaning, or place it where it cannot be treated as a special character. For example, the sets '[+\-\]=]' and '[]+=-]' (where ']' comes first and '-' comes last) both match any one of the characters '-', '+', ']' and '='.
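The two spellings of the set can be compared in a short sketch. It assumes Python's standard `re` module, which may differ in detail from the engine this guide describes (some engines treat a leading ']' differently). The sample strings, including the serial number 'SN-4071', are made up for illustration.

```python
import re

# The same set written two ways: with backslashes, and by position
# (']' first, '-' last), as in the example from the text.
escaped = re.compile(r"[+\-\]=]")
positional = re.compile(r"[]+=-]")

sample = "a-b+c]d=e"  # made-up sample text

escaped_hits = escaped.findall(sample)
positional_hits = positional.findall(sample)

# A range is shorthand for listing every character it covers.
serial = "SN-4071"  # hypothetical serial number
range_hits = re.findall(r"[0-9]", serial)
listed_hits = re.findall(r"[0123456789]", serial)

print(escaped_hits)
print(positional_hits)
print(range_hits == listed_hits)
```

Under these assumptions both patterns find the same four characters, and the range finds the same digits as the fully enumerated set.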

Sometimes it is easier to list the characters that you do not want to match, say "neither a space nor a dot". This is a negated character class, and the set '[^ .]' defines it: if the first character of a set is the caret ('^'), the caret itself is not part of the list, and the set matches any one character that is not in the list. Within a set you do not need to precede the dot character with a backslash, because its special "any character" meaning applies only outside sets; however, there is no harm in including the backslash.
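A minimal sketch of the negated set, again assuming Python's `re` module rather than the engine described here, with a made-up sample string. It also shows that the dot is literal inside a set, with or without a backslash, but keeps its "any character" meaning outside one.

```python
import re

text = "a b.c"  # made-up sample text

# Negated set: any one character that is neither a space nor a dot.
negated = re.findall(r"[^ .]", text)

# Escaping the dot inside the set makes no difference.
negated_escaped = re.findall(r"[^ \.]", text)

# Outside a set, an unescaped dot matches any character
# (in Python, any character except a newline by default).
any_char = re.findall(r".", text)

print(negated)
print(negated_escaped)
print(any_char)
```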