A regular expression (or regex, or regexp) is a way to describe complex search patterns using sequences of characters.
Regular expressions are patterns that can be matched against strings, which makes them important tools for text processing. Many text editors and most programming languages have some built-in support for them.
Regular expressions are used for searching through data. They allow you to search for pieces of text that match a certain form, instead of searching for a piece of text identical to the one you supply. For example, the regular expression [0-9]+ allows you to search through a file for any run of one or more digits, that is, any unsigned integer.
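As a small illustration of searching by form rather than by exact text, the sketch below uses Python's built-in re module to pull every run of digits out of a string. The sample text and the variable names are made up for the example.

```python
import re

# Hypothetical sample text; any string would do.
text = "Order 66 shipped 3 boxes on day 12."

# [0-9]+ matches each run of one or more consecutive digits.
numbers = re.findall(r"[0-9]+", text)
print(numbers)  # ['66', '3', '12']
```

Note that the pattern knows nothing about signs or decimal points: in a string such as "-4.5" it would find 4 and 5 as two separate matches.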
Certain characters have special purposes in regular expressions. These are called meta-characters or meta-symbols. Meta-characters are not part of the strings that are matched by a pattern. Instead, they are part of the syntax that is used for representing patterns. Typically, the following characters are meta-characters:
. * | ? + ( ) [ ] { } ^ $ \
These characters have special meaning in regular expressions. For example, parentheses are used for grouping, just as they are in arithmetic. If you want to use a meta-character as a regular character instead of with its special meaning, you have to “escape” it by preceding it with a backslash, such as \*, \(, \$, or \\.
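To see why escaping matters, here is a minimal Python sketch that searches for text containing meta-characters, first without and then with backslashes. The sample string is invented for the example; re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

# Hypothetical sample text containing several meta-characters.
price = "Total: $5.00 (approx.)"

# Unescaped, "$" anchors at the end of the string and "." matches any
# character, so this pattern cannot match the literal text "$5.00".
unescaped = re.search(r"$5.00", price)
print(unescaped)  # None

# Escaping each meta-character makes it match literally.
literal = re.search(r"\$5\.00", price)
print(literal.group())  # $5.00

# re.escape builds the escaped pattern from a plain string.
escaped = re.escape("(approx.)")
print(re.search(escaped, price).group())  # (approx.)
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regex engine sees them.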
MATCH ANY NUMBER LINE
We’ll start with a very simple example: match any line that contains only digits.
^[0-9]+$
Let’s walk through this piece-by-piece.
^ – Signifies the start of a line.
[0-9] – Matches any digit between 0 and 9.
+ – Matches one or more instances of the preceding expression.
$ – Signifies the end of the line.
We could replace [0-9] with \d, which does the same thing for the digits 0 through 9 (some Unicode-aware engines also let \d match digits from other scripts).
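The following sketch compares the two spellings in Python 3, as one concrete engine. Both accept ordinary digits, but with str patterns Python's \d also accepts digits from other scripts unless the re.ASCII flag is passed. The sample strings are chosen only for illustration.

```python
import re

ascii_digits = re.compile(r"^[0-9]+$")
shorthand = re.compile(r"^\d+$")
shorthand_ascii = re.compile(r"^\d+$", re.ASCII)

# The last sample is "123" written with Arabic-Indic digits.
samples = ["12345", "12a45", "\u0661\u0662\u0663"]

for sample in samples:
    print(
        repr(sample),
        bool(ascii_digits.match(sample)),     # True, False, False
        bool(shorthand.match(sample)),        # True, False, True
        bool(shorthand_ascii.match(sample)),  # True, False, False
    )
```

If only 0 through 9 should count as digits, spelling out [0-9] or passing re.ASCII removes the ambiguity.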
The great thing about this expression (and regular expressions in general) is that it can be used, without much modification, in most programming languages. In Python, for example:
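import re

# Read the whole file into a single string
with open('test.txt', 'r') as f:
    test_string = f.read()

# re.MULTILINE makes ^ and $ match at the start and end of every line,
# not just at the start and end of the whole string
regex = re.compile(r'^([0-9]+)$', re.MULTILINE)

# findall returns a list with the captured digits of each matching line
result = regex.findall(test_string)
print(result)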