# Common Regex Syntax
## Character Types
`\d` any digit (0-9)
`\D` any non-digit character
`\s` whitespace (space, tab, new line)
`\S` any non-whitespace character
`\w` any word character: letter, digit, or underscore (a-z, A-Z, 0-9, `_`)
`\W` any non-word character
`\b` word boundary: the position between a word character and a non-word character, or at the start or end of the string
`\B` any position that is not a word boundary
`\A` match only at the start of the string
`\Z` match only at the end of the string
`.` any character except a newline (LF; some flavors also exclude CR)
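
The shorthand classes above can be tried out with a minimal sketch. It assumes Python's `re` module as the regex flavor (these notes do not name one), and the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise each shorthand class.
text = "order_42 shipped\tto Rome"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of letters, digits and underscores
spaces = re.findall(r"\s", text)         # single whitespace characters
whole_to = re.findall(r"\bto\b", text)   # "to" only as a whole word
inner_o = re.findall(r"\Bo\B", text)     # "o" with word characters on both sides

print(digits)    # ['42']
print(words)     # ['order_42', 'shipped', 'to', 'Rome']
print(spaces)    # [' ', '\t', ' ']
print(whole_to)  # ['to']
print(inner_o)   # ['o']
```

Note that `\b` and `\B` match positions, not characters: the only `o` reported by `\Bo\B` is the one inside "Rome", because the `o` in "order" starts a word and the `o` in "to" ends one.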
## Quantifiers
`+` one or more repetitions
`*` zero or more repetitions
`?` zero or one repetition
`{m}` exactly *m* times
`{m,n}` at least *m* times, at most *n* times (no space after the comma)
The `*`, `+`, and `?` quantifiers are all greedy; they match as much text as possible.
Adding `?` *after* a quantifier makes it non-greedy (minimal): it matches as few characters as possible.
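
A short sketch of greedy versus non-greedy matching and of counted quantifiers, assuming Python's `re` module; the input strings are hypothetical.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical input

greedy = re.findall(r"<b>.*</b>", html)   # .* takes as much text as it can
lazy = re.findall(r"<b>.*?</b>", html)    # .*? stops at the first closing tag

# Counted quantifiers: {m} and {m,n}, written without a space after the comma.
numbers = "7 42 2021 123456"
exactly_four = re.findall(r"\b\d{4}\b", numbers)
two_to_four = re.findall(r"\b\d{2,4}\b", numbers)

print(greedy)        # ['<b>one</b><b>two</b>']
print(lazy)          # ['<b>one</b>', '<b>two</b>']
print(exactly_four)  # ['2021']
print(two_to_four)   # ['42', '2021']
```

The `\b` anchors on both sides keep `\d{4}` from matching the first four digits of a longer number such as 123456.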
## Special Characters
`\a`, `\b`, `\f`, `\n`, `\r`, `\t`, `\u`, `\U`, `\v`, `\x`, `\\`, `\?`, `\*`, `\+`, `\.`, `\^`, `\$` special characters (escape sequences and escaped metacharacters)
`\(`, `\)`, `\[`, `\]` escaped brackets
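
As a rough illustration of why escaping matters, the sketch below matches metacharacters literally. It assumes Python's `re` module; `re.escape()` is a standard helper there, while the sample string is invented.

```python
import re

price = "Total: $4.50 (approx.)"  # hypothetical input

# Unescaped, "$" is an anchor and "." matches any character.
# Escaped, both are treated as literal characters.
match = re.search(r"\$\d+\.\d+", price)

# re.escape() backslash-escapes the metacharacters of a literal string,
# which is handy when the text to find comes from user input.
pattern = re.escape("(approx.)")
found = re.search(pattern, price)

print(match.group())  # $4.50
print(found.group())  # (approx.)
```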
## Delimiters
`^` match must be at start of string/line
`$` match must be at end of string/line
`^__$` match must be whole string
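
Whether `^` and `$` refer to the string or to each line depends on the flavor and its flags. The sketch below assumes Python's `re` module, where `re.MULTILINE` switches to per-line behavior; the log text is made up.

```python
import re

log = "error: disk full\nwarning: low memory\nerror: timeout"  # hypothetical input

# Without flags, ^ matches only at the very start of the string.
first_only = re.findall(r"^error", log)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
each_line = re.findall(r"^error", log, re.MULTILINE)
line_ends = re.findall(r"\w+$", log, re.MULTILINE)

# ^...$ requires the whole string to match; re.fullmatch() does the same.
three_digits = bool(re.search(r"^\d{3}$", "123"))
four_digits = bool(re.search(r"^\d{3}$", "1234"))
full = bool(re.fullmatch(r"\d{3}", "123"))

print(first_only)                       # ['error']
print(each_line)                        # ['error', 'error']
print(line_ends)                        # ['full', 'memory', 'timeout']
print(three_digits, four_digits, full)  # True False True
```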
## Character classes
`[__]` one of the characters in the class (`[ab]` --> a or b)
`[__]{m,n}` between *m* and *n* consecutive characters from the class (`[aeiou]{2}` --> ae, ao, ...)
`[a-z]` any one lowercase letter
`[A-Z]` any one uppercase letter
`[a-zA-Z]` any one lowercase or uppercase letter
`[a-z][A-Z]` one lowercase letter followed by one uppercase letter
`[^__]` any character except those in the class (include `\n` to avoid matching line endings)
`^`, `\`, `-` and `]` should be escaped when used literally in a class: `[ \]\[\^\- ]`
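
A small sketch of the class syntax above, assuming Python's `re` module; all input strings are hypothetical. It shows that a class matches one character at a time unless a quantifier follows it.

```python
import re

# A class followed by {2} matches two consecutive characters from the class.
vowel_pairs = re.findall(r"[aeiou]{2}", "boat queue")

# Two classes in a row: one lowercase letter, then one uppercase letter.
case_switch = re.findall(r"[a-z][A-Z]", "camelCase and snake_case")

symbols = "a-b [x] 2^3"
# Negated class: anything that is neither a lowercase letter nor whitespace.
not_lower = re.findall(r"[^a-z\s]", symbols)
# Escaped ], [, ^ and - are taken literally inside a class.
literal = re.findall(r"[\]\[\^\-]", symbols)

print(vowel_pairs)  # ['oa', 'ue', 'ue']
print(case_switch)  # ['lC']
print(not_lower)    # ['-', '[', ']', '2', '^', '3']
print(literal)      # ['-', '[', ']', '^']
```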
## Groups
`(__)` capturing group
`(?:__)` non-capturing group
`(REGEX_1|REGEX_2)` alternation: match either regex (R1 OR R2)
`(?=__)` positive lookahead: match only if `__` comes next
`(?!__)` negative lookahead: match only if `__` does not come next
`(?<=__)` positive lookbehind: match only if `__` comes immediately before
`(?<!__)` negative lookbehind: match only if `__` does not come immediately before
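
Lookarounds check the surrounding text without including it in the match. The sketch below assumes Python's `re` module, and the sample strings are invented for illustration.

```python
import re

sizes = "10px 20em 30px"               # hypothetical input
costs = "cost $30, weight 45, off $5"  # hypothetical input

# Lookahead: digits followed by "px"; "px" itself is not part of the match.
px_values = re.findall(r"\d+(?=px)", sizes)

# Negative lookahead: digits NOT followed by "px". The extra \d alternative
# stops the engine from backtracking to a shorter run such as "1" in "10px".
non_px = re.findall(r"\d+(?!\d|px)", sizes)

# Lookbehind: digits immediately preceded by "$".
dollars = re.findall(r"(?<=\$)\d+", costs)

# Negative lookbehind: digits preceded by neither "$" nor another digit.
plain = re.findall(r"(?<![$\d])\d+", costs)

print(px_values)  # ['10', '30']
print(non_px)     # ['20']
print(dollars)    # ['30', '5']
print(plain)      # ['45']
```

Negative lookarounds next to a quantifier are easy to get wrong: without the `\d` guard, `\d+(?!px)` would still match the `1` in `10px`.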
`\<number>` backreference to the text matched by the n-th capturing group (`\1`, `\2`, ...)
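
Capturing groups, non-capturing groups, alternation and backreferences can be combined as in the sketch below. It assumes Python's `re` module; the sample strings are hypothetical.

```python
import re

# Only capturing groups are numbered; (?:...) groups without capturing.
m = re.search(r"(\d{4})-(?:\d{2})-(\d{2})", "released 2021-01-31")
captured = m.groups()

# Alternation inside a non-capturing group, followed by an optional "s".
pets = re.findall(r"(?:cat|dog)s?", "cats, dog, bird")

# Backreference: \1 must repeat exactly the text captured by group 1,
# which makes this pattern find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()

print(captured)  # ('2021', '31')
print(pets)      # ['cats', 'dog']
print(doubled)   # is is
```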
## Special Cases
`(.*)` capture anything, greedy (as much as possible)
`(.*?)` capture anything, non-greedy (as little as possible)
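
One caveat with "match anything": `.` normally stops at a newline. The sketch below assumes Python's `re` module, where the `re.DOTALL` flag lifts that limit; other flavors spell this option differently, and the input is made up.

```python
import re

text = "<p>first\nsecond</p>"  # hypothetical input spanning two lines

# By default "." does not match "\n", so (.*) cannot span both lines.
single_line = re.search(r"<p>(.*)</p>", text)

# With re.DOTALL, "." matches newlines as well.
across_lines = re.search(r"<p>(.*)</p>", text, re.DOTALL).group(1)

print(single_line)         # None
print(repr(across_lines))  # 'first\nsecond'
```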