SRELL ~ Regular Expression Template Library for C++

Features

Header-only and the same class design as std::regex

SRELL is a header-only template library and needs no installation. It has an ECMAScript (JavaScript) compatible regular expression engine wrapped in the same class design as std::regex. Because the APIs are compatible, SRELL can be used in the same way as std::regex (or boost::regex, on which std::regex is based).

Unicode-specific implementation

SRELL has native support for Unicode:

UTF-8, UTF-16, and UTF-32 strings can be handled without any additional configurations.
'.' does not match half of a surrogate pair in UTF-16 strings, nor an individual code unit of a multi-unit sequence in UTF-8 strings.
Supplementary Characters can be specified in a character class such as [丈𠀋], and a range also can be specified in a character class such as [\u{1b000}-\u{1b0ff}].
When the case-insensitive match is performed, even characters having two lowercase letters for one uppercase letter such as Greek Σ (u+03c2 [ς] and u+03c3 [σ]) or having the third case called "titlecase" besides the uppercase and the lowercase such as Ǆ (uppercase; Ǆ, lowercase; ǆ and titlecase; ǅ) are processed appropriately.

Consideration for ignore-case (icase) search

SRELL has been tuned so that case-insensitive (icase) searches do not slow down noticeably.

Because std::regex was proposed early for C++0x (now C++11), it depends little on C++11's new features. SRELL should therefore be usable even with pre-C++11 compilers, as long as they interpret C++ templates accurately. (The oldest compiler on which I have confirmed that SRELL can be used is Visual C++ 2005 in Visual Studio 2005.)

Download

SRELL 4.130 (BSD License) 14 Dec 2025 (Revision History)
Link to the latest version (mainly for creating a library installer)

Known Issue

251216: I received a report that GCC/Clang's ubsan (undefined behaviour sanitizer) warns about several undefined behaviours in SRELL. Thanks to Avi Hayoun for the report.
Tentative Patches for this problem: For SRELL 4.130 ordinary version, single-header version (only different in line numbers). The next release will accommodate fixes for this problem.

Releases
Related Section: Possible future changes

About the version number: The third decimal place in version numbers is no longer used. But to avoid confusion, the format of three decimal places is kept until the major version number is incremented (i.e., 4.080, 4.090, 4.100, ... not 4.08, 4.09, 4.10, ...).

How to use

No preparation is required. Place srell*.h* (the three files srell.hpp, srell_ucfdata2.h, and srell_updata3.h) somewhere on your compiler's include path and include srell.hpp.

If you have used <regex>, you already know how to use SRELL generally.

//  Example 01:
#include <cstdio>
#include <string>
#include <iostream>
#include "srell.hpp"

int main()
{
    srell::regex e;     //  Regular expression object holder.
    srell::cmatch m;    //  Object which receives results.

    e = "\\d+[^-\\d]+"; //  Compile a regular expression string.
    if (srell::regex_search("1234-5678-90ab-cdef", m, e))
    {
        //  When using printf.
        const std::string s(m[0].first, m[0].second);
            //  The code above can be replaced with one of the following lines.
            //  const std::string s(m[0].str());
            //  const std::string s(m.str(0));
        std::printf("result: %s\n", s.c_str());

        //  When using iostream.
        std::cout << "result: " << m[0] << std::endl;
    }
    return 0;
}

As in this example, all classes and algorithms that belong to SRELL have been put within namespace "srell". Except for this point, the usage is basically identical to std::regex.

Please see also readme_en.txt included in the zip archive.

Syntax

SRELL supports the expressions defined in the RegExp (Regular Expression) Objects section in the latest draft of the ECMAScript Specification.

By default, the u flag is assumed to be always set. Starting with version 4.000, SRELL also supports the v flag mode, which is turned on by passing the unicodesets flag to the pattern compiler (srell::basic_regex). For the details of the v mode, see the proposal page.

The detailed list of supported expressions is as follows:

List of Regular Expressions supported by SRELL
Characters
.	Default: Matches any character but LineTerminator code points (U+000A, U+000D, U+2028, and U+2029), i.e., corresponding to `[^\u000A\u000D\u2028\u2029]`. With the `dotall` option flag: Matches every code point, i.e., equivalent to `[\0-\u{10ffff}]`. Note that when `dotall` is set, `.*` matches all the remaining characters in the subject.
\0	Matches NULL (`\u0000`).
\t	Matches Horizontal Tab (`\u0009`).
\n	Matches Line Feed (`\u000a`).
\v	Matches Vertical Tab (`\u000b`).
\f	Matches Form Feed (`\u000c`).
\r	Matches Carriage Return (`\u000d`).
\cX	Matches a control character corresponding to (`(the code point value of X) & 0x1f`) where X is one of `[A-Za-z]`. If `\c` is not followed by one of A-Z or a-z, then `error_escape` is thrown.
\\	Matches a backslash (`\u005c`) itself.
\xHH	Matches a character whose code unit value in UTF-16 is the two hexadecimal digits `HH`. If `\x` is not followed by two hexadecimal digits `error_escape` is thrown. Because code unit values `0x00`-`0xFF` in UTF-16 represent U+0000-U+00FF respectively, `HH` in this expression virtually represents a code point.
\uHHHH	Matches a character whose Unicode code point is the four hexadecimal digits `HHHH`. If `\u` is not followed by four hexadecimal digits `error_escape` is thrown. SRELL 2.500-: When sequential `\uHHHH` escapes represent a valid surrogate pair in UTF-16, they are interpreted as a Unicode code point value. For example, `/\uD842\uDF9F/` is interpreted as being equivalent to `/\u{20B9F}/`.
\u{H...}	Matches a character whose Unicode code point is identical to the value represented by one or more hexadecimal digits `H...`. If the inside of `{}` in `\u{...}` is not one or more hexadecimal digits, a value represented by the hexadecimal digits exceeds the max value of Unicode code points (`0x10FFFF`), or the closing curly bracket `'}'` does not exist, then `error_escape` is thrown. Note: This expression has been available since ECMAScript 6.0. In SRELL up to version 2.001, `H...` in `\u{H...}` was limited to "one to six hexadecimal digits". This is because this feature was implemented based on the proposal document, and the change that was made to the text when the proposal was approved formally was overlooked.
\	When a `\` is followed by one of `^ $ . * + ? ( ) [ ] { } \| /`, the sequence represents the following character itself. That is, prefixing `\` removes the special meaning of these characters and makes the pattern compiler interpret the character literally. (The reason why `'/'` is also included in the list is that a sequence of regular expressions is enclosed by `//` in ECMAScript.) In the character class mentioned below, `'-'` also becomes a member of this group in addition to the fourteen characters above and can be used as `"\-"`. Note: In the `u` flag mode of ECMAScript, all the combinations of `\` and some-letter are reserved. You cannot expect that if `\` SOME-LETTER does not have any special meaning, the sequence is treated as SOME-LETTER itself. An arbitrary combination of `\` and something causes `error_escape` to be thrown.
Any character but ^$.*+?()[]{}\|\/	Represents that character itself.
Alternatives
A\|B	Matches a sequence of regular expressions A or B. An arbitrary number of `'\|'` can be used to separate expressions, such as `/abc\|def\|ghi?\|jkl?/`. Each sequence of regular expressions separated by `'\|'` is tried from left to right, and only the sequence that first succeeds in matching is adopted. For example, when matching `/abc\|abcdef/` against `"abcdef"`, the result is `"abc"`.
Character Class
[]	A character class. A set of characters: `[ABC]` matches `'A'`, `'B'`, and `'C'`. `[^DEF]` matches any character but `'D'`, `'E'`, `'F'`. When the first character in `[]` is `'^'`, any character not included in `[]` is matched. I.e., `'^'` as the first character means negation. `[G^H]` matches `'G'`, `'^'`, and `'H'`. `'^'` that is not the first character in `[]` is treated as an ordinary character. `[I-K]` matches `'I'`, `'J'`, and `'K'`. The sequence CH1-CH2 represents "any character in the range from the Unicode code point of CH1 to the code point of CH2 inclusive". `[-LM]` matches `'-'`, `'L'`, and `'M'`. `'-'` that does not fall under the condition above is treated as an ordinary character. `[N-P-R]` matches `'N'`, `'O'`, `'P'`, `'-'`, and `'R'`; does not match `'Q'`. `'-'` following a range sequence represents `'-'` itself. `[S\-U]` matches `'S'`, `'-'`, and `'U'`. `'-'` escaped by `\` is treated as `'-'` itself (`"\-"` is available only in the character class). `[.\|({]` matches `'.'`, `'\|'`, `'('`, and `'{'`. These characters lose their special meanings in `[]`. `[]` is the empty class. It does not match any code point. This expression always makes matching fail whenever it occurs. `[^]` is the complementary set of the empty class. Thus it matches any code point. The same as `[\0-\u{10FFFF}]`. Examples when case insensitive match is performed (when the `icase` flag is set): `[E-F]` matches `'E'`, `'F'`, `'e'`, and `'f'`; all the characters in the range from `'E'` (u+0045) to `'F'` (u+0046) inclusive, and the ones regarded as the same character as any in this range when Unicode case folding is applied. `[E-f]` matches `'A'` to `'Z'`, `'a'` to `'z'`, `'['`, `'\'`, `']'`, `'^'`, `'_'`, '`', `'ſ'`, and `'K'`; all the characters in the range from `'E'` (u+0045) to `'f'` (u+0066) inclusive, and the ones regarded as the same character as any in this range when Unicode case folding is applied.
Although `']'` immediately after `'['` is counted as a `']'` itself in Perl's regular expression, there is not such a special treatment in ECMAScript's RegExp. To include `']'` in a character class, it is always needed to escape like `"\]"` by prefixing a `'\'` to `']'`. If regular expressions contain a mismatched `'['` or `']'`, `error_brack` is thrown. If regular expressions contain an invalid character range such as `[b-a]`, `error_range` is thrown.
[]	In the v mode (when the `unicodesets` flag is specified), in addition to the features explained above (called union), the following features are available in the character class: Intersection by `&&`: CC1&&CC2 represents "any character that is included in both the character classes CC1 and CC2". For example, `[\p{sc=Latin}&&\p{Ll}]` matches any character that belongs to the Latin script (`\p{sc=Latin}`) and is a lowercase letter (`\p{Ll}`). Difference/subtraction by `--`: CC1--CC2 represents "any character that is included in the character class CC1, but is NOT included in the character class CC2". For example, `[\p{sc=Latin}--\p{Ll}]` matches any character that belongs to the Latin script (`\p{sc=Latin}`) and is NOT a lowercase letter (`\p{Ll}`). By using `\q{...}`, strings can be contained in a character class. For example, `[a-z\q{ch\|th\|ph}]` matches any single character in the range `[a-z]`, or the sequences `ch`, `th`, or `ph`. When strings are included in a character class, it is ensured that the longest strings are matched first. Consequently, the previous example is virtually equivalent to `(?:ch\|th\|ph\|[a-z])`. `\q{...}` can be used as an operand of the operations (union, intersection, and difference). `[]` can be nested and used as an operand of the operations. For example, `[\p{sc=Latin}--[a-z]]` matches any character that belongs to the Latin script (`\p{sc=Latin}`) and is NOT in the range `[a-z]`. Per level of `[...]`, only one type of operator can be used. (Suppose that in the following examples, `A`, `B`, `C`, `D` represent an arbitrary character class each): `[AB--CD]`: Error. SRELL throws `error_operator`, because after the union operation was used at `AB`, a different type of operator, `--` appeared. `[[AB]--[CD]]`: OK. `[A[B--C]D]`: OK. `[\p{sc=Latin}--\p{Lu}--[a-z]]`: OK. Using one type of operator multiple times does not cause an error. In the v mode, the eight characters `( ) [ { } / - \|` cannot be written directly in a character class.
They need to be escaped by placing `\` in front of themselves; otherwise, SRELL throws `error_noescape` (Note: Regardless of the u/v modes, `]` needs to be escaped always in the character class). Moreover, the following 18 double punctuators are reserved in the v mode for future use. They cannot be written in []. If written, SRELL throws `error_operator`. `!!`, `##`, `$$`, `%%`, `**`, `++`, `,,`, `..`, `::`, `;;`, `<<`, `==`, `>>`, `??`, `@@`, `^^`, ``` `` ``` (two backticks), `~~`
Predefined Character Classes
\d	Equivalent to `[0-9]`. This expression can be used also in a character class, such as `[\d!"#$%&'()]`.
\D	Equivalent to `[^0-9]`. This can be used in a character class, as well as `\d`.
\s	Equivalent to `[ \t\n\v\f\r\u00a0\u1680\u2000-\u200a\u2028-\u2029\u202f\u205f\u3000\ufeff]`. This can be used in a character class, too. Note: Strictly speaking, this consists of the union of WhiteSpace and LineTerminator. If code points are added to category Zs in Unicode, the set of code points that `\s` matches grows accordingly.
\S	Equivalent to `[^ \t\n\v\f\r\u00a0\u1680\u2000-\u200a\u2028-\u2029\u202f\u205f\u3000\ufeff]`. This can be used in a character class, too.
\w	Equivalent to `[0-9A-Za-z_]`. This can be used in a character class, too.
\W	Equivalent to `[^0-9A-Za-z_]`. This can be used in a character class, too.
\p{...}	Matches any character that has the Unicode property specified in "`...`". For example, `\p{scx=Latin}` matches every character defined as a Latin letter in Unicode. This expression can be used also in a character class. For the details about what can be specified in "`...`", see the tables in the latest draft of the ECMAScript specification. In the v mode, properties of strings (Unicode properties that match sequences of characters) are also supported. They can be used also in the character class, except negated character classes (`[^...]`). If used in a negated character class, SRELL throws `error_complement`. Compatibility: Introduced in ES2018/ES9.0 and available since SRELL 2.000. Properties of strings were introduced in ES2024 and have been available since SRELL 4.000.
\P{...}	Matches any character that does not have the Unicode property specified in "`...`". This can be used in a character class, too. Unlike `\p` above, even in the v mode `\P{...}` supports only properties that match single characters, does not support properties of strings. If any property name that represents a property of strings is specified in `\P{...}`, SRELL throws `error_complement`. Compatibility: Introduced in ES2018/ES9.0 and available since SRELL 2.000. Note: When `icase` (case-insensitive) matching is performed, `\P{...}` may represent different character sets between the u mode and the v mode. See here for details.
Quantifiers
* *?	Repeats matching the preceding expression 0 or more times. `*` tries to match as many as possible, whereas `*?` tries to match as few as possible. If this appears without a preceding expression, `error_badrepeat` is thrown. This applies to the following five also.
+ +?	Repeats matching the preceding expression 1 or more time(s). `+` tries to match as many as possible, whereas `+?` tries to match as few as possible.
? ??	Repeats matching the preceding expression 0 or 1 time(s). `?` tries to match as many as possible, whereas `??` tries to match as few as possible.
{n}	Repeats matching the preceding expression exactly `n` times. If regular expressions contain a mismatched `'{'` or `'}'`, `error_brace` is thrown. This applies to the following two also.
{n,} {n,}?	Repeats matching the preceding expression at least `n` times. `{n,}` tries to match as many as possible, whereas `{n,}?` tries to match as few as possible.
{n,m} {n,m}?	Repeats matching the preceding expression at least `n` times and at most `m` times. `{n,m}` tries to match as many as possible, whereas `{n,m}?` tries to match as few as possible. If an invalid range in {} is specified like `{3,2}`, `error_badbrace` is thrown.
Brackets and backreference
(...)	Grouping of regular expressions and capturing the string matched with them. Every pair of capturing brackets is assigned a number starting from 1, in the order in which its left round bracket `'('` appears from left to right in the entire sequence of regular expressions, and the substring matched with the regular expressions enclosed by the pair can be referenced by that number from other positions in the expression. If regular expressions contain a mismatched `'('` or `')'`, `error_paren` is thrown. When a pair of capturing round brackets itself is bound with a quantifier or it is inside another pair of brackets having a quantifier, the string captured by the pair is cleared whenever a repetition happens. Thus, a captured string cannot be carried over to the next loop. For example, when `/(?:(a)\|(b))+/` matches something, either `\1` or `\2` is empty.
\N (N is a positive integer)	Backreference. When `'\'` is followed by a number that begins with 1-9, it is regarded as a backreference to a string captured by `(...)` assigned with the corresponding number and matching is performed with that string. If a pair of brackets assigned with Number `N` do not exist in the entire sequence of regular expressions, `error_backref` is thrown. For example, `/(TO\|to)..\1/` matches `"TOMATO"` or `"tomato"`, but does not match `"Tomato"`. In RegExp of ECMAScript, capturing brackets are not required to appear prior to its corresponding backreference(s). So expressions such as `/\1(abc)/` and `/(abc\1)/` are valid and not treated as an error. When a pair of brackets does not capture anything, it is treated as having captured the special `undefined` value. A backreference to `undefined` is equivalent to an empty string, matching with it always succeeds.
(?<NAME>...)	Identical to `(...)` except that the substring matched with the regular expressions inside a pair of brackets can be referenced by the name `NAME` as well as the number assigned to the pair of the brackets. For example, in the case of `/(?<year>\d+)\/(?<month>\d+)\/(?<day>\d+)/`, the string captured by the first pair of parentheses can be referenced by either `\1` or `\k<year>`. The same group name can be re-used when all of them occur in different alternatives separated by `'\|'`, like `/(?<year>\d{4})-\d{1,2}\|\d{1,2}-(?<year>\d{4})/`. Compatibility: Introduced in ES2018/ES9.0 and available since SRELL 2.000. The feature of re-using the same group name (duplicate named capturing groups) was introduced in ES2025 and has been supported by SRELL since version 4.043.
\k<NAME>	References to a substring captured by the pair of brackets named `NAME`. If the pair of corresponding brackets does not exist in the entire sequence of regular expressions, `error_backref` is thrown. Compatibility: Introduced in ES2018/ES9.0 and available since SRELL 2.000.
(?:...)	Grouping. Unlike `(...)`, this does not capture anything but only groups. So no number is assigned for backreference. For example, `/tak(?:e\|ing)/` matches `"take"` or `"taking"`, but does not capture anything for backreference. Usually, this is somewhat faster than `(...)`.
Flag modifiers
(?ims-ims:...)	Bounded forms of syntax option flag modifiers, which enable or disable flags only in the subexpression `...` in the same group. `(?i:...)` sets the `icase` flag only in the group. `(?m:...)` sets the `multiline` flag only in the group. `(?s:...)` sets the `dotall` flag only in the group. `(?-i:...)` clears (unsets) the `icase` flag only in the group. `(?-m:...)` clears the `multiline` flag only in the group. `(?-s:...)` clears the `dotall` flag only in the group. These can be combined like `(?ims:)`, `(?im-s:)`. However, it is not permitted that the same flag letter appears more than once in the same group, such as `(?ii:)(?i-i:)`. In this case, `error_modifier` is thrown. Compatibility: Introduced in ES2025, implemented in SRELL 4.045 (but disabled by default) and enabled by default since version 4.058.
(?imsvy-imsvy)	* This feature is an extension and not part of the ECMAScript specification. This feature can be disabled by defining `SRELL_NO_UBMOD`. Embedded syntax option flags. `(?i)` acts as if the `icase` flag is set. `(?m)` acts as if the `multiline` flag is set. `(?s)` acts as if the `dotall` flag is set. `(?v)` acts as if the `unicodesets` flag is set. `(?y)` acts as if the `sticky` flag is set. `(?n)` acts as if the `nosubs` flag is set (experimental). As in `(?-ms)`, flags corresponding to letters that appear after `-` are unset. srell::regex re("(?i)"); printf("icase? %s\n", re.flags() & srell::regex::icase ? "yes" : "no"); // icase? yes re.assign("(?-i)", srell::regex::icase); printf("icase? %s\n", re.flags() & srell::regex::icase ? "yes" : "no"); // icase? no These can be combined like `(?i-ms)`. However, it is not permitted that the same flag letter appears more than once in the same pair of brackets, such as `(?ii)(?i-i)`. In this case, `error_modifier` is thrown. This expression can be used only at the beginning of a regular expression (the same as Python 3.11-). If used elsewhere, `error_modifier` is thrown. [Note 1] Available since SRELL 4.007. `v` and `y` are available since 4.070. `n` is available since 4.080. [Note 2] Re `(?n-n)`: ECMAScript does not have a feature corresponding to the `nosubs` flag or the `//n` flag. Adopting the letter n is based on Perl and .NET which have a similar feature. If ECMAScript were to use the n flag for a different feature, SRELL's `(?n-n)` could come to have a different meaning accordingly. Thus, this feature is "experimental".
Assertions
^	Matches at the beginning of the string. When the `multiline` option is specified, `^` also matches every position immediately after one of LineTerminator.
$	Matches at the end of the string. When the `multiline` option is specified, `$` also matches every position immediately before one of LineTerminator.
\b	Out of a character class: matches a boundary between `\w` and `\W`. Inside a character class: matches Backspace (BS, `\u0008`).
\B	Out of a character class: matches any boundary where `\b` does not match. Inside a character class: `error_escape` is thrown.
(?=...)	A zero-width positive lookahead assertion. For example, `/a(?=bc\|def)/` matches `"a"` followed by `"bc"` or `"def"`, but only `"a"` is counted as the matched string.
(?!...)	A zero-width negative lookahead assertion. For example, `/a(?!bc\|def)/` matches `"a"` that is followed by neither `"bc"` nor `"def"`. Incidentally, the expression `/&(?!amp;\|lt\|gt\|#)/` can be useful for finding and escaping bare `'&'`s when source code that uses many `'&'`s is copied into an HTML file.
(?<=...)	A zero-width positive lookbehind assertion. For example, `/(?<=bc\|de)a/` matches `"a"` following `"bc"` or `"de"`, but only `"a"` is counted as the matched string and `"bc"` or `"de"` is not. Note: In SRELL 1, the number of characters matched with regular expressions inside a lookbehind assertion must be a fixed-length, such as `/(?<=abc\|def)/`, `/(?<=\d{2})/`; otherwise `error_lookbehind` is thrown. This restriction does not exist in SRELL 2.000 or later.
(?<!...)	A zero-width negative lookbehind assertion. For example, `/(?<!bc\|de)a/` matches `"a"` not following `"bc"` nor `"de"`. Note: In SRELL 1 the number of characters matched with regular expressions inside a lookbehind assertion must be a fixed-length; otherwise `error_lookbehind` is thrown. This restriction does not exist in SRELL 2.000 or later.

Footnotes

When a sequence of regular expressions ends with a bare '\', or a combination of the backslash and a character which is not explained in the table above appears, error_escape is thrown. Up to version 2.300, SRELL interpreted the latter as "representing the following character itself", but since version 2.301 it has been handled as an error in accordance with the ECMAScript specification.
When you want to write a digit character immediately after a backreference, there are several solutions to prevent them from being interpreted together: 1) write the backreference inside a non-capturing group that consists of that backreference only, such as /(?:\1)0/, 2) write the digit character using its code point, such as /\1\u0030/, or 3) write the digit character inside a character class that consists of that character only, such as /\1[0]/. SRELL's pattern compiler translates both types of the expressions into the same internal representation.
The ECMAScript specification does not define any regular expression for the octal escape sequence like \ooo and \0ooo. See also the ECMAScript Specification.
Even not in the v-mode, the AND (intersection) operation of character classes can be substituted by the positive lookahead assertion (for example, /(?=\p{sc=Latin})\p{Ll}/ means any lower case letter of the Latin script). Similarly, the subtraction can be substituted by the negative lookahead assertion (for example, /(?!\p{sc=Latin})\p{Ll}/ means any lower case letter that is not of the Latin script).

Extensions to std::regex

Unicode support

For Unicode support, SRELL has the following typedefs and extensions that do not exist in <regex>:

Typedef list of three basic classes (`basic_regex`, `match_results`, `sub_match`)
Prefix and interpretation of string	Type of T	`basic_regex<T> (-regex)`	`match_results<T> (-cmatch) (-smatch)`	`sub_match<T> (-csub_match) (-ssub_match)`	Note
`u8- (UTF-8)`	`char8_t` or `char`	`u8regex`	`u8cmatch` `u8smatch`	`u8csub_match` `u8ssub_match`	Specialised with `char8_t` when the compiler supports C++20 or later; otherwise with `char`. In the latter case, they are just aliases to `u8c-` types shown below.
`u16- (UTF-16)`	`char16_t`	`u16regex`	`u16cmatch` `u16smatch`	`u16csub_match` `u16ssub_match`	Defined only when the compiler supports C++11 or later.
`u32- (UTF-32)`	`char32_t`	`u32regex`	`u32cmatch` `u32smatch`	`u32csub_match` `u32ssub_match`	Defined only when the compiler supports C++11 or later.
`u8c- (UTF-8)`	`char`	`u8cregex`	`u8ccmatch` `u8csmatch`	`u8ccsub_match` `u8cssub_match`
`u16w- (UTF-16)`	`wchar_t`	`u16wregex`	`u16wcmatch` `u16wsmatch`	`u16wcsub_match` `u16wssub_match`	Defined only when `0xFFFF` <= `WCHAR_MAX` < `0x10FFFF`.
`u32w- (UTF-32)`	`wchar_t`	`u32wregex`	`u32wcmatch` `u32wsmatch`	`u32wcsub_match` `u32wssub_match`	Defined only when `WCHAR_MAX` >= `0x10FFFF`.
`u1632w-`	`wchar_t`	`u1632wregex`	`u1632wcmatch` `u1632wsmatch`	`u1632wcsub_match` `u1632wssub_match`	Aliases to `u16w-` or `u32w-` types above, depending on the value of `WCHAR_MAX`.

The meaning of each prefix is as follows:

u8: meaning changes depending on whether your compiler supports the char8_t type (detected by checking if __cpp_char8_t is defined):
- If char8_t supported: handles an array of char8_t or an instance of std::u8string as a UTF-8 string.
- If char8_t not supported: identical to the "u8c-" prefix. Defined as mere aliases to "u8c-" types shown below.
By varying as above, "u8-" prefix types are always suitable for UTF-8 string literals (u8"...") in code for both before and after C++20, in which the type of u8"..." was changed from char to char8_t.
u16: handles an array of char16_t or an instance of std::u16string as a UTF-16 string. Suitable for UTF-16 string literals (u"...").
u32: handles an array of char32_t or an instance of std::u32string as a UTF-32 string. Suitable for UTF-32 string literals (U"...").

u8c: handles an array of char or an instance of std::string as a UTF-8 string. (Introduced in SRELL version 2.100. Until version 2.002, the "u8-" prefix was used for this kind of type.)
u16w: handles an array of wchar_t or an instance of std::wstring as a UTF-16 string. (Defined only when WCHAR_MAX is equal to or more than 0xFFFF and less than 0x10FFFF.)
u32w: handles an array of wchar_t or an instance of std::wstring as a UTF-32 string. (Defined only when WCHAR_MAX is equal to or more than 0x10FFFF.)
u1632w: When 0xFFFF <= WCHAR_MAX < 0x10FFFF, identical to u16w- above. When WCHAR_MAX >= 0x10FFFF, identical to u32w- above. Unlike u16w- and u32w-, these u1632w- types are always defined on condition that WCHAR_MAX >= 0xFFFF. Types of this prefix are available in SRELL version 2.930 and later.

* Only one of the u16w- and u32w- type families is provided, depending on the value of WCHAR_MAX. Because I later realised that this affected the portability of code, the u1632w- types were introduced in SRELL 2.930.

Although omitted from the table above, regex_iterator, regex_iterator2, and regex_token_iterator also have typedefs with the u(8c?|16w?|32w?|1632w) prefixes, following the same rules.

Basic use of Unicode support versions is as follows:

srell::u8regex u8re(u8"UTF-8 Regular Expression");
srell::u8cmatch u8cm;   //  -smatch instead of -cmatch if target string is of basic_string type. And so on.
std::printf("%s\n", srell::regex_search(u8"UTF-8 target string", u8cm, u8re) ? "found!" : "not found...");

srell::u16regex u16re(u"UTF-16 Regular Expression");
srell::u16cmatch u16cm;
std::printf("%s\n", srell::regex_search(u"UTF-16 target string", u16cm, u16re) ? "found!" : "not found...");

srell::u32regex u32re(U"UTF-32 Regular Expression");
srell::u32cmatch u32cm;
std::printf("%s\n", srell::regex_search(U"UTF-32 target string", u32cm, u32re) ? "found!" : "not found...");

srell::u1632wregex u1632wre(L"UTF-16 or UTF-32 Regular Expression");
srell::u1632wcmatch u1632wcm;
std::printf("%s\n", srell::regex_search(L"UTF-16 or UTF-32 target string", u1632wcm, u1632wre) ? "found!" : "not found...");

srell::u16wregex u16wre(L"UTF-16 Regular Expression");
srell::u16wcmatch u16wcm;
std::printf("%s\n", srell::regex_search(L"UTF-16 target string", u16wcm, u16wre) ? "found!" : "not found...");
    //  The three lines above and the ones below are mutually exclusive.
    //  If wchar_t is less than 21-bit, the ones above are available;
    //  if equal to or more than, the ones below are available.
srell::u32wregex u32wre(L"UTF-32 Regular Expression");
srell::u32wcmatch u32wcm;
std::printf("%s\n", srell::regex_search(L"UTF-32 target string", u32wcm, u32wre) ? "found!" : "not found...");

syntax_option_type

The following flag options have been added:

namespace regex_constants
{
    static const syntax_option_type dotall;  //  (Since SRELL 2.000)
        //  Single-line mode. If specified, the behaviour of '.' is changed.
        //  This corresponds to the s flag (/.../s) in ECMAScript 2018 (ES9.0) and later.

    static const syntax_option_type unicodesets;  //  (Since SRELL 4.000)
    static const syntax_option_type vmode;  //  (Since SRELL 4.066. Alias to above)
        //  For using v mode.

    static const syntax_option_type sticky;  //  (Since SRELL 4.049)
        //  Search with a basic_regex object created with this flag is performed
        //  as if the match_continuous flag is set implicitly. It corresponds to
        //  the y flag (/.../y) in ECMAScript.
        //  Note that a regular expression object created with this flag is not suitable
        //  for using with regex_iterator, regex_iterator2, or regex_token_iterator,
        //  because matching is tried only at begin in the subject range [begin, end).

    static const syntax_option_type quiet;  //  (Since SRELL 4.066)
        //  Prevents throwing exceptions of the regex_error type, in both pattern
        //  compiling time and text matching time. A compilation error can be known
        //  via basic_regex::ecode(), and an error in the matching time can be
        //  known via match_results::ecode().
}

Like the other values of the syntax_option_type type, these values are also defined in basic_regex.

The benefit of the sticky flag is that pattern compilation skips several optimisation steps that become needless when the match_continuous flag is passed to regex_search() (or when regex_match() is called). Pattern compilation is therefore expected to finish a bit faster than usual.

error_type

The following error type values have been added:

namespace regex_constants
{
    static const error_type error_utf8; //  (Since SRELL 2.630)
        //  Invalid UTF-8 sequence was found in a regular expression passed to basic_regex.

    static const error_type error_property; //  (Since SRELL 3.010)
        //  Unknown or unsupported name or value was specified in \p{...} or \P{...}.

    static const error_type error_noescape; //  (Since SRELL 4.000; v mode only)
        //  ( ) [ ] { } / - \ | need to be escaped with \ in the character class.

    static const error_type error_operator; //  (Since SRELL 4.000; v mode only)
        //  Operation error in the character class. Reserved double punctuators are
        //  found, or different operations are used at the same level of [].

    static const error_type error_complement; //  (Since SRELL 4.000; v mode only)
        //  Complement of strings cannot be used. \P{POSName}, [^\p{POSName}],
        //  or [^\q{strings}] where POSName is a name of property-of-strings was found.

    static const error_type error_modifier; //  (Since SRELL 4.007)
        //  The expression contained the unbounded form of flag modifiers ((?ims-ims))
        //  at a position other than the beginning, or a specific flag modifier appeared
        //  more than once in one pair of brackets.
}

No throw/exception mode

To prevent SRELL from throwing an exception of the regex_error type, the following two ways are available:

Option 1: Pass the quiet option to the pattern compiler (available since version 4.066).
As whether to throw an exception is checked at runtime, the mechanism for throwing exceptions itself still exists within the executable file.
Option 2: Define the SRELL_NO_THROW macro prior to including srell.hpp (available since version 4.034).
The mechanism for throwing exceptions is disabled and is not emitted into the resulting binary.

With either option, an error that would have been thrown during the previous pattern compilation can be retrieved by calling basic_regex::ecode(), and an error that would have been thrown during the previous search or match can be retrieved by calling match_results::ecode(). Both return 0 if no error has occurred.

Only exceptions of the regex_error type can be suppressed by these options. std::bad_alloc is still thrown if memory allocation fails.

When an error occurs, the regular expression algorithms (regex_search(), regex_match()) return false.

regex_iterator and regex_iterator2 become the end-of-sequence iterator immediately if any error occurs while iterating. As these iterators point to an internal instance of match_results, it->ecode() can be used to tell whether the iterator became the end-of-sequence iterator because iterating completed or because an error occurred.

Because regex_token_iterator points to sub_match, there is no way to access match_results::ecode(). To fix this problem, the member function ecode() has been added to regex_token_iterator since version 4.065.

regex_search()

3 iterators version

Since SRELL 2.600, an overload that takes three BidirectionalIterators as parameters has been available:

template <class BidirectionalIterator, class Allocator, class charT, class traits>
bool regex_search(
    BidirectionalIterator first,
    BidirectionalIterator last,
    BidirectionalIterator lookbehind_limit,
    match_results<BidirectionalIterator, Allocator> &m,
    const basic_regex<charT, traits> &e,
    const regex_constants::match_flag_type flags = regex_constants::match_default);

The third iterator, lookbehind_limit, specifies how far back regex_search() may read the sequence when a lookbehind assertion is performed.

In other words, this three-iterators version starts searching at the position first within the range [lookbehind_limit, last).

const char text[] = "0123456789abcdefghijklmnopqrstuvwxyz";
const char* const begin = text;
const char* const end = text + std::strlen(text);
const char* const first = text + 10;    //  Sets to the position of 'a'.
const srell::regex re("(?<=^\\d+).");
srell::cmatch match;

std::printf("matched %d\n", srell::regex_search(first, end, match, re));
    //  Does not match as lookbehind is performed only in the range [first, end).

std::printf("matched %d\n", srell::regex_search(first, end, begin, match, re));
    //  Matches because regex_search() is allowed to look behind as far as begin.
    //  I.e., the three-iterators version searches the sequence [begin, end),
    //  starting at first.

As in the example shown above, in the three-iterators version, ^ matches begin (the third iterator) instead of first (the first iterator).

When the three-iterators version is called, the position() member of match_results returns the distance from the position passed as the third iterator, while prefix().first of match_results is set to the position passed as the first iterator.

Note

The overload that does not take match_results as a parameter was removed in version 4.065.
With the introduction of this three-iterators overload, the method used in SRELL 2.300~2.500 was removed.

basic_string with starting position

Since SRELL 4.065, the following overload has been available for searching a basic_string from a specified starting position:

template <class ST, class SA, class Allocator, class charT, class traits>
bool regex_search(
    const std::basic_string<charT, ST, SA> &s,
    const std::size_t start,
    match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator> &m,
    const basic_regex<charT, traits> &e,
    const regex_constants::match_flag_type flags = regex_constants::match_default);

This behaves the same as regex_search(s.begin() + start, s.end(), s.begin(), m, e, flags).

match_results

Overload functions for the named capture feature

In SRELL 2.000 and later, the following member functions have been added to the match_results class for the named capture feature:

difference_type length(const string_type &sub) const;
difference_type position(const string_type &sub) const;
string_type str(const string_type &sub) const;
const_reference operator[](const string_type &sub) const;

//  The following ones are available in SRELL 2.650 and later.
difference_type length(const char_type *sub) const;
difference_type position(const char_type *sub) const;
string_type str(const char_type *sub) const;
const_reference operator[](const char_type *sub) const;

Basically, these can be used in the same way as the member functions having the same names in regex. The only difference is that these take the group name string as a parameter, instead of the group number corresponding to a pair of parentheses.

//  Example.
srell::regex e("-(?<digits>\\d+)-");
srell::cmatch m;

if (srell::regex_search("1234-5678-90ab-cdef", m, e))
{
    const std::string by_number(m.str(1));      //  access by paren's number. a feature of std::regex.
    const std::string by_name(m.str("digits")); //  access by paren's name. an extension of SRELL.

    std::printf("results: bynumber=%s byname=%s\n", by_number.c_str(), by_name.c_str());
}
//  results: bynumber=5678 byname=5678

Until version 4.033: When a group name that does not exist in the regular expression is passed, error_backref is thrown.
Version 4.034 and later: No error is thrown even when a group name that does not exist in the regular expression is passed. Instead, a reference to an instance of sub_match whose matched member variable is false is returned.

Symbol for format()

As an additional format symbol, $<NAME> has been added for the named capture feature support.

Special symbols for replacement
Symbol	String used as the replacement
`$$`	`$` itself.
`$&`	The entire matched substring.
$`	The substring that precedes the matched substring.
`$'`	The substring that follows the matched substring.
`$n` where `n` is one of 1 2 3 4 5 6 7 8 9 not followed by a digit.	The substring captured by the `n`th pair of round brackets (1-based index) in the regular expression. Replaced with an empty string if nothing is captured. Not replaced if the number `n` is greater than the number of capturing brackets in the regular expression.
`$nn` where `nn` is any value in the range 01 to 99 inclusive	The substring captured by the `nn`th pair of round brackets (1-based index) in the regular expression. Replaced with an empty string if nothing is captured. Not replaced if the number `nn` is greater than the number of capturing brackets in the regular expression.
`$<NAME>`	Added to support the named capture feature. If the regular expression contains no named group at all, replacement does not happen. Otherwise, replaced with the substring captured by the pair of round brackets whose group name is `NAME`. If no capturing group named `NAME` exists or nothing is captured by that group, replaced with an empty string.
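All rows of the table except $<NAME> are also part of the default (ECMAScript) format rules of std::regex, so they can be tried with the standard library alone. The sketch below does so with std::regex_replace; the function name reformat_date and the date pattern are made up for this illustration, and $<NAME>, being a SRELL extension, is not exercised here.

```cpp
#include <regex>
#include <string>

//  Hypothetical helper: rewrites a date of the form "YYYY-MM-DD" using the
//  given replacement format. $1, $2 and $3 refer to year, month and day.
std::string reformat_date(const std::string &date, const std::string &fmt) {
    const std::regex re("(\\d+)-(\\d+)-(\\d+)");
    return std::regex_replace(date, re, fmt);
}
```

For example, reformat_date("2025-12-14", "$3/$2/$1") gives "14/12/2025", the format "<$&>" wraps the entire match as "<2025-12-14>", and "$$$1" yields a literal dollar sign followed by the first group, "$2025".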

ecode() const

Returns the error code that should have been thrown during the previous search or match. This member function is intended to be used in the no throw/exception mode supported since 4.034.
The returned value is an integer number of the error_type type, which is the same as the return type of regex_error::code().

If no error has occurred in the previous call to the regular expression algorithm, returns 0.

//  std::regex compatible error handling.
try {
    srell::regex re("a*");
    srell::smatch m;

    regex_search(text, m, re);
} catch (const srell::regex_error &e) {
    //  Error handling.
}

//  Error handling in no throw/exception mode.
srell::regex re("a*");
srell::smatch m;

if (!regex_search(text, m, re)) {
    if (m.ecode()) {    //  If not 0, an error occurred.
        //  Error handling.
    }
}

Since version 4.069, the corresponding error name can be obtained via srell::regex_error(m.ecode()).what().

basic_regex

Since SRELL 4.009, the following member functions have been added to the basic_regex class of SRELL as extensions.

match(): Executes matching like srell::regex_match().
search(): Executes searching like srell::regex_search().

Since SRELL 4.100, among the pattern compilers (the constructor, operator=(), and assign()), the overloads that took a parameter of the std::basic_string type have been modified to take a parameter of the contiguous_container_view type instead.

contiguous_container_view

contiguous_container_view is a view class similar to std::string_view. Any container can be passed as an argument for a parameter of this type if it meets the following two requirements: 1) the container's elements are stored contiguously, and 2) the container has the member functions data(), which returns the address of the first element, and size(), which returns the number of elements.
basic_string, basic_string_view, std::vector (whose data() member exists since C++11), and std::array satisfy the requirements.
Note that this class is defined only for use as a parameter type within SRELL; it is not intended to be used directly outside SRELL.
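The idea behind such a view parameter can be sketched with a small stand-in class. The following is not SRELL's actual contiguous_container_view; the names view_sketch and count_char are hypothetical and exist only for this illustration of the two requirements (data() and size()).

```cpp
#include <cstddef>
#include <string>
#include <vector>

//  Hypothetical stand-in for illustration only; not SRELL's class.
template <typename charT>
class view_sketch {
public:
    //  Accepts any container that exposes data() and size().
    template <typename Container>
    view_sketch(const Container &c) : data_(c.data()), size_(c.size()) {}

    const charT *data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const charT *data_;     //  Address of the first element.
    std::size_t size_;      //  Number of elements.
};

//  A function that takes the view can be called with different containers.
inline std::size_t count_char(const view_sketch<char> v, const char ch) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v.data()[i] == ch)
            ++n;
    return n;
}
```

With this sketch, both count_char(std::string("banana"), 'a') and count_char(std::vector<char>{'a', 'b', 'a'}, 'a') compile, because each argument converts implicitly to the view. A view of this kind does not own the elements, so the container must outlive it.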

match() const

Does matching like srell::regex_match(). Supposing an instance of the basic_regex type is re, re.match(...) is a shorthand for srell::regex_match(..., re, ...).

The following overload functions are provided. The order of parameters is like regex_match(): target string, instance of match_results, and flag option(s) of match_flag_type (optional).

//  (1) The same as srell::regex_match(begin, end, m, re, flags).
template <typename BidirectionalIterator, typename Allocator>
bool match(
    const BidirectionalIterator begin,
    const BidirectionalIterator end,
    match_results<BidirectionalIterator, Allocator> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (2) The same as srell::regex_match(str, m, re, flags).
template <typename Allocator>
bool match(
    const charT *const str,
    match_results<const charT *, Allocator> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (3) The same as srell::regex_match(s, m, re, flags).
template <typename ST, typename SA, typename MA>
bool match(
    const std::basic_string<charT, ST, SA> &s,
    match_results<typename std::basic_string<charT, ST, SA>::const_iterator, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (4) Since version 4.069.
template <typename MA>
bool match(
    const contiguous_container_view c,
    match_results<const charT *, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

When using overload (4), which takes contiguous_container_view as a parameter, one of the cmatch family of match_results needs to be used (the smatch family is exclusive to iterators of std::basic_string).
When std::basic_string and smatch are passed as arguments, (3) is called; when std::basic_string is passed with cmatch, (4) is called.

search() const

Does searching like srell::regex_search(). Supposing an instance of the basic_regex type is re, re.search(...) is a shorthand for srell::regex_search(..., re, ...).

The following overload functions are provided. The order of parameters is like regex_search(): target string, instance of match_results, and flag option(s) of match_flag_type (optional).

//  (1) The same as srell::regex_search(begin, end, m, re, flags).
template <typename BidirectionalIterator, typename Allocator>
bool search(
    const BidirectionalIterator begin,
    const BidirectionalIterator end,
    match_results<BidirectionalIterator, Allocator> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (2) The same as srell::regex_search(str, m, re, flags).
template <typename Allocator>
bool search(
    const charT *const str,
    match_results<const charT *, Allocator> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (3) The same as srell::regex_search(s, m, re, flags).
template <typename ST, typename SA, typename MA>
bool search(
    const std::basic_string<charT, ST, SA> &s,
    match_results<typename std::basic_string<charT, ST, SA>::const_iterator, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  The following member function is not part of std::regex.
//  (4) The same as srell::regex_search(begin, end, lookbehind_limit, m, re, flags).
template <typename BidirectionalIterator, typename Allocator>
bool search(
    const BidirectionalIterator begin,
    const BidirectionalIterator end,
    const BidirectionalIterator lookbehind_limit,
    match_results<BidirectionalIterator, Allocator> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (5) Since version 4.065. The same as srell::regex_search(s, start, m, re, flags).
template <typename ST, typename SA, typename MA>
bool search(
    const std::basic_string<charT, ST, SA> &s,
    const std::size_t start,
    match_results<typename std::basic_string<charT, ST, SA>::const_iterator, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (6) Since version 4.069.
template <typename MA>
bool search(
    const contiguous_container_view c,
    match_results<const charT *, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

//  (7) Since version 4.069.
template <typename MA>
bool search(
    const contiguous_container_view c,
    const std::size_t start,
    match_results<const charT *, MA> &m,
    const regex_constants::match_flag_type flags = regex_constants::match_default) const;

When using overloads (6) and (7), which take contiguous_container_view as a parameter, one of the cmatch family of match_results needs to be used (the smatch family is exclusive to iterators of std::basic_string).
When std::basic_string and smatch are passed as arguments, (3) or (5) is called; when std::basic_string is passed with cmatch, (6) or (7) is called.

* 2025/02/14 Note: These were accidentally removed in version 4.057. Only the overloads that take match_results as a parameter have been restored since version 4.064; the others, which do not take it, have been dropped officially.

ecode() const

Returns the error code that should have been thrown during the previous pattern compiling. This member function is intended to be used in the no throw/exception mode supported since 4.034.
The returned value is an integer number of the error_type type, which is the same as the return type of regex_error::code().

If no error has occurred in the previous pattern compiling, returns 0.

//  std::regex compatible error handling.
try {
    srell::regex re("a{2,1}");
} catch (const srell::regex_error &e) {
    //  e.code() == srell::regex_constants::error_badbrace
}

//  Error handling in no throw/exception mode.

srell::regex re("a{2,1}");
//  re.ecode() == srell::regex_constants::error_badbrace

Since version 4.069, the corresponding error name can be obtained via srell::regex_error(re.ecode()).what().

regex_iterator2

Since 4.013, SRELL has regex_iterator2. It is a modification of regex_iterator, to which the following changes have been applied:

Removal of the special handling when the iterator holds a zero-length match. With this change, the results of replacement using this iterator become JavaScript compatible (example shown later).
Addition of assign() for re-use of the object.
Addition of helper functions for replacement and splitting.

template <typename BidirectionalIterator,
    typename BasicRegex = basic_regex<typename std::iterator_traits<BidirectionalIterator>::value_type,
        regex_traits<typename std::iterator_traits<BidirectionalIterator>::value_type> >,
    typename MatchResults = match_results<BidirectionalIterator> >
class regex_iterator2
{
    typedef typename std::iterator_traits<BidirectionalIterator>::value_type char_type;
    typedef BasicRegex regex_type;
    typedef MatchResults value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const value_type *pointer;
    typedef const value_type &reference;
    typedef std::input_iterator_tag iterator_category;

    //  Member functions follow...

The second template parameter is a type of basic_regex, and the third one is a type of match_results. They are simpler than those of regex_iterator.
As with regex_iterator, the following typedefs are provided:

typedef regex_iterator2<const char *> cregex_iterator2;
typedef regex_iterator2<const wchar_t *> wcregex_iterator2;
typedef regex_iterator2<std::string::const_iterator> sregex_iterator2;
typedef regex_iterator2<std::wstring::const_iterator> wsregex_iterator2;

//  For UTF-8 with char.
typedef regex_iterator2<const char *, u8cregex> u8ccregex_iterator2;
typedef regex_iterator2<std::string::const_iterator, u8cregex> u8csregex_iterator2;

//  Defined only when char16_t, char32_t are available.
typedef regex_iterator2<const char16_t *> u16cregex_iterator2;
typedef regex_iterator2<const char32_t *> u32cregex_iterator2;
typedef regex_iterator2<std::u16string::const_iterator> u16sregex_iterator2;
typedef regex_iterator2<std::u32string::const_iterator> u32sregex_iterator2;

//  Defined only when char8_t is available.
typedef regex_iterator2<const char8_t *> u8cregex_iterator2;
//  Defined only when std::u8string is available.
typedef regex_iterator2<std::u8string::const_iterator> u8sregex_iterator2;

//  Defined only when char8_t is NOT available.
typedef u8ccregex_iterator2 u8cregex_iterator2;
//  Defined only when std::u8string is NOT available.
typedef u8csregex_iterator2 u8sregex_iterator2;

//  Defined only when WCHAR_MAX >= 0x10FFFF.
typedef wcregex_iterator2 u32wcregex_iterator2;
typedef wsregex_iterator2 u32wsregex_iterator2;
typedef u32wcregex_iterator2 u1632wcregex_iterator2;
typedef u32wsregex_iterator2 u1632wsregex_iterator2;

//  Defined only when 0x10FFFF > WCHAR_MAX >= 0xFFFF.
typedef regex_iterator2<const wchar_t *, u16wregex> u16wcregex_iterator2;
typedef regex_iterator2<std::wstring::const_iterator, u16wregex> u16wsregex_iterator2;
typedef u16wcregex_iterator2 u1632wcregex_iterator2;
typedef u16wsregex_iterator2 u1632wsregex_iterator2;

Constructor

Like regex_iterator, there are 1) a version with no parameters that constructs an end-of-sequence iterator, and 2) the ordinary one:

regex_iterator2() {} // Constructs an end-of-sequence iterator.	(1)
regex_iterator2( const BidirectionalIterator a, const BidirectionalIterator b, const regex_type &re, const regex_constants::match_flag_type m = regex_constants::match_default);	(2)

assign()

Recreates an iterator instance. The order of parameters is the same as the ordinary version of the constructor:

void assign( const BidirectionalIterator a, const BidirectionalIterator b, const regex_type &re, const regex_constants::match_flag_type m = regex_constants::match_default);

done() const

Returns true if the iterating is complete, otherwise false.

bool done() const;

Like regex_iterator, iterating can be performed with a for-loop that runs until the iterator compares equal to an end-of-sequence iterator created with no arguments. But by using done(), whether the iterator has already reached the end can be checked more simply.

srell::sregex_iterator2 eit;
srell::sregex_iterator2 it(text.begin(), text.end(), re);

//  for (; it != eit; ++it) {   //  The same as below.
for (; !it.done(); ++it) {
    //  Does something.
}

replace()

If a range that has been passed to the constructor is a part of an object of std::basic_string, and the object has not been resized after that (the given area of memory has not been changed elsewhere), then the current matched range of the iterator ((*it)[0]) can be replaced with a new string by calling the replace() member function of the iterator.

regex_iterator2::replace() takes the entire string object of std::basic_string as a first parameter, and a replacement string as a second parameter:

//  Replaces [(*it)[0].first, (*it)[0].second) in
//  [entire.begin(), entire.end()) with replacement or [begin, end).

//  (1)
template <typename ST, typename SA>
void replace(
    std::basic_string<char_type, ST, SA> &entire,
    const std::basic_string<char_type, ST, SA> &replacement);

//  (2)
template <typename ST, typename SA>
void replace(
    std::basic_string<char_type, ST, SA> &entire,
    BidirectionalIterator begin,
    BidirectionalIterator end);

//  (3)
template <typename ST, typename SA>
void replace(
    std::basic_string<char_type, ST, SA> &entire,
    const char_type *const replacement);

If replacement lengthens or shortens entire, the position information inside the iterator is adjusted accordingly, and if the given area of memory is changed, all stashed internal iterators are recreated automatically.

Example of regex_iterator2::replace(), showing both the differences from regex_iterator and the consistency of the results obtained with regex_iterator2:

#include <cstdio>
#include <string>
#include <regex>
#include "srell.hpp"

template <typename Iterator, typename Regex>
void replace(const Regex &re, const std::string &text, const char *const title) {
    std::string::const_iterator prevend = text.begin();
    Iterator it(text.begin(), text.end(), re), eit;
    std::string out;

    for (; it != eit; ++it) {
        out += it->prefix();
        out += ".";
        prevend = (*it)[0].second;
    }

    const std::string::const_iterator end = text.end();
    out.append(prevend, end);
    std::printf("[%s] by %s\n", out.c_str(), title);
}

int main() {
    std::string text("a1b");
    std::regex re1("\\d*?");
    srell::regex re2("\\d*?");

    replace<std::sregex_iterator>(re1, text, "std::sregex_iterator");
    replace<srell::sregex_iterator>(re2, text, "srell::sregex_iterator");
    replace<srell::sregex_iterator2>(re2, text, "srell::sregex_iterator2");

    srell::sregex_iterator2 it(text, re2);
    for (; !it.done(); ++it)
        it.replace(text, ".");  //  Use of replace().
    std::printf("[%s] by srell::sregex_iterator2::replace()\n", text.c_str());

    return 0;
}
---- output ----
[.a...b.] by std::sregex_iterator
[.a...b.] by srell::sregex_iterator
[.a.1.b.] by srell::sregex_iterator2
[.a.1.b.] by srell::sregex_iterator2::replace()

Through the special handling mentioned above, "1" was replaced in the first two examples using regex_iterator, whereas it remained unchanged in the last two examples of replacement being compatible with JavaScript.

Incidentally, it is unclear what this behaviour of std::regex_iterator depends on. It does not seem to match Perl's behaviour. But because boost::regex on which std::regex is based has been adopting Perl regular expressions as its default behaviour, perhaps, it followed the behaviour in some old version of Perl.

Helpers for splitting

Gathering the prefixes of the matches that the iterator points to, plus the suffix of the final match, is equivalent to what split() does (cf. the table below, where it means an iterator):

Positions pointed to by (*it)[0] and it->prefix()
Subject	Unmatch	First match	Unmatch	Second match	Unmatch
Iterator it	it->prefix() of 1st match	(*it)[0]	it->prefix() of 2nd match	(*it)[0]	it->suffix() of 2nd match

So, the following helper functions are provided for gathering the unmatched portions (the it->prefix() and it->suffix() cells in the table above) easily:

bool split_ready();
//  Returns whether the current it->prefix() points to a range
//  that can be treated as a split subsequence.
//  The criterion accords with the method defined for split()
//  of ECMAScript (it->prefix().first != (*it)[0].second).
//  I.e., false can be returned only if the regex matches a zero-length string.

const typename value_type::value_type &remainder(bool only_after_match = false);
//  Returns a subsequence equivalent to "it->suffix() of 2nd match" in the table above.
//  The return type value_type::value_type is the sub_match type.
//  When an iterator it has never once matched anything, it->suffix() returns
//  an undefined value, whereas it.remainder() always returns a valid range.
//  When the argument is true and the previous match has succeeded,
//  returns [(*it)[0].second, endOfSequence); otherwise returns [it->prefix().first, endOfSequence).

Example of a simple splitting operation:

for (; !it.done(); ++it) {
    if (it.split_ready())
        list.push_back(it->prefix());
}
list.push_back(it.remainder());

Another example of split, which also pushes submatches when the regular expression contains capturing round brackets and supports specifying the maximum number (LIMIT) of split chunks, as split() does in other languages:

for (std::size_t count = 0; !it.done(); ++it) {
    if (it.split_ready()) {
        if (++count == LIMIT)
            break;
        list.push_back(it->prefix());   //  *1
        for (std::size_t i = 1; i < it->size(); ++i) {
            if (++count == LIMIT) {
                list.push_back(it.remainder(true));
                //  true to exclude the range of prefix()
                //  that has already been pushed above (*1).
                return;
            }
            list.push_back((*it)[i]);
        }
    }
}
list.push_back(it.remainder());

Even with the helper functions, the code is now lengthy. Thus, more helper functions are provided. The code above can be written as follows:

std::size_t count = 0;
for (it.split_begin(); !it.done(); it.split_next()) {
    if (++count == LIMIT)
        break;
    list.push_back(it.split_range());
}
list.push_back(it.split_remainder());   //  Note: not remainder(), but split_remainder().

void split_begin();
//  Moves to a first subsequence for which split_ready() returns true.
//  This is intended to be called only once at the beginning of iterating.

bool split_next();
//  Moves to a next subsequence for which split_ready() returns true.
//  If such a subsequence is found, returns true; if done() == true then returns false.
//  This member function is intended to be used instead of the ordinary operator++().

const typename value_type::value_type &split_range() const;
//  Returns a current subsequence (the reference to an instance of the
//  sub_match type) to which the iterator points.

const typename value_type::value_type &split_remainder();
//  Returns the final subsequence (the reference to an instance of the sub_match type)
//  immediately following the last match range.
//  This is intended to be called after iterating is complete or broken off.
//  Unlike remainder() above, a boolean value corresponding to only_after_match is
//  automatically calculated.

//  Since version 4.049.
const typename value_type::value_type &split_aptrange();
//  When done() returns false, returns split_range().
//  Otherwise returns split_remainder().

For doing the same operation on every split substring easily, split_aptrange() has been added since SRELL 4.049.

for (it.split_begin();; it.split_next()) {
    list.push_back(it.split_aptrange());    //  The same as the following:
    //  list.push_back(!it.done() ? it.split_range() : it.split_remainder());

    if (it.done())
        break;
}

Others

There are also the following member functions:

regex_iterator2 &operator=(const regex_iterator2 &right);

bool operator==(const regex_iterator2 &right) const;
bool operator!=(const regex_iterator2 &right) const;

const value_type &operator*() const;
const value_type *operator->() const;

regex_iterator2 &operator++();
regex_iterator2 operator++(int);

Except for the removal of the special handling in ++ mentioned above, these behave like the member functions of the same names in regex_iterator.

Note: APIs undocumented here, if any, are experimental. They may be changed or even removed without notice.

regex_token_iterator

ecode() const

Returns the error code that should have been thrown during the previous search. This member function is intended to be used in the no throw/exception mode supported since 4.034.
The returned value is an integer number of the error_type type, which is the same as the return type of regex_error::code(). If no error has occurred in the previous search, returns 0.

Measures against long time thinking

The regular expression engine of ECMAScript (and also Perl on which it is based) usually uses the backtracking algorithm in matching. The backtracking algorithm can require exponential time to search with a regular expression in which 1) a repeated expression is nested in another repeated expression, or 2) the character set that an expression matches and the character sets that its adjacent expressions match are not mutually exclusive but overlapping. The following patterns are well-known examples:

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaa" =~ /(a*)*b/
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaa" =~ /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaa/

Unfortunately, no fundamental measure against this problem that can be applied to any situation has been found yet. So, to avoid holding control for a long time, SRELL throws regex_error(regex_constants::error_complexity) when matching from a particular position fails repeatedly more than a certain number of times.

The default value of this limit is 2097152 (1 << 21, i.e., 128 to the third power; until SRELL 4.054 it was 16777216, i.e., 256 to the third power). This value can be changed by setting an arbitrary value to the limit_counter member variable of the instance of basic_regex passed to the regular expression algorithms (regex_search() and regex_match()).
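How a failure counter bounds the running time of a backtracking search can be sketched with a toy matcher. The code below is not SRELL's engine: the type toy_matcher, its tiny pattern language (a literal "x" or an optional "x?"), and the helper pathological_pattern are all made up for this illustration, and std::runtime_error merely stands in for regex_error(error_complexity).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

//  Toy backtracking matcher, for illustration only (not SRELL's engine).
struct toy_matcher {
    std::size_t limit;      //  Plays the role of limit_counter.
    std::size_t failures;   //  Number of failed attempts so far.

    explicit toy_matcher(const std::size_t l) : limit(l), failures(0) {}

    //  Returns true if pattern[pi..] matches text[ti..] entirely.
    bool match_here(const std::string &pattern, const std::size_t pi,
                    const std::string &text, const std::size_t ti) {
        if (pi == pattern.size())
            return ti == text.size() || fail();
        const char c = pattern[pi];
        const bool optional = pi + 1 < pattern.size() && pattern[pi + 1] == '?';
        const std::size_t next = pi + (optional ? 2 : 1);

        //  Greedy: first tries to consume a character...
        if (ti < text.size() && text[ti] == c && match_here(pattern, next, text, ti + 1))
            return true;
        //  ...then backtracks and tries skipping the optional item.
        if (optional && match_here(pattern, next, text, ti))
            return true;
        return fail();
    }

private:
    //  Counts a failure and gives up once the limit is exceeded.
    bool fail() {
        if (++failures > limit)
            throw std::runtime_error("complexity limit exceeded");
        return false;
    }
};

//  Builds "a?" repeated n times followed by "a" repeated n times,
//  like the second well-known pattern above.
std::string pathological_pattern(const std::size_t n) {
    std::string p;
    for (std::size_t i = 0; i < n; ++i)
        p += "a?";
    return p + std::string(n, 'a');
}
```

With n = 3 the match against "aaa" succeeds after a handful of failures. With n = 25 and a limit of 1000, the same search is abandoned by the exception long before the roughly 2 to the 25th power alternatives have been explored.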

For security

SRELL's pattern compiler calls itself recursively in the following cases:

When parsing a group (capturing brackets, non-capturing brackets, lookahead, lookbehind),
When parsing [] nested in another [] (only in the v mode).

As function calls normally use the stack, if a regular expression contains nested groups and/or character classes and their nesting levels are too deep, it can cause the program to terminate because of a stack overflow.
To avoid this, since SRELL 4.065, if the total depth of recursive calls exceeds SRELL_MAX_DEPTH, error_complexity is thrown.

The default value of SRELL_MAX_DEPTH is 256, but it can be changed by #define SRELL_MAX_DEPTH (arbitrary value) prior to including SRELL or adding -DSRELL_MAX_DEPTH=(arbitrary value) to your compiler options.
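The depth check can be pictured with a toy recursive-descent scanner that recurses once per opening bracket. This is only a sketch under stated assumptions, not SRELL's pattern compiler: the class toy_group_parser is hypothetical, it understands nothing but round brackets, and std::length_error stands in for error_complexity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

//  Toy scanner, for illustration only (not SRELL's pattern compiler).
//  It calls itself once per '(' and refuses to go deeper than max_depth.
class toy_group_parser {
public:
    toy_group_parser(const std::string &p, const std::size_t max_depth)
        : pattern_(p), pos_(0), max_depth_(max_depth) {}

    //  Returns the deepest nesting level seen. Throws std::length_error on
    //  excessive depth and std::invalid_argument on unbalanced brackets.
    std::size_t parse() {
        const std::size_t deepest = parse_sequence(0);
        if (pos_ != pattern_.size())
            throw std::invalid_argument("unmatched ')'");
        return deepest;
    }

private:
    std::size_t parse_sequence(const std::size_t depth) {
        if (depth > max_depth_)     //  The guard against stack overflow.
            throw std::length_error("nesting too deep");
        std::size_t deepest = depth;
        while (pos_ < pattern_.size() && pattern_[pos_] != ')') {
            if (pattern_[pos_++] == '(') {
                const std::size_t inner = parse_sequence(depth + 1);
                if (pos_ == pattern_.size())
                    throw std::invalid_argument("unmatched '('");
                ++pos_;     //  Skips the closing ')'.
                if (inner > deepest)
                    deepest = inner;
            }
        }
        return deepest;
    }

    const std::string pattern_;
    std::size_t pos_;
    const std::size_t max_depth_;
};
```

For instance, toy_group_parser("a(b(c))d", 256).parse() returns 2, whereas a pattern made of 300 nested brackets is rejected with std::length_error when the limit is 256, before the recursion can grow any deeper.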

Differences between std::regex and SRELL

Regular expression engines and flags

<regex> has the following six regular expression engines: ECMAScript (default), basic, extended, awk, grep, egrep; whereas SRELL has an ECMAScript-compatible engine only.
The comparison between the ECMAScript mode of <regex> and SRELL is as follows:

<regex>'s ECMAScript mode consists of the expressions defined in the ECMAScript specification third edition
- (MINUS) Unicode dependent matters (such as what \s matches)
+ (PLUS) locale dependent matters
+ (PLUS) [:class name:], [.class name.], [=class name=] expressions.
SRELL 2.000 and later: consists of the expressions defined in the ECMAScript 2018 or later specification.
SRELL 1.nnn: consists of the expressions defined in the ECMAScript 2017 (ES8) specification
+ (PLUS) fixed-length lookbehind assertions.

Although both are based on ECMAScript's regular expression specification, neither <regex> nor SRELL is a superset of the other.

The following flag options defined in <regex> are ignored in SRELL even if specified:

`syntax_option_type` (also `flag_type` of `basic_regex`)

basic, extended, awk, grep, egrep, nosubs (until 4.070), optimize, collate (i.e., all but icase and multiline)

Since version 4.080 the nosubs option has been supported. This flag has no effect on named-capturing groups. This behaviour is compatible with Perl, .NET, and PCRE2 which have a similar feature (The C++ specification does not mention named-capturing groups, because std::regex supports only unnamed-capturing groups).

srell::regex re("()", srell::regex::nosubs);
printf("subs: %u\n", re.mark_count());
//  subs: 0

re.assign("(?<abc>)()", srell::regex::nosubs);
printf("subs: %u\n", re.mark_count());
//  subs: 1

`match_flag_type`

match_any, format_sed

Simplification

The implementations of the following functions in SRELL have been simplified to avoid redundant overheads:

basic_regex::assign(): In <regex>, when an exception is thrown (when compiling a regular expression string fails) *this remains unchanged (cf. 11 in [re.regex.assign]), whereas *this is cleared in SRELL. This is because when SRELL begins to compile a new pattern, it does not keep the old contents anywhere.
If SRELL_STRICT_IMPL is defined, SRELL behaves as std::regex in this point.
[Until version 4.033] match_results::operator[](size_type n): While <regex> guarantees safety even when n >= match_results::size() (i.e., out-of-range access) (cf. 8 in [re.results.acc]), SRELL did not until version 4.033. Guaranteeing safety needs an additional dummy member of the sub_match type only for the purpose of preparing out-of-range access.
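
If the old pattern must survive a failed assign() and SRELL_STRICT_IMPL is not defined, one possible approach in user code is to compile into a temporary object and swap only on success. The sketch below is an illustration, not part of SRELL: try_assign is a hypothetical helper, std::regex is used as a stand-in so that the example is self-contained, and it assumes that the regex class has a swap() member and that its regex_error derives from std::exception.

```cpp
#include <cassert>
#include <exception>
#include <regex>
#include <string>

//  Hypothetical helper: compiles into a temporary and swaps only on success,
//  so that 'target' keeps its old pattern when compilation fails.
template <typename Regex>
bool try_assign(Regex &target, const std::string &pattern)
{
    try
    {
        Regex tmp(pattern); //  May throw regex_error.
        target.swap(tmp);
        return true;
    }
    catch (const std::exception &)  //  Assumes regex_error derives from std::exception.
    {
        return false;
    }
}

//  Demonstration with std::regex as a stand-in.
void try_assign_demo()
{
    std::regex re("a+");

    assert(!try_assign(re, "("));           //  "(" is an invalid pattern.
    assert(std::regex_match("aaa", re));    //  The old pattern is still in place.

    assert(try_assign(re, "b+"));
    assert(std::regex_match("bbb", re));
}
```

The cost of this approach is that the old and the new compiled patterns exist at the same time until the swap, which is the overhead the simplified assign() avoids.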

Tips

For better performance

1. When matching or searching with the same regular expression pattern is performed multiple times, it is recommended to construct the regular expression object (of basic_regex) as static const so that the pattern is compiled only once.

//  A function called multiple times in the program.
bool is_included(const std::string &text, const std::string &regex)
{
    static const srell::regex r1("a*b");    //  OK. Pattern compile is executed only at the first time.
//  srell::regex r2("a*b");     //  Compiled every time.
    ...

2. When regex_search() or regex_match() is called multiple times in a loop, it is recommended to pass an object of match_results to the function for better performance even if you do not need the results.

std::vector<std::string> lines;
srell::regex re("https?://\\S+");   //  Matches something that looks like URL.
srell::smatch match;    //  typedef of match_results<std::string::const_iterator>.

//  Reads text into lines here.

for (std::size_t i = 0; i < lines.size(); ++i)
{
    //  Very slow because a disposable match_results object
    //  is prepared in regex_search() every time.
//  if (srell::regex_search(lines[i], re))  //  *1

    if (srell::regex_search(lines[i], match, re))   //  *2
        ++count;
    ...

*2 performs better because match_results contains the stack used while regex matching is performed. In *1 above, each time the function is called, 1) a disposable match_results object is prepared, 2) memory for the stack in it is allocated, and 3) that memory is freed, while in *2, memory that has been allocated once is reused in the subsequent calls. So *2 can be more than twice as fast as *1 when the number of iterations is large.
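
Both tips can be combined in one function. The following is a self-contained sketch that uses std::regex as a stand-in so that it compiles without SRELL; count_url_lines is a hypothetical name. With SRELL, std::regex, std::smatch, and std::regex_search would be replaced with their srell:: counterparts. Note that the stack-reuse benefit described above is a property of SRELL's match_results and is not claimed here for std::regex.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

//  Counts the lines that contain something that looks like a URL.
std::size_t count_url_lines(const std::vector<std::string> &lines)
{
    static const std::regex re("https?://\\S+");    //  Compiled only on the first call.
    std::smatch match;  //  Constructed once and reused by every regex_search() call.
    std::size_t count = 0;

    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        if (std::regex_search(lines[i], match, re))
            ++count;
    }

    assert(count <= lines.size());
    return count;
}
```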

For smaller binary size

Features that you do not need can be cut off by defining one or more of the macros in the following table before including srell.hpp. This makes the output binary file smaller and shortens compile time (of the C++ source code, not of a regex pattern).

SRELL_NO_UNICODE_ICASE	Prevents Unicode case folding data used for `icase` (case-insensitive) matching from being output into the executable file. In this case, only the ASCII characters are case-folded when `icase` matching is performed ([A-Z] -> [a-z] only).
SRELL_NO_UNICODE_PROPERTY	Prevents Unicode property data from being output into the executable file. In this case, `\p{...}` and `\P{...}` become unavailable. Moreover, the name for a named capturing group is not parsed strictly, but any character except `'\'` and `'>'` is accepted as a letter that can be used for the group name. When this macro is defined, `SRELL_NO_VMODE` below is also defined implicitly.
SRELL_NO_UNICODE_DATA	Defines both `SRELL_NO_UNICODE_ICASE` and `SRELL_NO_UNICODE_PROPERTY`.
SRELL_NO_NAMEDCAPTURE	Cuts off the code for named capturing groups.
SRELL_NO_VMODE	Cuts off the code for v-mode (only SRELL 4.000-4.053).
SRELL_NO_UNICODE_POS	Prevents properties of strings data from being output into the executable file. SRELL 4.000-4.053: SRELL_NO_VMODE above needs to be defined together. SRELL 4.054 and later: Solely available.

Miscellaneous information

SIMD

Since version 4.061, SRELL uses SIMD instructions if SSE 4.2 support is detected at run-time. This feature is available if SRELL is compiled for x86/x64 by any of the following compilers:

Microsoft Visual C++ 2008 and later
GCC 4.9 and later
Clang 3.8 and later

As SSE 4.2 has been supported by GCC since version 4.3 and LLVM/Clang since version 2.6, even with versions older than the ones above, the SIMD acceleration is turned on if compilation is done with -msse4.2.
But an executable file generated explicitly with the -msse4.2 option may not run correctly without SSE 4.2 support, because the compiler thinks that all SIMD instructions up to SSE 4.2 can be used on the entire code of SRELL.

Use of SIMD can be disabled at compile time by defining SRELL_NO_SIMD, either with #define SRELL_NO_SIMD or with the -DSRELL_NO_SIMD compiler option.

* SSE4.2 is supported by Intel Core i3/i5/i7 since Nehalem processors released in 2008 and Pentium/Celeron since Sandy Bridge processors released in 2011, and by AMD processors since the FX series released in 2011.
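
Run-time detection of this kind can be sketched as below. This is not SRELL's detection code: has_sse42 is a hypothetical helper, and it relies on the __builtin_cpu_supports builtin provided by GCC and Clang on x86/x64. On any other compiler or target the sketch simply reports no support.

```cpp
#include <cassert>

//  Hypothetical helper, not SRELL's detection code.
//  Returns true if the CPU running the program reports SSE 4.2 support.
bool has_sse42()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return false;   //  Unknown compiler or non-x86 target: use the non-SIMD path.
#endif
}
```

A program built without -msse4.2 can call such a check once and then choose between a SIMD routine and a plain one, which is what keeps the executable usable on CPUs without SSE 4.2.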

Invalid UTF-8 sequence

Against invalid UTF-8 strings, SRELL does the following things:

Checks whether the trailing bytes of a 2-4 byte character are really in the range 80..BF. If any of 00..7F or C0..FF appears as a trailing byte, error_utf8 is thrown at pattern compile time, and matching fails at that point at matching time. [1]
If a value >= 0x110000 is decoded from a four-byte character, the same applies. [1]
If a non-shortest UTF-8 form of a BMP character is found, the same applies (for example, \u0030 matches only the shortest form 0x30 and does not match the longer forms 0xc0 0xb0, 0xe0 0x80 0xb0, or 0xf0 0x80 0x80 0xb0). [2]

[1] Since version 2.630. In versions 2.200-2.620, the part in question was replaced with U+FFFD and compilation or matching continued.
[2] Since version 4.048.
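
The three checks can be sketched as a small decoder. This is an illustration under stated assumptions, not SRELL's decoder: decode_utf8 is a hypothetical helper, it reports failure with a return value rather than with error_utf8, and it performs only the checks listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

//  Hypothetical helper, not SRELL code. Decodes one UTF-8 character from s at pos.
//  On success stores the code point in cp, advances pos, and returns true.
//  On failure returns false and leaves pos unchanged (cp may have been modified).
bool decode_utf8(const std::string &s, std::size_t &pos, unsigned long &cp)
{
    if (pos >= s.size())
        return false;

    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t len;
    unsigned long min;  //  Smallest code point that needs 'len' bytes.

    if (lead < 0x80)                { len = 1; min = 0;        cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; min = 0x80;     cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; min = 0x800;    cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; min = 0x10000;  cp = lead & 0x07; }
    else
        return false;   //  80..BF and F8..FF cannot start a character.

    if (s.size() - pos < len)
        return false;   //  Truncated sequence.

    for (std::size_t i = 1; i < len; ++i)
    {
        const unsigned char trail = static_cast<unsigned char>(s[pos + i]);
        if ((trail & 0xC0) != 0x80) //  Check 1: trailing byte must be in 80..BF.
            return false;
        cp = (cp << 6) | (trail & 0x3F);
    }

    if (cp >= 0x110000UL)   //  Check 2: beyond the Unicode range.
        return false;
    if (cp < min)           //  Check 3: non-shortest form.
        return false;

    assert(pos + len <= s.size());
    pos += len;
    return true;
}
```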

SRELL with char, wchar_t

Among typedefs of basic_regex, types that do not have any Unicode prefix (u8-, u8c-, u16-, u16w-, u1632w-, u32-, u32w-) treat an input string as a sequence of Unicode values.

For example, when CHAR_BIT is 8, srell::regex (typedef of srell::basic_regex<char>) interprets 0x00-0xFF in an input string as U+0000-U+00FF, respectively. Because U+0000-U+00FF in Unicode are compatible with ISO-8859-1, as a result, it can be assumed that srell::regex supports ISO-8859-1.

srell::regex can be used to find a specific pattern of bytes in binary data.

This applies also to srell::wregex (typedef of srell::basic_regex<wchar_t>). It interprets an input as a sequence of Unicode values in the range 0x00-WCHAR_MAX.

The suitable type to use with the W functions of WinAPI is srell::u16wregex or srell::u1632wregex, both of which support UTF-16, not srell::wregex, which in effect supports only UCS-2.
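
The practical difference shows up with supplementary characters. With a 16-bit wchar_t, 𠀋 (U+2000B) is stored as the two code units D840 DC0B; a type that treats every code unit as one character sees two characters, while a UTF-16-aware type sees one. The sketch below only illustrates this counting rule; count_utf16_characters is a hypothetical helper and not SRELL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

//  Hypothetical helper, not SRELL code. Counts characters the way a UTF-16-aware
//  engine sees them: a high surrogate (D800..DBFF) immediately followed by a low
//  surrogate (DC00..DFFF) is one character; any other code unit is one character.
std::size_t count_utf16_characters(const std::vector<unsigned short> &units)
{
    std::size_t count = 0;

    for (std::size_t i = 0; i < units.size(); ++i)
    {
        const unsigned int u = units[i];

        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < units.size()
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            ++i;    //  Consumes the low surrogate together with the high one.
        ++count;
    }

    assert(count <= units.size());
    return count;
}
```

For the sequence 0041 D840 DC0B this returns 2, whereas the number of code units, which is what a UCS-2 view counts, is 3.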

C++11 and later features

For compilers that do not define feature test macros appropriately, the following macros were available in SRELL up to version 4.056:

SRELL_CPP11_CHAR1632_ENABLED	For a compiler that does not define `__cpp_unicode_characters` despite supporting `char16_t`, `char32_t`. When this macro was defined, SRELL did `typedef` `u16regex`, `u32regex` etc.
SRELL_CPP11_INITIALIZER_LIST_ENABLED	For a compiler that does not define `__cpp_initializer_lists` despite supporting initializer lists.
SRELL_CPP11_MOVE_ENABLED	For a compiler that does not define `__cpp_rvalue_references` despite supporting `std::move`.
SRELL_CPP20_CHAR8_ENABLED	For a compiler that does not define `__cpp_char8_t` or `__cpp_lib_char8_t` despite supporting `char8_t`. When this macro's value is 1, SRELL assumed that `char8_t` was available. When the value is 2, SRELL assumed that `std::u8string` was also available.

If you still need any of these SRELL_CPP* macros, please define the corresponding __cpp_* macro directly.

C++98/03 and UTF-8, UTF-16, UTF-32

With compilers prior to C++11, only the "u8c-" and "u16w-" types are available if wchar_t is at least 16 bits and less than 21 bits wide, and only the "u8c-" and "u32w-" types are available if wchar_t is at least 21 bits wide.
However, even in such environments, "u8c-", "u16-" and "u32-" types are available if such code as below is put before including SRELL:

typedef unsigned short char16_t;    //  Do typedef for a type that can have a 16-bit value.
typedef unsigned long char32_t;    //  Do typedef for a type that can have a 32-bit value.

namespace std
{
    typedef basic_string<char16_t> u16string;
    typedef basic_string<char32_t> u32string;
}

#define __cpp_unicode_characters    //  Make them available manually.
// #define SRELL_CPP11_CHAR1632_ENABLED    //  Up to version 4.056.

Incidentally, handling UTF-8 or UTF-16 is performed by u8regex_traits or u16regex_traits passed to basic_regex as a template argument. By using these classes, for example, it is possible to make a class to handle UTF-16 strings with uint32_t type array, such as basic_regex<uint32_t, u16regex_traits<uint32_t> >.

Possible future changes

Bit widths of int/size_t

Applied in version 4.100.

The minimum bit width guaranteed by the C++ specification for the int type is 16. I have been intending to keep this in mind when coding SRELL, but it is unclear whether SRELL can really be compiled on a system where the int type is exactly 16 bits wide, and whether it runs properly there.

There are several points in the code that can be simplified if the minimum width of int and size_t can be assumed to be 32-bits, so at some point in the near future, I may make changes so that compiling SRELL on a system where int/size_t are less than 32-bits leads to a compilation error.
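
One way such a compilation error could be produced, even with pre-C++11 compilers, is a negative-array-size check. This is only a sketch, not necessarily how SRELL does it: the typedef names are hypothetical, and sizeof(T) * CHAR_BIT is used as an approximation of the bit width (it would include padding bits, if any).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

//  Pre-C++11 style compile-time checks: the array size becomes -1, which is
//  ill-formed, when the condition is false. The typedef names are hypothetical.
typedef char toy_int_is_at_least_32_bits[(sizeof(int) * CHAR_BIT >= 32) ? 1 : -1];
typedef char toy_size_t_is_at_least_32_bits[(sizeof(std::size_t) * CHAR_BIT >= 32) ? 1 : -1];
```

With C++11 and later, static_assert expresses the same check more directly.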

Character set not compatible with ASCII, like EBCDIC

Because use of string literals is avoided in the source code, theoretically, SRELL should work even in environments whose character set is not compatible with ASCII, such as EBCDIC, if the source code of SRELL is converted into EBCDIC. However, SRELL always interprets input strings as UTF-8/16/32, so, for example, srell::regex re("a+"); in EBCDIC-encoded source code will not work as expected. Since 'a' in EBCDIC is 0x81 and '+' is 0x4e, the regular expression above appears to SRELL as "\x81N". In such environments, SRELL will work as expected only if both the regular expression string and the subject string are loaded from external files.

If consideration for environments whose character set is not ASCII compatible is unnecessary, the "\xHH" escaping in srell_updata3.hpp becomes needless, which brings the advantage of making file size more compact. If the file size of SRELL becomes a concern in the future (especially if single-header/srell.hpp exceeds 1MB), I may decide to stop using the \xHH escaping and not to support compilers that run on a character set not being ASCII-compatible.

Change of the iterator category of the subject from BidirectionalIterator to RandomAccessIterator

regex_search() and regex_match() of std::regex take as parameters iterator pairs categorised as bidirectional iterators or higher in the iterator hierarchy. SRELL has followed this specification since its early versions, but to do so it has had to incorporate Boyer-Moore-Horspool code that is used only for bidirectional iterators (not for random access iterators or contiguous iterators). As there does not seem to be much demand for searching or matching with bidirectional iterators, support for them might be removed at some point.
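
For reference, a minimal Boyer-Moore-Horspool search over a random access sequence looks like the sketch below. It is a textbook-style illustration, not SRELL's implementation, and horspool_find is a hypothetical name. It jumps ahead by indexing (text[pos + m - 1]), which is cheap only with random access; with bidirectional iterators each jump has to be made step by step, which is why a separate variant is needed for them.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

//  Hypothetical helper, not SRELL code. Returns the position of the first
//  occurrence of pattern in text, or std::string::npos if there is none.
std::size_t horspool_find(const std::string &text, const std::string &pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();

    if (m == 0)
        return 0;
    if (n < m)
        return std::string::npos;

    //  shift[c]: how far the window may move when its last character is c.
    const std::size_t table_size = UCHAR_MAX + 1;
    std::size_t shift[UCHAR_MAX + 1];

    for (std::size_t c = 0; c < table_size; ++c)
        shift[c] = m;
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    for (std::size_t pos = 0; pos <= n - m; )
    {
        assert(pos + m <= n);

        std::size_t j = m;  //  Compares the window from its last character backwards.
        while (j > 0 && text[pos + j - 1] == pattern[j - 1])
            --j;
        if (j == 0)
            return pos;

        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```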

Breaking changes

This section has been moved to its own separate page.

Links

Releases

External Links

RegExp of ECMAScript (JavaScript)

Proposals (Updated: 10 Oct 2025)

RegExp Buffer Boundaries (\A, \z) (2/4)
\Z was excluded from the proposal at the 2021-12 meeting.
RegExp \R Escape (1/4)
RegExp Extended Mode and Comments (1/4)
RegExp Atomic Operators (1/4)
Addition of (?>...), ATOM*+, ATOM++, ATOM?+, ATOM{n}+, ATOM{n,}+, and ATOM{n,m}+.

Finished Proposals (Updated: 10 Oct 2025)

RegExp Modifiers (ES 2025): Implemented in SRELL 4.045. Enabled by default since 4.058.
Note: The unbounded form ((?imsx-imsx)) was excluded from the proposal at the 2021-12 meeting.
Duplicate named capturing groups (ES 2025): SRELL 4.043-
RegExp v flag with set notation + properties of strings (ES 2024): SRELL 4.000-
RegExp Unicode Property Escapes (ES 2018): SRELL 2.000-
RegExp Lookbehind Assertions (ES 2018): SRELL 2.000-
RegExp named capture groups (ES 2018): SRELL 2.000-
s (dotAll) flag for regular expressions (ES 2018): SRELL 2.000-

* Proposals that are not related to expansion of regular expressions, such as addition of API, are out of scope.
