C12Adapter Opensource C++ Interface
MRegexp Class Reference

POSIX-like regular expression handler.

Inherits MObject.

Public Types

enum  { NUMBER_OF_SUBEXPRESSIONS = 10 }
 

Public Member Functions

 MRegexp ()
 Default constructor.
 
 MRegexp (const MStdString &exp, bool caseInsensitive=false)
 Constructor of the regular expression that takes an expression as standard string.
 
 MRegexp (MConstChars exp, bool caseInsensitive=false)
 Constructor of the regular expression that takes an expression as a pointer to a zero terminated string.
 
 MRegexp (const MRegexp &r)
 Copy constructor.
 
virtual ~MRegexp ()
 Object destructor.
 
bool IsCompiled () const
 Check whether a valid regular expression was supplied.
 
int GetCount () const
 Return the number of items found after a successful Match.
 
const MStdString & GetPattern () const
 Get the pattern as it was set when the expression was compiled.
 
MRegexp & operator= (const MRegexp &r)
 Assignment operator.
 
void Compile (const MStdString &exp, bool caseInsensitive=false)
 Compile the regular expression given as standard string.
 
void Clear ()
 Clear the regular expression, possibly reclaim memory.
 
bool Match (const MStdString &)
 Examine the character string with this regular expression, returning true if there is a match.
 
MStdString Item (int i) const
 Return the i-th matched item after a successful Match.
 
MStdString operator[] (int i) const
 Return the i-th matched item after a successful Match.
 
int GetItemStart (int i) const
 Return the starting offset of the i-th matched item from the beginning of the character array used in Match.
 
int GetItemLength (int i) const
 Return the length of the i-th matched item as used in Match.
 
MStdString GetReplaceString (const MStdString &source) const
 Get the string for replacement, use source as standard string.
 
void CheckIsCompiled () const
 Check if the regular expression is compiled, throw an error if not.
 
- Public Member Functions inherited from MObject
virtual ~MObject ()
 Object destructor.
 
virtual const MClass * GetClass () const =0
 Get the final class of the object.
 
virtual unsigned GetEmbeddedSizeof () const
 For embedded object types, return the size of the class.
 
bool IsEmbeddedObject () const
 Tell if the object is of embedded kind.
 
MVariant Call (const MStdString &name, const MVariant &params)
 Call the object service with parameters, given as variant.
 
MVariant Call0 (const MStdString &name)
 Call the object service with no parameters.
 
MVariant Call1 (const MStdString &name, const MVariant &p1)
 Call the object service with one parameter.
 
MVariant Call2 (const MStdString &name, const MVariant &p1, const MVariant &p2)
 Call the object service with two parameters.
 
MVariant Call3 (const MStdString &name, const MVariant &p1, const MVariant &p2, const MVariant &p3)
 Call the object service with three parameters.
 
MVariant Call4 (const MStdString &name, const MVariant &p1, const MVariant &p2, const MVariant &p3, const MVariant &p4)
 Call the object service with four parameters.
 
MVariant Call5 (const MStdString &name, const MVariant &p1, const MVariant &p2, const MVariant &p3, const MVariant &p4, const MVariant &p5)
 Call the object service with five parameters.
 
MVariant Call6 (const MStdString &name, const MVariant &p1, const MVariant &p2, const MVariant &p3, const MVariant &p4, const MVariant &p5, const MVariant &p6)
 Call the object service with six parameters.
 
virtual MVariant CallV (const MStdString &name, const MVariant::VariantVector &params)
 Call the object service with parameters, given as variant vector.
 
virtual bool IsPropertyPresent (const MStdString &name) const
 Tell if the property with the given name exists.
 
virtual bool IsServicePresent (const MStdString &name) const
 Tell if the service with the given name exists.
 
virtual MVariant GetProperty (const MStdString &name) const
 Get the property value using the name of the property.
 
virtual void SetProperty (const MStdString &name, const MVariant &value)
 Set the property using the name of the property and the value.
 
virtual MStdStringVector GetAllPropertyNames () const
 Return the list of publicly available properties, persistent or not.
 
virtual MStdStringVector GetAllPersistentPropertyNames () const
 Return the list of persistent properties.
 
virtual void SetPersistentPropertiesToDefault ()
 Set the persistent properties of the object to their default values.
 
virtual MVariant GetPersistentPropertyDefaultValue (const MStdString &name) const
 Get the default value of the persistent property with the name given.
 
virtual void SetPersistentPropertyToDefault (const MStdString &name)
 Set the persistent property with the name given to its default value.
 
virtual const char * GetType () const
 Get the name of the type for the object (could be the same as class name).
 
virtual void SetType (const MStdString &)
 Nominally sets the name of the type for the object, but the service does not allow setting the name to anything other than the current name.
 
virtual void Validate ()
 Validate internal structures of the object.
 

Static Public Member Functions

static bool StaticMatch (MConstChars regexp, const MStdString &str, bool caseInsensitive=false)
 Do a match using the given regular expression and string without creating an MRegexp object.
 
- Static Public Member Functions inherited from MObject
static const MClass * GetStaticClass ()
 Get the declared class of this particular object.
 
static bool IsClassPresent (const MStdString &name)
 Tell if the given class name is available.
 

Additional Inherited Members

- Static Public Attributes inherited from MObject
static const MClass s_class
 Class of MObject.
 
- Protected Member Functions inherited from MObject
 MObject ()
 Object constructor, protected as the class is abstract.
 
void DoSetPersistentPropertiesToDefault (const MClass *staticClass)
 Set the persistent properties to their default values for one object, given the class for that object.
 

Detailed Description

POSIX-like regular expression handler.

An MRegexp object can be given a regular expression and then used to extract specific substrings (items) from its input. Regular expressions may not be the fastest way to parse input (though with careful anchoring they can be made to fail quickly when they are going to fail), but once you have a working library they allow for fairly rapid coding. On the whole this is good enough: worry about making it faster once you have it working and know that the optimization effort will not go unnoticed. For example:

MRegexp re("^[\t ]*(.*)[\t ]*\\((.*)\\)");
MStdString str("example.com!david (David)\n");
MStdString name, addr;
if ( re.Match(str) && re.GetCount() == 2 )
{
    name = re[2];
    addr = re[1];
}

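This gives:

name == "David" and addr == "example.com!david " (field 1 keeps the blank before the parenthesis, because the first "(.*)" is matched longest-first and leaves nothing for the "[\t ]*" that follows it)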

If you decompose the regular expression you get:

  • "^" Beginning of line anchor.
  • "[\t ]*" Any amount (that is zero or more characters) of tabs or spaces.
  • "(.*)" Field 1: A tagged expression matching any string of characters. This will be the longest string that will still allow the rest of the pattern to match.
  • "[\t ]*" Any amount of tabs or spaces.
  • "\\(" An escaped open parenthesis. The backslash is doubled because it is also the C/C++ escape character, and we want a literal backslash to be passed through to the regular expression code. A user typing this expression into your program would enter only one backslash. We escape the parenthesis so that it is not interpreted as a regular expression special character.
  • "(.*)" Field 2: A tagged expression matching any string of characters.
  • "\\)" An escaped closing parenthesis.

Note: The phrase "tagged regular expression" refers to any part of the regular expression that, because it is surrounded by parentheses, is accessible as a separate item after a match has been made.

In English, we are looking for two fields. The first will be all characters from the start of the line through to the second field (without leading white space; any white space just before the parenthesis stays in the field), and the second will be all characters within parentheses following the first field.
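The field extraction described above can be tried without the library. The following is only a sketch: it uses std::regex from the C++ standard library as a stand-in for MRegexp, and the function names SplitAddressAndName and DemoSplitAddressAndName are hypothetical. The pattern is the one from the example; std::regex resolves it the same way for this input, but it is a different engine and need not agree with MRegexp in every case.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Stand-in for the MRegexp example: std::regex does the matching here, not MRegexp.
// Returns {field 1, field 2}, or two empty strings when the line does not match.
std::pair<std::string, std::string> SplitAddressAndName(const std::string& line)
{
    // Same pattern as in the text; "\\(" in C++ source reaches the engine as "\(".
    static const std::regex re("^[\t ]*(.*)[\t ]*\\((.*)\\)");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return {};
    return {m[1].str(), m[2].str()};
}

void DemoSplitAddressAndName()
{
    const auto fields = SplitAddressAndName("example.com!david (David)\n");
    assert(fields.second == "David");
    // The greedy first "(.*)" keeps the blank that precedes '('.
    assert(fields.first == "example.com!david ");
}
```

If the trailing blank is unwanted, one option is to trim field 1 after the match rather than to complicate the pattern.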

Regular Expression Syntax

A regular expression is zero or more branches, separated by '|'. It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by '*', '+', or '?'. An atom followed by '*' matches a sequence of 0 or more matches of the atom. An atom followed by '+' matches a sequence of 1 or more matches of the atom. An atom followed by '?' matches a match of the atom, or the empty string.

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), '.' (matching any single character), '^' (matching the empty string at the beginning of the input string), '$' (matching the empty string at the end of the input string), a '\' followed by a single character (matching that character), or a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in '[]'. It normally matches any single character from the sequence. If the sequence begins with '^', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by '-', this is shorthand for the full list of ASCII characters between them (e.g. '[0-9]' matches any decimal digit). To include a literal ']' in the sequence, make it the first character (following a possible '^'). To include a literal '-', make it the first or last character.
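A few of the constructs above, checked in code. This is a sketch that uses std::regex (default ECMAScript grammar) as a stand-in, not MRegexp itself; only constructs that both syntaxes share are used, and FullyMatches and DemoSyntax are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` matches `pattern` (std::regex, ECMAScript grammar).
bool FullyMatches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

void DemoSyntax()
{
    assert(FullyMatches("[0-9]+", "2024"));    // range shorthand, '+' means one or more
    assert(!FullyMatches("[0-9]+", "20x4"));   // 'x' is outside the range
    assert(FullyMatches("[^0-9]", "x"));       // a leading '^' negates the range
    assert(!FullyMatches("[^0-9]", "7"));
    assert(FullyMatches("(ab|cd)?e", "e"));    // '?' lets the atom match the empty string
    assert(FullyMatches("(ab|cd)?e", "cde"));  // second branch of the alternation
    assert(FullyMatches("a\\.b", "a.b"));      // a backslash makes '.' literal
    assert(!FullyMatches("a\\.b", "axb"));
}
```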

Ambiguity

If a regular expression could match two different parts of the input string, it will match the one which begins earliest. If both begin in the same place but match different lengths, or match the same length in different ways, life gets messier, as follows. In general, the possibilities in a list of branches are considered in left-to-right order, the possibilities for '*', '+', and '?' are considered longest-first, nested constructs are considered from the outermost in, and concatenated constructs are considered leftmost-first. The match that will be chosen is the one that uses the earliest possibility in the first choice that has to be made. If there is more than one choice, the next will be made in the same manner (earliest possibility) subject to the decision on the first choice. And so forth.

For example, '(ab|a)b*c' could match 'abc' in one of two ways. The first choice is between 'ab' and 'a'; since 'ab' is earlier, and does lead to a successful overall match, it is chosen. Since the 'b' is already spoken for, the 'b*' must match its last possibility (the empty string), since it must respect the earlier choice.

In the particular case where the regular expression does not use '|' and does not apply '*', '+', or '?' to parenthesized subexpressions, the net effect is that the longest possible match will be chosen. So 'ab*', presented with 'xabbbby', will match 'abbbb'. Note that if 'ab*' is tried against 'xabyabbbz', it will match 'ab' just after 'x', due to the begins-earliest rule. (In effect, the decision on where to start the match is the first choice to be made, hence subsequent choices must respect it even if this leads them to less-preferred alternatives.)
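The three cases discussed in this section can be reproduced with a backtracking matcher. The sketch below uses std::regex as a stand-in for MRegexp; it applies the same earliest-start, left-to-right, longest-first order to these inputs, but that it agrees with MRegexp in general is an assumption. Found, FirstMatch and DemoAmbiguity are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Where the first match starts and what it covers; start is -1 when nothing matched.
struct Found
{
    int start;
    std::string text;
};

Found FirstMatch(const std::string& pattern, const std::string& input)
{
    std::smatch m;
    if (!std::regex_search(input, m, std::regex(pattern)))
        return {-1, ""};
    return {static_cast<int>(m.position(0)), m.str(0)};
}

void DemoAmbiguity()
{
    // Longest possible match when no '|' is involved.
    assert(FirstMatch("ab*", "xabbbby").text == "abbbb");

    // Begins-earliest rule: the short match after 'x' wins over the longer one later.
    const Found early = FirstMatch("ab*", "xabyabbbz");
    assert(early.start == 1 && early.text == "ab");

    // Branch order: 'ab' is tried before 'a', so 'b*' is left with the empty string.
    const std::string abc = "abc";
    std::smatch m;
    const bool matched = std::regex_search(abc, m, std::regex("(ab|a)b*c"));
    assert(matched);
    assert(m.str(1) == "ab");
}
```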

Member Enumeration Documentation

anonymous enum
Enumerator
NUMBER_OF_SUBEXPRESSIONS 

The number of subexpressions that the library supports. Attempting to use a regular expression with more than this number of subexpressions generates an error.

Constructor & Destructor Documentation

MRegexp::MRegexp ( const MStdString & exp,
bool  caseInsensitive = false 
)

Constructor of the regular expression that takes an expression as standard string.

Parameters
exp - Regular expression.
caseInsensitive - When true, the match shall be case insensitive, false by default.
Precondition
The expression has to correspond to the valid syntax definition as presented in the header of the file. Otherwise MERegexp is thrown with the type and string that corresponds to the error.
MRegexp::MRegexp ( MConstChars  exp,
bool  caseInsensitive = false 
)

Constructor of the regular expression that takes an expression as a pointer to a zero terminated string.

Parameters
exp - Regular expression.
caseInsensitive - When true, the match shall be case insensitive, false by default.
Precondition
The expression should not be NULL, otherwise the behavior is undefined (the debug version has an assertion operator). The expression has to correspond to the valid syntax definition as presented in the header of the file. Otherwise MERegexp is thrown with the type and string that corresponds to the error.
MRegexp::MRegexp ( const MRegexp & r)

Copy constructor.

Precondition
If the object given had a compilation error, the new object has it too.

Member Function Documentation

void MRegexp::CheckIsCompiled ( ) const

Check if the regular expression is compiled, throw error if not.

Precondition
The regular expression needs to be compiled. Otherwise an error is thrown.
void MRegexp::Compile ( const MStdString & exp,
bool  caseInsensitive = false 
)

Compile the regular expression given as standard string.

The format of the regular expression is defined in the class header.

Parameters
exp - Regular expression.
caseInsensitive - When true, the match shall be case insensitive, false by default.
Precondition
The expression has to correspond to the valid syntax definition as presented in the header of the file. Otherwise the exception MERegexp is thrown.
int MRegexp::GetItemLength ( int  i) const

Return the length of the I-th matched item as used in Match.

Along with the GetItemStart, this service can be used as follows:

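MRegexp re("^[\t ]*(.*)[\t ]*\\((.*)\\)");
MStdString str( "example.com!david (David)\n" );
const bool matched = re.Match(str); // keep the call outside assert so it still runs with NDEBUG
assert(matched);
assert(re.GetCount() == 2);
assert(re.GetItemStart(0) == 0);
assert(re.GetItemLength(0) == 25); // whole match, up to and including ')'
assert(re.GetItemStart(1) == 0);
assert(re.GetItemLength(1) == 18); // "example.com!david " with its trailing blank
assert(re.GetItemStart(2) == 19);
assert(re.GetItemLength(2) == 5);  // "David"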
Precondition
The index has to be within zero and GetCount minus one. Otherwise the "" string is returned, and the object is put into an erroneous state, the error string is available.
int MRegexp::GetItemStart ( int  i) const

Return the starting offset of the I-th matched item from the beginning of the character array used in Match.

Precondition
The index has to be within zero and GetCount minus one. Otherwise the "" string is returned, and the object is put into an erroneous state, the error string is available.
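A start offset and a length together identify a slice of the string that was matched, so an item can be recovered with substr. The sketch below shows that relation with std::regex standing in for MRegexp (position and length of std::smatch play the role of GetItemStart and GetItemLength); ItemSpan, FindItemSpans and DemoItemSpans are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Offset from the start of the input, and length, of one matched item.
struct ItemSpan
{
    int start;
    int length;
};

// Spans of item 0 (whole match) and of every tagged subexpression; empty when no match.
std::vector<ItemSpan> FindItemSpans(const std::string& pattern, const std::string& input)
{
    std::vector<ItemSpan> spans;
    std::smatch m;
    if (std::regex_search(input, m, std::regex(pattern)))
        for (std::size_t i = 0; i < m.size(); ++i)
            spans.push_back({static_cast<int>(m.position(i)), static_cast<int>(m.length(i))});
    return spans;
}

void DemoItemSpans()
{
    const std::string input = "example.com!david (David)\n";
    const std::vector<ItemSpan> spans = FindItemSpans("^[\t ]*(.*)[\t ]*\\((.*)\\)", input);
    assert(spans.size() == 3);
    // Slicing the input with a span reproduces the corresponding item.
    assert(input.substr(spans[1].start, spans[1].length) == "example.com!david ");
    assert(input.substr(spans[2].start, spans[2].length) == "David");
}
```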
MStdString MRegexp::GetReplaceString ( const MStdString & source) const

Get the string for replacement, use source as standard string.

After a successful Match one can retrieve a replacement string as an alternative to building up the various items by hand.

Each character in the source string will be copied to the return value except for the following special characters:

& The complete matched string (item 0).
\1 Item 1
... and so on until...
\9 Item 9

So:

MStdString repl = re.GetReplaceString("\\2 == \\1");

Will give: "David == example.com!david " (item 1 keeps the blank that precedes the parenthesis)

Precondition
The items given as parameters should be within range zero and GetCount minus one. Otherwise the "" string is returned, and the object is put into an erroneous state.
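The substitution rule can be written down in a few lines. The following is a sketch of the rule as described here, not the library's implementation: ExpandReplacement is a hypothetical free function that receives the items in a vector (items[0] being the whole match) instead of reading them from an MRegexp object. Two points are assumptions: a backslash that is not followed by a digit 1 to 9 is copied as is, and a reference to a missing item makes the whole result empty, with no error state recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand '&' and '\1'..'\9' in `source` using `items`.
std::string ExpandReplacement(const std::string& source, const std::vector<std::string>& items)
{
    std::string result;
    for (std::size_t i = 0; i < source.size(); ++i)
    {
        const char c = source[i];
        int index = -1; // -1 means an ordinary character
        if (c == '&')
            index = 0;
        else if (c == '\\' && i + 1 < source.size() && source[i + 1] >= '1' && source[i + 1] <= '9')
            index = source[++i] - '0'; // consume the digit as well

        if (index < 0)
            result += c;
        else if (static_cast<std::size_t>(index) < items.size())
            result += items[index];
        else
            return std::string(); // assumption: out-of-range reference gives ""
    }
    return result;
}

void DemoExpandReplacement()
{
    const std::vector<std::string> items = {
        "example.com!david (David)", // item 0: whole match
        "example.com!david ",        // item 1
        "David"                      // item 2
    };
    assert(ExpandReplacement("\\2 == \\1", items) == "David == example.com!david ");
    assert(ExpandReplacement("[&]", items) == "[example.com!david (David)]");
    assert(ExpandReplacement("\\3", items).empty());
}
```

Note that the C++ literal "\\2" is the two characters backslash and 2; writing "\2" would produce a single character with code 2 instead.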
MStdString MRegexp::Item ( int  i) const

Return the I-th matched item after a successful Match.

As in the classic regexp, the zeroth element is the whole string, and the last allowed index is equal to GetCount. Look at operator[] for convenience.

Precondition
The index has to be within zero and GetCount. There is a check.
bool MRegexp::Match ( const MStdString & )

Examine the character string with this regular expression, returning true if there is a match.

This match updates the state of this MRegexp object so that the items of the match can be obtained. The 0th item is the part of the string that matched the whole regular expression. The others are the parts that matched parenthesized expressions within the regular expression, with parenthesized expressions numbered in left-to-right order of their opening parentheses. If a parenthesized expression does not participate in the match at all, its length is 0.

Precondition
MRegexp has been successfully initialized. Otherwise the match will return false in any case.
MRegexp& MRegexp::operator= ( const MRegexp & r)
inline

Assignment operator.

Precondition
If the object given had a compilation error, the new object has it too.
MStdString MRegexp::operator[] ( int  i) const
inline

Return the I-th matched item after a successful Match.

As in the classic regexp, the zeroth element is the whole string, and the last allowed index is equal to GetCount. This is a more convenient C++ way of accessing an item.

Precondition
The index has to be within zero and GetCount. There is a check.
static bool MRegexp::StaticMatch ( MConstChars  regexp,
const MStdString & str,
bool  caseInsensitive = false 
)
static

Do a match using the given regular expression and string without creating an MRegexp object.

Parameters
regexp - Regular expression to match.
str - String in which the regular expression shall be matched.
caseInsensitive - When true, the match shall be case insensitive, false by default.
See also
Match - non-static version of this call
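The shape of a one-shot match can be sketched as a free function that compiles the expression, tests the string, and discards the compiled form. The code below is an illustration built on std::regex, not the MRegexp implementation; StaticMatchSketch is a hypothetical name, and treating the match as a search anywhere in the string (rather than a match of the whole string) is an assumption based on the Ambiguity section.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compile `regexp`, look for it anywhere in `str`, and throw the compiled form away.
bool StaticMatchSketch(const char* regexp, const std::string& str, bool caseInsensitive = false)
{
    const std::regex_constants::syntax_option_type flags = caseInsensitive
        ? std::regex_constants::ECMAScript | std::regex_constants::icase
        : std::regex_constants::ECMAScript;
    return std::regex_search(str, std::regex(regexp, flags));
}

void DemoStaticMatchSketch()
{
    assert(StaticMatchSketch("david", "example.com!david"));
    assert(!StaticMatchSketch("david", "example.com!DAVID"));      // case matters by default
    assert(StaticMatchSketch("david", "example.com!DAVID", true)); // unless asked otherwise
}
```

A one-shot call recompiles the expression every time, so when the same expression is applied to many strings it is usually better to keep a compiled object and call its non-static Match.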