Contains methods for examining, searching and replacing, and manipulating strings.
Encoding and String Indexes
Strings in FlexSim are encoded in UTF-8 format.
This means that for simple ASCII strings, each
character takes up a single byte (8 bits). However, non-ASCII characters may consist of
up to 4 bytes. For example, the Greek Σ character requires two bytes to encode
in UTF-8.
Since any character in a string may be made up of multiple bytes, it is non-trivial to access
characters in the string by their character index. As such, FlexSim's string interface uses byte-based indexing
for accessing the characters inside the string. In other words, all string members that take or
return indexes into the string (charAt(),
substr(), slice(),
indexOf(), lastIndexOf(),
search(),
length, and the array [] operator)
take and return byte indexes, not character indexes. Also, in FlexSim, all indexing
is 1-based, such that the first byte in a string is accessed by index 1 (not index 0 as in many other
programming languages).
Take the string "Σ=σ" for example. While the = character is technically the second character and thus would
be at "character index" 2, since the Σ character takes up two bytes, the = character is accessed with "byte index" 3.
string("Σ=σ").charAt(3) // returns "="
Note also that, given this rule, some method calls return the same result for different byte indexes,
because a single character spans several bytes.
string("Σ=σ").charAt(1) // returns "Σ"
string("Σ=σ").charAt(2) // returns same "Σ"
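The byte layout behind these results can be checked with the length property and the split() method. The sketch below assumes only the behavior stated in this topic: length counts bytes, and split() with no arguments yields one array element per character.

```flexscript
string text = "Σ=σ";
int numBytes = text.length;   // 5: Σ and σ take two bytes each, = takes one
Array chars = text.split();   // one array element per character
int numChars = chars.length;  // 3
```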
Usually this will not affect the way you manipulate strings, especially since there are many methods that
do not involve accessing characters by their index, such as the
replace() and match() methods.
If you do need to manually access characters in a string, there are several methods of ensuring that your
code will work for all types of characters.
The easiest way to manipulate individual characters without worrying about encoding is to use the
split() and join() methods. The
split() method will split the string up into an Array of individual characters. Once
it is split, you can traverse the array using character, not byte, indexes. After you are finished you can join the
array back into a string.
string str = "Σ=σ";
Array asArray = str.split();
for (int i = 1; i <= asArray.length; i++) {
    if (asArray[i] == "=")
        asArray[i] = "≠";
}
str = asArray.join(); // "Σ≠σ"
If you want to search the string by character instead of by byte, but don't want to split it into an array,
you can loop through the string and,
using the charAt() method, increment the looping index based
on the byte-length of each character.
string str = "Σ=σ";
string curChar = "";
for (int i = 1; i <= str.length; i += curChar.length) {
    curChar = str.charAt(i);
    if (curChar == "=") {
        str = str.substring(1, i) + "≠" + str.substring(i + curChar.length);
        curChar = "≠";
    }
}
If you are only searching for or replacing ASCII character values, you can traverse the string using the [ ]
array operator.
string str = "Σ=σ";
for (int i = 1; i <= str.length; i++) {
    if (str[i] == '=') // comparing a byte to an ASCII character works fine
        str[i] = '#'; // setting a byte to an ASCII character works fine,
                      // as long as the existing byte was already an ASCII character.
}
Finally, the string provides the byteToCharIndex() and charToByteIndex()
methods to convert between character and byte indexes. Note that these methods do require a search of the string, so they will be slow for long strings.
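As a brief sketch of the reverse conversion, the following assumes charToByteIndex() takes a 1-based character index and returns the byte index at which that character starts. The expected values follow from the byte layout of "Σ=σ" described above.

```flexscript
string text = "Σ=σ";
int byteIndex = text.charToByteIndex(2);  // 3, since Σ occupies bytes 1 and 2
string second = text.charAt(byteIndex);   // "="
```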
Returns the character index associated with a specific byte index in the string.
string text = "Σ=σ";
int charIndex = text.byteToCharIndex(text.search("=")); // 2
int numChars = text.byteToCharIndex(text.length); // 3 - the number of characters in the string
Pads the end of the string with the padString to the given length.
Pads the end of the string with the padString to the given targetLength. If the targetLength
is less than the string's length, the string will be returned unpadded.
Pads the start of the string with the padString to the given length.
Pads the start of the string with the padString to the given targetLength. If the targetLength
is less than the string's length, the string will be returned unpadded.
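A short sketch of both padding methods follows. The argument order (targetLength first, then padString) is an assumption modeled on JavaScript's padStart() and padEnd(); check the method signatures before relying on it. The variable names are made up for illustration.

```flexscript
string id = "42";
string left = id.padStart(5, "0");  // pads the start with zeros up to the target length
string right = id.padEnd(5, ".");   // pads the end with periods up to the target length
string same = id.padStart(1, "0");  // targetLength is less than the string's length,
                                    // so "42" is returned unpadded
```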
A copy of the string with a find pattern replaced with a new string.
Description
Replaces a series of characters with another.
Regular Expressions
A regular expression is a concise way of specifying a pattern of characters to look for. You start and end
a regular expression with a forward slash "/" (similar to how you start and end a string with quotes) and then add modifiers:
/pattern/modifiers
Here is a brief explanation of some regular expression syntax.
Modifiers
These come after the closing / of the regular expression and modify its behavior.
g - Global match. Matches all occurrences, not just the first one.
i - Case-insensitive match.
Brackets
[] - Matches any character in the brackets. Can use a dash to specify a range of characters.
[abc] - matches an a, b or c.
[a-z] - matches any lowercase letter.
[0-9] - matches any numerical digit.
[^abc] - matches any character that is not an a, b or c.
Or
| - Matches any of the alternatives separated by a |.
(abc|cba) - matches the sequence "abc" or "cba".
(gray|grey) or gr(a|e)y - matches "gray" and "grey".
Quantifiers
Define the quantity of characters the preceding expression will match
* - Matches any string that has zero or more of the specified characters.
ab*c - matches "ac", "abc", "abbc", "abbbc", etc. Basically any sequence of an a, any number of b's, and then a c.
+ - Matches any string that has at least one of the specified characters.
ab+c - almost the same as ab*c except that "ac" no longer matches.
? - Matches any string that has zero or one of the specified characters.
colou?r - matches both "color" and "colour".
Repetitions
{n} - Matches exactly n occurrences.
{m,n} - Matches at least m and at most n occurrences.
Periods
. - Matches any character.
\. - Matches a period.
Examples
This code removes all numbers. We are using [0-9] to find characters in the numeric range and then we put a plus
after it to find any sequence of numbers no matter how long. Finally we add the g modifier to match all occurrences.
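A minimal sketch matching this description is shown here; the sample text is made up for illustration.

```flexscript
string text = "Order 66 shipped 3 items";
return text.replace(/[0-9]+/g, ""); // "Order  shipped  items"
```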
Replace any instances of blue or green regardless of case with red.
string text = "Blue cars are really green.";
return text.replace(/blue|green/gi, "red"); // "red cars are really red."
Replace all 3-letter words that start with b, end with t, and have a vowel in the middle.
string text = "My favorite words are: bit bat but bot bet.";
return text.replace(/b[aeiou]t/g, "Money"); // My favorite words are: Money Money Money Money Money.
Hide any email addresses. Note that "\." is how you specify you want to see a period and not just any character.
string text = "Email me at [email protected] or [email protected]";
return text.replace(/([a-z0-9_\.-]+)@([0-9a-z\.-]+)\.([a-z\.]{2,6})/g, "###"); //Email me at ### or ###
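The following sketch, with made-up sample text, shows the effect of the g modifier together with a {n} repetition. It relies only on the syntax described above: without g, only the first match is replaced.

```flexscript
string text = "2024-01-15";
string first = text.replace(/[0-9]{2}/, "##");  // "##24-01-15"
string all = text.replace(/[0-9]{2}/g, "##");   // "####-##-##"
```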
FlexScript's string regular expression implementation uses C++'s regex
library for its functionality, using the ECMAScript grammar.
Our design was also guided by JavaScript's
regular expression implementation.
For more detailed information on building regular expressions, refer to that documentation. Note that we do not (yet) implement
JavaScript's /m, /y, or /u regular expression flags.
The 1-based byte index of the first character to extract. If this index
is in the middle of a multi-byte character, it will extract the full character. If it is a negative number,
the start position is string.length + beginIndex, or in other words it is an offset from the end of the
string.
endIndex
The 1-based byte index of the end character. The extraction will go up-to-but-not-including
the character at this index. If the index
is in the middle of a multi-byte character, the extraction will include that full character.
If this parameter is left out, the method will extract to the end of the string. If the parameter
is negative, then the end index will be string.length + endIndex, or in other words it is
an offset from the end of the string.
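As an illustration, assuming these parameters belong to the slice() method listed earlier, the calls below extract by 1-based byte index, with the end index excluded. The sample strings are made up.

```flexscript
string text = "Hello World";
string head = text.slice(1, 6);          // "Hello": bytes 1 through 5
string tail = text.slice(7);             // "World": endIndex omitted, so extraction runs to the end
string mid = string("Σ=σ").slice(3, 4);  // "=": the = character is at byte index 3
```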
The string that marks the separators for where the string should be split.
If this parameter is excluded or is an empty string, every character will be separated into its own array element.
limit
The maximum number of array elements to return. If excluded, it will split the whole string.
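A short sketch of both cases, assuming these parameters belong to the split() method and that the separator is passed as its first argument:

```flexscript
string csv = "red,green,blue";
Array colors = csv.split(",");        // ["red", "green", "blue"]
Array chars = string("Σ=σ").split();  // ["Σ", "=", "σ"]: one element per character
```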
The 1-based byte index of the first character to extract. If this index
is in the middle of a multi-byte character, it will extract the full character. If it is a negative number,
the start position is string.length + beginIndex, or in other words it is an offset from the end of the
string.
The [ ] operator accesses individual elements of the string as byte values. For
an ASCII string, these can be compared to ASCII character values using single-quote
literals such as 'A', 'B', 'C', etc. All ASCII character values are in the integer range [1, 127].
For multi-byte characters, accessing an individual byte will return a value that is
in the integer range [128, 255]. Non-ASCII multi-byte characters in the string that are accessed with
the [ ] operator are read-only. If you try to set the value of a byte that is part of a multi-byte
character using the [ ] operator, FlexSim will throw an exception.
string str = "Σ=σ";
for (int i = 1; i <= str.length; i++) {
    if (str[i] == '=') // comparing a byte to an ASCII character works fine
        str[i] = '#'; // setting a byte to an ASCII character works fine,
                      // as long as the existing byte was already an ASCII character.
    if (str[i] >= 128)
        str[i] = ' '; // setting a byte that is part of a multi-byte character will throw an exception
}