
42 String Patterns and Templates

This picks out all instances of + followed by a single character:
StringCases["+string +patterns are +quite +easy", "+" ~~ _]
This picks out three characters after each +:
StringCases["+string +patterns are +quite +easy", "+" ~~ _ ~~ _ ~~ _]
Use the name x for the character after each +, and return that character framed:
StringCases["+string +patterns are +quite +easy", "+" ~~ x_ -> Framed[x]]
In a string pattern, _ stands for any single character. __ (“double blank”) stands for any sequence of one or more characters, and ___ (“triple blank”) stands for any sequence of zero or more characters. __ and ___ will normally grab as much of the string as they can.
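As a minimal sketch of the difference between __ and ___ (the test string "[]" is made up for illustration), only ___ can match the empty stretch between the brackets:

```wolfram
(* __ needs at least one character between the brackets, so nothing is found *)
StringCases["[]", "[" ~~ __ ~~ "]"]    (* {} *)

(* ___ also matches zero characters, so the empty brackets are picked out *)
StringCases["[]", "[" ~~ ___ ~~ "]"]   (* {"[]"} *)
```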
Pick out the sequence of characters between [ and ]:
StringCases["the [important] word", "[" ~~ x__ ~~ "]" -> Framed[x]]
__ normally matches as long a sequence of characters as it can:
StringCases["now [several] important [words]", "[" ~~ x__ ~~ "]" -> Framed[x]]
Shortest forces the shortest match:
StringCases["now [several] important [words]", "[" ~~ Shortest[x__] ~~ "]" -> Framed[x]]
StringCases picks out cases of a particular pattern in a string. StringReplace makes replacements.
Make replacements for characters in the string:
StringReplace["now [several] important [words]", {"[" -> "<<", "]" -> ">>"}]
Make replacements for patterns, using :> to compute ToUpperCase in each case:
StringReplace["now [several] important [words]", "[" ~~ Shortest[x__] ~~ "]" :> ToUpperCase[x]]
Use NestList to apply a string replacement repeatedly:
NestList[StringReplace[#, {"A" -> "AB", "B" -> "BA"}] &, "A", 5]
StringMatchQ tests whether a string matches a pattern.
Select common words that match the pattern of beginning with a and ending with b:
Select[WordList[ ], StringMatchQ[#, "a" ~~ ___ ~~ "b"] &]
You can use | and .. in string patterns just like in ordinary patterns.
Pick out any sequence of A or B repeated:
StringCases["the AAA and the BBB and the ABABBBABABABA", ("A" | "B") ..]
In a string pattern, LetterCharacter stands for any letter character, DigitCharacter for any digit character and Whitespace for any sequence of “white” characters such as spaces.
Pick out sequences of digit characters:
StringCases["12 and 123 and 4567 and 0x456", DigitCharacter ..]
Pick out sequences of digit characters “flanked” by whitespace:
StringCases["12 and 123 and 4567 and 0x456", Whitespace ~~ DigitCharacter .. ~~ Whitespace]
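LetterCharacter can be used in the same way. A small sketch on the same string, picking out runs of letters instead of digits:

```wolfram
(* each maximal run of letters is returned; the x in 0x456 counts as a letter *)
StringCases["12 and 123 and 4567 and 0x456", LetterCharacter ..]
(* {"and", "and", "and", "x"} *)
```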
Split a string into a list of pieces, by default breaking at spaces:
StringSplit["a string to split"]
This uses a string pattern to decide where to split:
StringSplit["you+can+split--at+any--delimiter", "+" | "--"]
Within strings, there’s a special newline character which indicates where the string should break onto a new line. The newline character is represented within strings as \n.
Split at newlines:
StringSplit["first line\nsecond line\nthird line", "\n"]
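One way to check that \n really is a single character is to count characters. This sketch uses StringLength, which is not otherwise introduced in this section:

```wolfram
(* a, the newline, and b: three characters in all *)
StringLength["a\nb"]   (* 3 *)
```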
StringJoin joins any list of strings together. In practice, though, one often wants to insert something between the strings before joining them. StringRiffle does this.
Join strings, riffling the string "---" in between them:
StringRiffle[{"a", "list", "of", "strings"}, "---"]
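For comparison, here is a minimal sketch of the same list joined with StringJoin alone, and then riffled with a single space (the choice of separator is just for illustration):

```wolfram
(* StringJoin puts nothing between the pieces *)
StringJoin[{"a", "list", "of", "strings"}]          (* "alistofstrings" *)

(* StringRiffle inserts the given separator between them *)
StringRiffle[{"a", "list", "of", "strings"}, " "]   (* "a list of strings" *)
```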
TextString turns numbers and other Wolfram Language expressions into strings:
StringJoin["two to the ", TextString[50], " is ", TextString[2^50]]
A more convenient way to create strings from expressions is to use string templates. String templates work like pure functions in that they have slots into which arguments can be inserted.
In a string template each `` is a slot for a successive argument:
StringTemplate["first `` then ``"][100, 200]
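Since a template behaves like a function, it can be given a name and applied several times. A small sketch, where the name t and the wording of the template are made up for illustration:

```wolfram
(* build the template once, then fill its two slots for n = 1, 2, 3 *)
t = StringTemplate["`` squared is ``"];
Table[t[n, n^2], {n, 3}]
(* {"1 squared is 1", "2 squared is 4", "3 squared is 9"} *)
```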
Named slots pick elements from an association:
StringTemplate["first: `a`; second `b`; first again `a`"][<|"a" -> "AAAA", "b" -> "BB BBB"|>]
You can insert any expression within a string template by enclosing it with <*...*>. The value of the expression is computed when the template is applied.
Evaluate the <*...*> when the template is applied; no arguments are needed:
StringTemplate["2 to the 50 is <* 2^50 *>"][ ]
Use slots in the template (` is the backquote character):
StringTemplate["`1` to the `2` is <* #1^#2 *>"][2, 50]
The expression in the template is evaluated when the template is applied:
StringTemplate["the time now is <* Now *>"][ ]
patt1 ~~ patt2                            sequence of string patterns
Shortest[patt]                            shortest sequence that matches
StringCases[string, patt]                 cases within a string matching a pattern
StringReplace[string, patt -> val]        replace a pattern within a string
StringMatchQ[string, patt]                test whether a string matches a pattern
LetterCharacter                           pattern construct matching a letter
DigitCharacter                            pattern construct matching a digit
Whitespace                                pattern construct matching spaces, etc.
\n                                        newline character
StringSplit[string]                       split a string into a list of pieces
StringJoin[{string1, string2, ...}]       join strings together
StringRiffle[{string1, string2, ...}, m]  join strings, inserting m between them
TextString[expr]                          make a text string out of anything
StringTemplate[string]                    create a string template to apply
``                                        slot in a string template
<*...*>                                   expression to evaluate in a string template
42.1 Replace each space in "1 2 3 4" with "---".
42.2 Get a sorted list of all sequences of 4 digits (representing possible dates) in the Wikipedia article on computers.
42.3 Extract “headings” in the Wikipedia article about computers, as indicated by strings starting and ending with "===".
42.4 Use a string template to make a grid of results of the form i+j=... for i and j up to 9.
42.5 Find names of integers below 50 that have an “i” somewhere before an “e”.
42.6 Make any 2-letter word uppercase in the first sentence from the Wikipedia article on computers.
42.7 Make a labeled bar chart of the number of countries whose TextString names start with each possible letter.
How should one read ~~ out loud?
It’s usually read “tilde tilde”. The underlying function is StringExpression.
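As a small sketch of what this means, the first example of this section can be written with StringExpression in place of ~~, and should give the same result:

```wolfram
(* StringExpression["+", _] is the same pattern as "+" ~~ _ *)
StringCases["+string +patterns are +quite +easy", StringExpression["+", _]]
(* {"+s", "+p", "+q", "+e"} *)
```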
How does one type `` to make a slot in a string template?
It’s a pair of what are usually called backquote or backtick characters. On many keyboards, they’re at the top left, along with ~ (tilde).
Can I write rules for understanding natural language?
Yes, but we didn’t cover that here. The key function is GrammarRules.
What does TextString do when things don’t have an obvious textual form?
It does its best to make something human readable, but if all else fails, it’ll fall back on InputForm.