What is New in Mathematica 5.1

Industrial-Strength String Manipulation


Comprehensive string manipulation, matching, and searching capabilities are now built into Mathematica, and integrated with its high-level pattern-matching system.

Pattern matching has always been central to Mathematica's language, ease of use, and ability to manipulate general expressions. Now string data can be manipulated in the same way, with performance scalable to gigabyte sequences. In combination with Mathematica's extensive symbolic language and computational abilities, this makes Mathematica uniquely capable for text-intensive applications, such as website manipulation, data mining, and bioinformatics.

Mathematica provides a high-level syntax for string patterns, but also allows low-level regular expression syntax of the kind familiar to Perl and Python users.
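As a minimal sketch of the two styles, the following extracts runs of digits from a string, once with a high-level string pattern and once with a regular expression. The input string is a made-up illustration, not taken from this page.

```mathematica
(* Hypothetical input string, chosen only for illustration *)
s = "a1b22c333";

(* High-level string pattern: one or more digit characters *)
StringCases[s, DigitCharacter ..]

(* Equivalent low-level regular expression; the backslash is doubled inside a Mathematica string *)
StringCases[s, RegularExpression["\\d+"]]

(* both return {"1", "22", "333"} *)
```

Both forms can be passed to the same functions, so the choice between them is largely a matter of readability and familiarity.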

Example: Bioinformatics

Here is the beginning of the string giving the genome of the SARS virus:

CTACCCAGGAAAAGCCAACCAACCTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTAGCTGTCGCTCG
GCTGCATGCCTAGTGCACCTACGCAGTATAAACAATAATAAATTTTACTGTCGTTGACAAGAAACGAGTAACTCGTCCCTCTT
CTGCAGACTGCTTACGGTTTCGTCCGTGTTGCAGTCGATCATCAGCATACCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAG
ATGGAGAGCCTTGTTCTTGGTGTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTCCTTCAGGTTAGAGACGTGCTAGT
GCGTGGCTTCGGGGACTCTGTGGAAGAGGCCCTATCGGAGGCACGTGAACACCTCAAAAATGGCACTTGTGGTCTAGTAGAGC
TGGAAAAAGGCGTACTGCCCCAGCTTGAACAGCCCTATGTGTTCATTAAACGTTCTGATGCCTTAAGCACCAATCACGGCCAC
AAGGT...

This finds short palindromes in the SARS genome:

In[1]:= StringCases[SARS, x_ ~~ y_ ~~ x_ ~~ x_ ~~ x_ ~~ y_ ~~ x_]

Out[1]:= {TCTTTCT, ACAAACA, CTCCCTC, TATTTAT, TCTTTCT, TCTTTCT, GTGGGTG, TCTTTCT, TCTTTCT, TGTTTGT, TATTTAT, ACAAACA, ACAAACA, TATTTAT, ACAAACA, ATAAATA, TCTTTCT, TGTTTGT, ACAAACA, TATTTAT, AAAAAAA, TTTTTTT, ATAAATA, ACAAACA, AGAAAGA, GAGGGAG, ACAAACA, AAAAAAA, AAAAAAA, AAAAAAA}
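For readers more used to regular expressions, the same seven-character shape can be sketched with backreferences. The short sequence below is a hypothetical stand-in, not part of the SARS genome data.

```mathematica
(* Hypothetical short sequence, used here instead of the full genome string *)
seq = "GGTCTTTCTAA";

(* String-pattern form: x and y each stand for a single character *)
StringCases[seq, x_ ~~ y_ ~~ x_ ~~ x_ ~~ x_ ~~ y_ ~~ x_]

(* Regular-expression form: \1 and \2 refer back to the first and second groups *)
StringCases[seq, RegularExpression["(.)(.)\\1\\1\\1\\2\\1"]]

(* both return {"TCTTTCT"} *)
```

Neither form requires x and y to differ, which is why results such as AAAAAAA appear in the output above.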

This finds long repeats:

In[2]:= StringCases[SARS, w : ((x_) ..) /; StringLength[w] > 5]

Out[2]:= {CCCCCC, GGGGGG, AAAAAAAA, TTTTTTT, TTTTTT, CCCCCC, AAAAAAAAAAAAAAAAAAAAAAAA}
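The pattern in In[2] can be read piece by piece: `(x_) ..` repeats a single named character one or more times, `w :` names the whole run, and the condition after `/;` keeps only runs longer than the threshold. A small sketch on a hypothetical sequence, with the threshold lowered to 3 so that a short input still contains qualifying runs:

```mathematica
(* Hypothetical short sequence containing a run of four C's and a run of six T's *)
seq = "AACCCCGTTTTTTA";

(* w names the whole run; x_ repeated with .. must be the same character each time *)
StringCases[seq, w : ((x_) ..) /; StringLength[w] > 3]
```

Changing the number in the condition adjusts the minimum run length without touching the rest of the pattern.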

Example: Dictionary

Mathematica 5.1 comes with a 90,000-word English dictionary.

This finds all palindromic words in the dictionary.

In[3]:= FindWords[x__ /; x == StringReverse[x]]

Out[3]:= {a, aha, aka, bib, bob, boob, bub, CFC, civic, dad, dd, deed, deified, did, dud, DVD, eke, ere, eve, ewe, eye, gag, gig, huh, I, kayak, kook, level, ma'am, madam, mam, MGM, minim, mom, mum, nan, non, noon, nun, oho, pap, peep, pep, pip, poop, pop, pup, radar, redder, refer, repaper, reviver, rotor, sagas, sees, seres, sexes, shahs, sis, solos, SOS, stats, stets, tat, tenet, tit, TNT, toot, tot, tut, wow, WWW}
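FindWords does not appear in the list of built-in functions below, so it is presumably defined in the complete example. As a rough sketch of the same idea using only functions that are listed, the palindrome pattern can be applied to an explicit list of words. The list `words` here is a hypothetical stand-in for the dictionary.

```mathematica
(* Hypothetical stand-in for the dictionary: a short list of words *)
words = {"level", "string", "noon", "pattern", "radar"};

(* Keep the words whose full text equals its own reversal *)
Select[words, StringMatchQ[#, x__ /; x == StringReverse[x]] &]

(* returns {"level", "noon", "radar"} *)
```

StringMatchQ requires the pattern to match the whole string, so `x__` is bound to each complete word before the condition is tested.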


Related Links

Tutorials from The Mathematica Book
Operations on Strings
String Patterns
Advanced Topic: Regular Expressions
  
Built-In Functions Reference Guide
StringExpression
RegularExpression
StringCases
StringCount
StringDrop
StringFreeQ
StringInsert
StringLength
StringMatchQ
StringPosition
StringReplace
StringReplaceList
StringReplacePart
StringReverse
StringSplit
StringTake
  
In-Depth Advanced Documentation
String Patterns in Mathematica
  
Additional Information
Technical Presentation: String Patterns





© 2008 Wolfram Research, Inc.