New in Wolfram Mathematica 7: Sequence Alignment & Analysis  previous | next 
Do Cluster Analysis Using Built-in Sequence Similarity Metrics
Find clusters of DNA sequences based on their global similarity to two reference sequences.
In[1]:=

Click for copyable input
str = Table[

   StringJoin@RandomChoice[StringSplit["ACGT", ""], 100], {500}];

xstr = StringJoin@RandomChoice[StringSplit["ACGT", ""], 100];

ystr = StringJoin@RandomChoice[StringSplit["ACGT", ""], 100];

colors = ColorData[14, "ColorList"];

Graphics[{#[[1]], PointSize[0.02], 

    MapThread[Point[#1] &, {#[[2, All, 1]], #[[2, All, 2]]}]} & /@ 

  Transpose[{colors, 

    FindClusters[

     Transpose[{Map[{NeedlemanWunschSimilarity[#, xstr]/100., 

           NeedlemanWunschSimilarity[#, ystr]/100.} + 

          RandomReal[0.02, 2] &, str], str}], 10, 

     DistanceFunction -> (Sqrt[(#1[[1]] - #2[[1]]).(#1[[1]] - #2[[

             1]])] &)]}]]
Out[1]=