Find a Splice Site in a DNA Sequence 

A DNA sequence consists of base letters A, C, G, and T. Suppose there is a sequence that begins in an exon, contains a splice site, and ends in an intron. If the exons have a uniform base composition, the introns are deficient in C and G, and the splice site consensus nucleotide is G with probability 0.95, the frequency distributions are as follows.
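As a rough illustration (in Python, not the original input), the three base compositions can be written as probability lists over A, C, G, T. Only the uniform exon row and the 0.95 for G at the splice site come from the text; the intron values and the placement of the remaining 0.05 on A are assumptions chosen for the sketch.

```python
# Emission probabilities over (A, C, G, T) for each region.
# From the text: exons are uniform, and P(G | splice) = 0.95.
# Assumed for illustration: the remaining 0.05 of the splice site goes to A,
# and introns use 0.4 for A/T and 0.1 for C/G.
NUCLEOTIDES = "ACGT"

exon = [0.25, 0.25, 0.25, 0.25]
splice = [0.05, 0.0, 0.95, 0.0]
intron = [0.4, 0.1, 0.1, 0.4]

for name, dist in [("exon", exon), ("splice", splice), ("intron", intron)]:
    # Each row must be a probability distribution.
    assert abs(sum(dist) - 1.0) < 1e-12
    print(name, dict(zip(NUCLEOTIDES, dist)))
```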


The state machine has states for exon (1), splice (2), intron (3), and end (4), with the following transition probabilities between states.
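A transition matrix of this shape might look like the following Python sketch. The state order follows the text; the numeric values (stay with probability 0.9, leave with probability 0.1, splice always followed by intron, end absorbing) are assumptions for illustration only.

```python
# Row i gives the probabilities of moving from state i to each state.
# State order follows the text: exon, splice, intron, end.
# All numeric values are assumed for illustration.
STATES = ["exon", "splice", "intron", "end"]

transition = [
    [0.9, 0.1, 0.0, 0.0],  # exon: stay in exon, or reach the splice site
    [0.0, 0.0, 1.0, 0.0],  # splice: exactly one nucleotide, then intron
    [0.0, 0.0, 0.9, 0.1],  # intron: stay in intron, or end
    [0.0, 0.0, 0.0, 1.0],  # end: absorbing
]

# The sequence is stated to begin in an exon.
initial = [1.0, 0.0, 0.0, 0.0]

for state, row in zip(STATES, transition):
    assert abs(sum(row) - 1.0) < 1e-12
    print(state, row)
```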


The emissions are nucleotides A (1), C (2), G (3), T (4), or end (5).
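One way to set this up, sketched in Python: map each base to its emission index, append the end symbol, and give every state a five-column emission row. The helper `encode`, the example sequence `dna`, and all probabilities other than the uniform exon row and the 0.95 for G are hypothetical.

```python
# Emission indices as in the text: A (1), C (2), G (3), T (4), end (5).
SYMBOLS = {"A": 1, "C": 2, "G": 3, "T": 4}
END = 5

def encode(dna):
    """Convert a DNA string to emission indices and append the end symbol."""
    return [SYMBOLS[base] for base in dna.upper()] + [END]

# Rows: exon, splice, intron, end. Columns: A, C, G, T, end.
# Splice and intron values other than P(G | splice) = 0.95 are assumed.
emission = [
    [0.25, 0.25, 0.25, 0.25, 0.0],
    [0.05, 0.00, 0.95, 0.00, 0.0],
    [0.40, 0.10, 0.10, 0.40, 0.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],  # the end state only emits "end"
]

dna = "CTTCATGTGAAAGCAGACGTAAGTCA"  # hypothetical example sequence
observations = encode(dna)
print(observations)
```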


Find the most probable state (exon, splice, intron, or end) for each nucleotide in the sequence.
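The most probable state path of a hidden Markov model is usually found with the Viterbi algorithm. The following is a minimal Python sketch of that algorithm in log space, not the original input; the model values and the example sequence are assumptions, except for the uniform exon composition and the 0.95 for G at the splice site.

```python
import math

STATES = ["exon", "splice", "intron", "end"]
INITIAL = [1.0, 0.0, 0.0, 0.0]           # the sequence begins in an exon
TRANSITION = [                            # assumed values
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 1.0],
]
EMISSION = [                              # columns: A, C, G, T, end
    [0.25, 0.25, 0.25, 0.25, 0.0],
    [0.05, 0.00, 0.95, 0.00, 0.0],        # only the 0.95 comes from the text
    [0.40, 0.10, 0.10, 0.40, 0.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],
]
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}
END_COLUMN = 4

def log(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs):
    """Return (most probable state path, its log joint probability)."""
    n = len(STATES)
    score = [log(INITIAL[s]) + log(EMISSION[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor of state s at this position.
            best = max(range(n), key=lambda r: prev[r] + log(TRANSITION[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(TRANSITION[best][s]) + log(EMISSION[s][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for ptr in reversed(back):            # trace the back pointers
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score

dna = "CTTCATGTGAAAGCAGACGTAAGTCA"        # hypothetical example sequence
obs = [COLUMN[b] for b in dna] + [END_COLUMN]
path, log_prob = viterbi(obs)
print([STATES[s] for s in path])
print("splice site at position", path.index(1) + 1)
```

Working in log space avoids underflow on longer sequences, since the raw path probabilities shrink geometrically with sequence length.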


Find the joint probability of the preceding state sequence and the DNA sequence.
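For a fixed state path, the joint probability is the product of the initial probability, each transition probability along the path, and each emission probability. The Python sketch below works this out for a hypothetical path (18 exon positions, one splice position, 7 intron positions, then end) over a hypothetical sequence; the model values are assumptions apart from the uniform exon composition and the 0.95 for G.

```python
import math

INITIAL = [1.0, 0.0, 0.0, 0.0]           # states: exon, splice, intron, end
TRANSITION = [                            # assumed values
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 1.0],
]
EMISSION = [                              # columns: A, C, G, T, end
    [0.25, 0.25, 0.25, 0.25, 0.0],
    [0.05, 0.00, 0.95, 0.00, 0.0],        # only the 0.95 comes from the text
    [0.40, 0.10, 0.10, 0.40, 0.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],
]
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}

def joint_probability(path, obs):
    """P(path, obs) = initial * product of transitions * product of emissions."""
    p = INITIAL[path[0]] * EMISSION[path[0]][obs[0]]
    for prev, cur, o in zip(path, path[1:], obs[1:]):
        p *= TRANSITION[prev][cur] * EMISSION[cur][o]
    return p

dna = "CTTCATGTGAAAGCAGACGTAAGTCA"        # hypothetical example sequence
obs = [COLUMN[b] for b in dna] + [4]      # 4 is the end column
path = [0] * 18 + [1] + [2] * 7 + [3]     # hypothetical state path

joint = joint_probability(path, obs)
print(joint, math.log(joint))
```

The probability itself is tiny, so it is often reported as a natural logarithm instead.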
