Analyze Email Threads

The conversation graph of a mailbox consists of vertices, each representing a single email, and a directed edge from each email to each of its direct replies. This representation makes it possible to find and analyze the conversation threads present in the mailbox. The example uses a downloaded mailing list archive in MBOX format.

Assuming the downloaded MBOX is stored in file, import particular message elements for all email messages.
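The import step can be sketched outside the notebook as well. The following is a minimal illustration using Python's standard `mailbox` module, not the code this page originally showed. The sample archive, the helper name `load_messages` and the chosen field names are assumptions made for the sketch.

```python
import mailbox
import os
import tempfile
from email.utils import parsedate_to_datetime

# Tiny made-up archive so the sketch runs without a download.
SAMPLE_MBOX = """\
From alice@example.com Mon Jan  1 10:00:00 2024
Message-ID: <a1@example.com>
Subject: Release plan
Date: Mon, 01 Jan 2024 10:00:00 +0000

When do we ship?

From bob@example.com Mon Jan  1 12:00:00 2024
Message-ID: <b1@example.com>
In-Reply-To: <a1@example.com>
Subject: Re: Release plan
Date: Mon, 01 Jan 2024 12:00:00 +0000

Next week.

"""


def load_messages(path):
    """Return one dict of selected fields per message in the MBOX file."""
    box = mailbox.mbox(path)
    try:
        records = []
        for msg in box:
            records.append({
                "message_id": msg["Message-ID"],
                "in_reply_to": msg["In-Reply-To"],  # None for thread starters
                "subject": msg["Subject"],
                "date": parsedate_to_datetime(msg["Date"]),
                "body": msg.get_payload().strip(),  # assumes non-multipart mail
            })
        return records
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    file = os.path.join(tmp, "sample.mbox")
    with open(file, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)
    messages = load_messages(file)
```

Here the `In-Reply-To` header plays the role of the reply reference. A real archive may also contain multipart messages, which this sketch does not handle.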

Create an association with message IDs as the keys and associations containing the various fields as the values.

Extract just the message IDs.

Select all messages that are replies to another message.

Create edges from each message to each of its replies.
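The four steps above (index by message ID, list the IDs, pick out the replies, build the edges) can be sketched with plain Python dictionaries and lists. The message records below are made up, and skipping replies whose parent is missing from the archive is an assumption of this sketch, not something the original example states.

```python
# Hypothetical message records; a real run would use the imported data.
messages = [
    {"message_id": "<a1>", "in_reply_to": None, "subject": "Release plan"},
    {"message_id": "<b1>", "in_reply_to": "<a1>", "subject": "Re: Release plan"},
    {"message_id": "<c1>", "in_reply_to": "<a1>", "subject": "Re: Release plan"},
    {"message_id": "<d1>", "in_reply_to": None, "subject": "Lunch"},
    {"message_id": "<e1>", "in_reply_to": "<zz>", "subject": "Re: Old topic"},
]

# Message ID -> record with the remaining fields.
by_id = {m["message_id"]: m for m in messages}

# Just the message IDs; these become the graph vertices.
vertices = list(by_id)

# Messages that reply to some other message.
replies = [m for m in messages if m["in_reply_to"] is not None]

# One directed edge (parent, reply) per reply whose parent is in the archive.
edges = [
    (m["in_reply_to"], m["message_id"])
    for m in replies
    if m["in_reply_to"] in by_id
]
```

The reply `<e1>` refers to a message that is not in the archive, so it stays in the graph as an isolated vertex instead of producing a dangling edge.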

Create a graph from the vertices and edges, using the new body content as the tooltip for each vertex.

Each connected component of the graph is one conversation thread. Separate individual message threads and analyze them.
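Splitting the graph into threads amounts to finding its weakly connected components, that is, ignoring edge direction. A minimal sketch with a depth-first search follows. The function name `find_threads` and the toy vertex names are assumptions for illustration.

```python
from collections import defaultdict

# Toy graph: "a" has two replies, "d" has one, "f" was never answered.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("d", "e")]


def find_threads(vertices, edges):
    """Group vertices into weakly connected components (one per thread)."""
    neighbours = defaultdict(set)
    for parent, reply in edges:
        # Treat every edge as undirected for the purpose of grouping.
        neighbours[parent].add(reply)
        neighbours[reply].add(parent)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


threads = find_threads(vertices, edges)
```

An unanswered message forms a thread of its own, which matches the idea that every connected component is one conversation.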

As an example, compute the timeline of each conversation thread by using MinMax to find the earliest and latest originating dates.
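The same min/max computation can be sketched in Python with the built-in `min` and `max`. The dates and thread groupings below are invented for the illustration.

```python
from datetime import datetime

# Hypothetical originating dates keyed by message ID.
dates = {
    "a": datetime(2024, 1, 1, 10, 0),
    "b": datetime(2024, 1, 1, 12, 0),
    "c": datetime(2024, 1, 3, 9, 0),
    "d": datetime(2024, 2, 1, 8, 0),
}

# Each thread is a list of message IDs (one connected component).
threads = [["a", "b", "c"], ["d"]]

# (earliest, latest) originating date for every thread.
spans = [
    (min(dates[m] for m in thread), max(dates[m] for m in thread))
    for thread in threads
]
```

A thread with a single message gets a zero-length span, because its earliest and latest dates coincide.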

Create a simple timeline using TimelinePlot.

Create custom labels for each thread with the subject, start time and end time.
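One possible shape for such a label is shown in the sketch below. The helper name `thread_label` and the date format are assumptions, and taking the subject from the first message of the thread is a choice made only for this illustration.

```python
from datetime import datetime


def thread_label(subject, start, end):
    """Combine subject, start time and end time into one label string."""
    return f"{subject} ({start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M})"


label = thread_label(
    "Release plan",
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 3, 9, 0),
)
```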

Pass these labels to TimelinePlot to create a timeline with the enhanced label.

The threads can also be analyzed individually. The following selects the first thread containing exactly three messages, that is, an original message and two replies.
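Selecting a thread by size is a short filter. In this sketch the thread lists are made up, and returning `None` when no thread matches is an assumption of the illustration.

```python
# Hypothetical threads, each a list of message IDs.
threads = [["d", "e"], ["a", "b", "c"], ["f"], ["g", "h", "i"]]

# First thread with exactly three messages, or None if there is none.
selected = next((t for t in threads if len(t) == 3), None)
```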

Visualize a timeline of the messages in this thread using new body content as labels.
