How to Write an Email Miner in Python

    • 1). Open a terminal session and type python -V at the prompt to check that you have Python 2.6 or higher, but not 3.0 or higher. (The capital V prints the version number; a lowercase -v starts the interpreter in verbose mode instead.) Versions 2.6 and 2.7 are ideal because they are compatible with NLTK and PyYAML. Visit the Python Package Index (PyPI), then find and download the PyYAML and NLTK packages. Unzip or untar them. Change to the PyYAML directory and, at the command-line prompt, type sudo python setup.py install. It should look like this:

      My-Computer:PyYAML-3.2.0 Me$ sudo python setup.py install

      You will be prompted for your password. Type it and press Return. Follow this procedure for every Python package you install.
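
To confirm the installation worked, a short check like the one below can help. It is a minimal sketch, not part of the original procedure: the function name check_environment is made up for illustration, and it assumes only that PyYAML is imported under the name yaml and NLTK under the name nltk.

```python
import importlib
import sys


def check_environment(modules=("yaml", "nltk")):
    """Return the interpreter version and whether each module can be imported."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = True
        except ImportError:
            # The package is missing or was installed for a different interpreter.
            report[name] = False
    return report


if __name__ == "__main__":
    for key, value in sorted(check_environment().items()):
        print("%s: %s" % (key, value))
```

A False next to a package name usually means the setup.py install step ran under a different Python than the one you are checking.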

    • 2). Download mail messages for parsing with the following lines of code:

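      #!/usr/local/bin/python
      from __future__ import print_function   # print() works the same on Python 2.6+ and 3
      import poplib, getpass, mailconfig

      try:
          input = raw_input                   # Python 2: read a line without evaluating it
      except NameError:
          pass                                # Python 3: input() already behaves this way

      # mailconfig is not part of the standard library; it is a local module
      # that must define popservername and popusername.
      mailserver = mailconfig.popservername
      mailuser = mailconfig.popusername
      mailpasswd = getpass.getpass('Password for %s?' % mailserver)

      server = poplib.POP3(mailserver)
      server.user(mailuser)
      server.pass_(mailpasswd)

      print(server.getwelcome())
      msgCount, msgBytes = server.stat()
      print('There are', msgCount, 'mail messages in', msgBytes, 'bytes')
      print(server.list())
      print('-' * 80)
      input('[Press Enter key]')

      # POP3 message numbers start at 1, so retrieve message i+1
      for i in range(msgCount):
          hdr, message, octets = server.retr(i+1)
          for line in message: print(line.decode())
          print('-' * 80)
          if i < msgCount - 1:
              input('[Press Enter key]')

      # The connection stays open for the next step; call server.quit() when you finish.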

      This script connects to your POP3 mail server using the server name and user name defined in mailconfig, prompts you for your password, reports how many messages are on the server, and then retrieves and prints each message, pausing between messages.
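
For mining, it is often handier to collect the messages in a list than to print them. The following is a minimal sketch of that idea. The names fetch_messages and FakeServer are hypothetical and do not come from the script above; FakeServer only imitates the stat() and retr() methods of poplib.POP3 so the example runs without a mail account.

```python
def fetch_messages(server):
    """Return every message on the server as one decoded string per message.

    `server` is any object with poplib.POP3's stat() and retr() methods.
    """
    msg_count, _total_bytes = server.stat()
    messages = []
    for number in range(1, msg_count + 1):      # POP3 numbers messages from 1
        _response, lines, _octets = server.retr(number)
        # retr() returns the message as a list of byte strings, one per line
        messages.append(b"\n".join(lines).decode("utf-8", "replace"))
    return messages


class FakeServer(object):
    """Hypothetical stand-in for poplib.POP3 so the sketch runs offline."""

    def __init__(self, raw_messages):
        self._raw = raw_messages

    def stat(self):
        return len(self._raw), sum(len(m) for m in self._raw)

    def retr(self, number):
        raw = self._raw[number - 1]
        return b"+OK", raw.split(b"\r\n"), len(raw)


demo = FakeServer([b"Subject: hello\r\n\r\nFirst body",
                   b"Subject: bye\r\n\r\nSecond body"])
print(fetch_messages(demo))
```

With a real connection you would pass the logged-in poplib.POP3 object instead of the fake one. Decoding with "replace" is a design choice that keeps the loop from stopping on a message with unexpected bytes.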

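    • 3). Mine your email messages by converting each message to a string, a native data type in Python that can be searched with Python's string methods, its regular expression engine, and the Natural Language Toolkit:

      from email.parser import Parser
      import nltk
      import re

      # retr() returns (response, list of lines, octets); message numbers start at 1
      hdr, message, octets = server.retr(1)

      # Join the raw byte lines and decode them into a single string
      s = b'\n'.join(message).decode()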
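
Once a message is a string, the Parser class imported above can separate headers from body, and re can then search the body. The sketch below illustrates this with a made-up sample message in place of a downloaded one; the addresses and the deliberately simple address pattern are assumptions for the example only.

```python
import re
from email.parser import Parser

# Hypothetical sample standing in for the string built from a downloaded message.
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Mortgage paperwork\n"
    "\n"
    "Please send the mortgage forms to carol@example.org by Friday.\n"
)

msg = Parser().parsestr(sample)        # split the headers from the body
sender = msg["From"]
subject = msg["Subject"]
body = msg.get_payload()               # a plain string for a non-multipart message

# A deliberately simple address pattern; real addresses can be more complex.
addresses = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", body)

print(sender, "|", subject, "|", addresses)
```

For multipart messages get_payload() returns a list of parts rather than a string, so a real miner would need to walk those parts before searching.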

    • 4). Mine the first message for any information of interest. Discover how many words are in that message by entering the following commands:

      >>> words = re.findall(r'\w+', s)
      >>> len(words)

      This returns an integer: the number of words. (Calling len(s) on the string itself would count characters, not words.) To see every occurrence of the word mortgage in context, wrap the word list in an NLTK Text object and call its concordance method:

      >>> text = nltk.Text(words)
      >>> text.concordance('mortgage')

      This prints each occurrence of mortgage together with the words surrounding it, which is very useful for detectives investigating mortgage fraud.
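
If you want whole sentences rather than fixed-width context, a regular expression can approximate that without NLTK. This is a rough sketch that swaps in a naive sentence splitter; the function names and the sample text are made up for illustration, and abbreviations such as "Mr." will fool the splitter.

```python
import re


def sentences_containing(text, word):
    """Return the sentences in `text` that contain `word` as a whole word."""
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


def word_count(text):
    """Count runs of word characters, a rough stand-in for a tokenizer."""
    return len(re.findall(r"\w+", text))


sample = ("Your mortgage application arrived. We will call you Monday. "
          "Is the Mortgage rate fixed?")
print(word_count(sample))
print(sentences_containing(sample, "mortgage"))
```

The match is case-insensitive, so "Mortgage" at the start of a sentence is found as well.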
