Read emails from the mail server using IMAP

What is IMAP

IMAP stands for Internet Message Access Protocol. It's a protocol used for retrieving and managing email messages stored on a mail server. IMAP is designed to allow users to access their email messages from multiple devices while keeping the messages stored on the server.

Unlike POP3 (Post Office Protocol version 3), which downloads email messages to a local device and typically removes them from the server, IMAP maintains messages on the server.

Overview of how IMAP works:

  1. Client Initiation: An email client initiates a connection to an IMAP server (the incoming mail server) on a designated port: usually 143 for plain-text connections (optionally upgraded with STARTTLS) or 993 for connections encrypted with TLS from the start.

  2. Handshake: The server greets the client, and the client can ask what the server supports using the CAPABILITY command. The client then authenticates itself using the LOGIN or AUTHENTICATE command.

  3. Mailbox Selection: The client selects a mailbox (a folder such as Inbox or Sent) using the SELECT command before retrieving email messages from it.

  4. Listing Messages: The client can request the messages in the selected mailbox that match given criteria using the SEARCH command. The server responds with a list of message identifiers.

  5. Message Retrieval: The client can retrieve specific email messages by sending the FETCH command with the desired message identifier. The server sends the email content, including headers and body.

  6. Synchronization: IMAP maintains synchronization between the server and the client. For example, if a message is marked as read on one device, the change is reflected across all devices.

  7. Closing Connection: After the client is done with the mailbox, it can close the IMAP connection using the LOGOUT command.

IMAP is particularly useful for users who access their email from multiple devices or prefer to keep their messages stored on the server.
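
As a small illustration of the two ports mentioned in the overview, here is a sketch using Python's standard imaplib module. The helper name open_connection is hypothetical and is not used elsewhere in this article; the port constants and the IMAP4, IMAP4_SSL, and starttls calls come from imaplib itself.

```python
import imaplib

# Default ports exposed by the standard library's imaplib module.
print(imaplib.IMAP4_PORT)      # port for plain-text connections (can be upgraded with STARTTLS)
print(imaplib.IMAP4_SSL_PORT)  # port for connections wrapped in TLS from the start


def open_connection(host: str, use_ssl: bool = True):
    """Return an unauthenticated connection to the given IMAP host.

    Hypothetical helper: it only opens the connection, it does not log in.
    """
    if use_ssl:
        return imaplib.IMAP4_SSL(host, imaplib.IMAP4_SSL_PORT)
    conn = imaplib.IMAP4(host, imaplib.IMAP4_PORT)
    conn.starttls()  # upgrade the plain connection before sending credentials
    return conn
```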

Receiving email using IMAP

IMAP server settings

import imaplib

# IMAP server settings (Gmail's IMAP host; smtp.gmail.com is only for sending mail)
IMAP_SERVER = 'imap.gmail.com'
USERNAME = 'example@mail.com'
PASSWORD = 'example_password'

# Connect to the IMAP server
imap = imaplib.IMAP4_SSL(IMAP_SERVER)
imap.login(USERNAME, PASSWORD)

# list all the mailboxes - Inbox, Sent, "[Gmail]/All Mail", etc
print(imap.list())
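
Hardcoding credentials in a script is risky, so one option is to read them from the environment. The sketch below is only an illustration: the function load_imap_settings, the ImapSettings tuple, and the variable names IMAP_SERVER, IMAP_USERNAME, and IMAP_PASSWORD are all assumptions, not part of imaplib. Note also that Gmail accounts with 2-Step Verification generally expect an app password rather than the normal account password.

```python
import os
from typing import Mapping, NamedTuple


class ImapSettings(NamedTuple):
    server: str
    username: str
    password: str


def load_imap_settings(env: Mapping[str, str] = os.environ) -> ImapSettings:
    """Read connection settings from an environment-style mapping.

    The variable names used here are hypothetical; pick whatever fits
    your deployment.
    """
    missing = [name for name in ("IMAP_USERNAME", "IMAP_PASSWORD") if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return ImapSettings(
        server=env.get("IMAP_SERVER", "imap.gmail.com"),  # default is an assumption
        username=env["IMAP_USERNAME"],
        password=env["IMAP_PASSWORD"],
    )
```

The returned values can then be passed to imaplib.IMAP4_SSL and imap.login in place of the constants above.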

Search for all emails

# Select the mailbox ("All Mail" in this case, which holds both received and sent emails)
mailbox = '"[Gmail]/All Mail"'
imap.select(mailbox)

# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
email_ids = email_ids[0].split()

# Reverse the IDs so that the latest email comes first,
# then keep only the latest 10 emails.
if email_ids:
    email_ids = email_ids[::-1][:10]

# Create a dictionary to store emails and their thread emails
email_threads_map = {}
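
The search above uses "ALL", but the criteria can be narrowed. The following sketch builds a criteria string from a few common IMAP search keys (UNSEEN, SINCE, FROM). The helper build_search_criteria and its parameters are hypothetical; only the search keys themselves come from the IMAP protocol.

```python
from datetime import date
from typing import Optional

# IMAP dates use English month abbreviations regardless of the local locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(since: Optional[date] = None,
                          unseen: bool = False,
                          sender: Optional[str] = None) -> str:
    """Build an IMAP SEARCH criteria string (hypothetical helper)."""
    parts = []
    if unseen:
        parts.append("UNSEEN")
    if since is not None:
        parts.append(f"SINCE {since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}")
    if sender:
        parts.append(f'FROM "{sender}"')
    # Fall back to every message when no filter was requested
    return " ".join(parts) if parts else "ALL"


print(build_search_criteria(since=date(2024, 1, 5), unseen=True))
```

The resulting string can be passed as the second argument of imap.search, for example imap.search(None, build_search_criteria(unseen=True)).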

Fetch and process emails

Among its other headers, an email has Message-ID and References.

Message-ID is the unique ID of an email. References lists the Message-IDs of the earlier emails in the conversation that the email replies to. Here is an example:

If an email R1 is sent as a reply to the email M1, both M1 and R1 have their own unique Message-ID, but only R1 has a References field, which stores the Message-ID of M1.

In the same way, if another email R2 is sent as a reply to R1, its References field contains the Message-IDs of both M1 and R1. In other words, a reply copies the References field of the email it replies to and appends that email's Message-ID.

M1 - 
    message-id: M1_id
    references: None
R1 -
    message-id: R1_id
    references: [M1_id]
R2 -
    message-id: R2_id
    references: [M1_id, R1_id]

From this example, we can say that M1_id is always present in the References of every later reply. Using this idea, we can determine that the emails M1, R1, and R2 are all in the same thread, where M1 is the parent email and R1 and R2 are reply emails.
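
The M1, R1, R2 example can be reproduced without a mail server. The sketch below builds three in-memory messages and checks that they all resolve to the same thread root. The helpers make_message and thread_root and the example.com IDs are made up for illustration.

```python
from email.message import Message


def make_message(message_id, references=()):
    """Build a minimal in-memory message with the two headers we care about."""
    msg = Message()
    msg["Message-ID"] = message_id
    if references:
        msg["References"] = " ".join(references)
    return msg


def thread_root(msg):
    """Return the first ID in References, or the message's own ID if there is none."""
    references = msg.get("References")
    if references:
        return references.split()[0]
    return msg.get("Message-ID")


m1 = make_message("<M1_id@example.com>")
r1 = make_message("<R1_id@example.com>", ["<M1_id@example.com>"])
r2 = make_message("<R2_id@example.com>", ["<M1_id@example.com>", "<R1_id@example.com>"])

# All three messages resolve to the same root, so they form one thread.
roots = {thread_root(m) for m in (m1, r1, r2)}
print(roots)
```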

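import email


def separate_emails_based_on_thread(imap, email_ids, email_threads_map) -> dict:
    """Group the given emails by thread and return the updated mapping."""
    for email_id in email_ids:
        # Fetch the full raw message for this ID
        status, msg_data = imap.fetch(email_id, "(RFC822)")
        if status != "OK" or not isinstance(msg_data[0], tuple):
            continue  # skip messages that could not be fetched
        raw_email = msg_data[0][1]
        msg = email.message_from_bytes(raw_email)

        # Extract Message-ID and References
        message_id = msg.get("Message-ID")
        references = msg.get("References")

        # A message without References starts its own thread
        thread_id = message_id

        # Otherwise the first reference is the root message of the thread
        reference_ids = references.split() if references else []
        if reference_ids:
            thread_id = reference_ids[0]

        # Append the message to its thread, creating the list on first use
        email_threads_map.setdefault(thread_id, []).append(msg)

    return email_threads_map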
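
Fetching "(RFC822)" downloads the whole message and, in a mailbox opened read-write, marks it as read. When only the threading headers are needed, a lighter alternative is to request just those header fields with BODY.PEEK, which leaves the read flag untouched. This is a sketch under assumptions: fetch_thread_headers and HEADER_QUERY are hypothetical names, and conn is expected to behave like the imaplib connection used above.

```python
import email
from email.message import Message

# Ask only for the headers needed for threading; PEEK avoids setting the \Seen flag.
HEADER_QUERY = "(BODY.PEEK[HEADER.FIELDS (MESSAGE-ID REFERENCES IN-REPLY-TO SUBJECT)])"


def fetch_thread_headers(conn, email_id) -> Message:
    """Fetch and parse only the threading headers of one message (hypothetical helper)."""
    status, msg_data = conn.fetch(email_id, HEADER_QUERY)
    if status != "OK" or not msg_data or not isinstance(msg_data[0], tuple):
        raise RuntimeError(f"could not fetch headers for message {email_id!r}")
    # The header block is the second element of the first response tuple
    return email.message_from_bytes(msg_data[0][1])
```

The returned object supports the same msg.get("Message-ID") and msg.get("References") calls as a fully fetched message, but it has no body.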

Finally, putting all the code together, we can fetch the emails and separate them according to their thread. The script below relies on the separate_emails_based_on_thread function defined above.

Final code to receive emails

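import email  # used by separate_emails_based_on_thread
import imaplib

# IMAP server settings (Gmail's IMAP host; smtp.gmail.com is only for sending mail)
IMAP_SERVER = 'imap.gmail.com'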
USERNAME = 'example@mail.com'
PASSWORD = 'example_password'

# Connect to the IMAP server
imap = imaplib.IMAP4_SSL(IMAP_SERVER)
imap.login(USERNAME, PASSWORD)

# list all the mailboxes - Inbox, Sent, "[Gmail]/All Mail", etc
print(imap.list())

# Select the mailbox ("All Mail" in this case, which holds both received and sent emails)
mailbox = '"[Gmail]/All Mail"'
imap.select(mailbox)

# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
email_ids = email_ids[0].split()

# get the latest 10 emails.
if email_ids:
    email_ids = email_ids[::-1][:10]

# Create a dictionary to store emails and their thread emails
email_threads_map = {}

# Fetch and process emails
email_threads_map = separate_emails_based_on_thread(imap, email_ids, email_threads_map)

# Logout and close the connection
imap.logout()

# Print the list of emails and their related thread emails
for thread_id, thread_emails in email_threads_map.items():
    print(f"Thread ID: {thread_id}\n")
    for msg in thread_emails:
        print(f"Subject: {msg['Subject']}")
        print(f"In-Reply-To: {msg.get('In-Reply-To')}")
    print("=" * 40)
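
Subjects that contain non-ASCII characters arrive in an encoded form such as "=?utf-8?b?...?=", so printing msg['Subject'] directly can show that raw encoding. A minimal sketch of decoding such values with the standard email.header module follows; the helper name decode_header_value is hypothetical.

```python
from email.header import decode_header, make_header


def decode_header_value(value) -> str:
    """Decode an RFC 2047 encoded header value into readable text."""
    if value is None:
        return ""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header joins them back into a single readable header.
    return str(make_header(decode_header(value)))


print(decode_header_value("=?utf-8?b?SGVsbG8=?="))
```

In the loop above, this would be used as decode_header_value(msg['Subject']).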

A small change is needed for Outlook when fetching and processing emails

In Outlook, we cannot get all emails from a single mailbox. We have to fetch the Inbox and Sent emails separately. Here is how to do it.

# Create a dictionary to store emails and their thread emails
email_threads_map = {}

# Select the mailbox (Sent in this case)
mailbox = 'Sent'
imap.select(mailbox)

# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
sent_email_ids = email_ids[0].split()

if sent_email_ids:
    sent_email_ids = sent_email_ids[::-1][:10]
email_threads_map = separate_emails_based_on_thread(imap, sent_email_ids, email_threads_map)

# Select the mailbox (inbox in this case)
mailbox = 'Inbox'
imap.select(mailbox)

# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
inbox_email_ids = email_ids[0].split()

if inbox_email_ids:
    inbox_email_ids = inbox_email_ids[::-1][:10]

email_threads_map = separate_emails_based_on_thread(imap, inbox_email_ids, email_threads_map)

# Logout and close the connection
imap.logout()
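
The Sent and Inbox steps above repeat the same select, search, and group sequence, so they can be folded into a loop. The sketch below is one possible shape, not the only one: group_across_mailboxes is a hypothetical helper, and group_fn stands for a function with the same signature as separate_emails_based_on_thread. Messages are grouped while their mailbox is still selected, because the IDs returned by search are only meaningful inside that mailbox.

```python
def group_across_mailboxes(conn, mailboxes, group_fn, limit=10):
    """Run group_fn over the latest `limit` messages of each mailbox."""
    threads = {}
    for mailbox in mailboxes:
        # Open read-only so that fetching does not change any flags
        status, _ = conn.select(mailbox, readonly=True)
        if status != "OK":
            continue  # mailbox missing or not selectable on this server
        status, data = conn.search(None, "ALL")
        if status != "OK" or not data or not data[0]:
            continue  # nothing to process in this mailbox
        latest_ids = data[0].split()[::-1][:limit]
        # IDs are valid only while this mailbox is selected, so group now
        threads = group_fn(conn, latest_ids, threads)
    return threads
```

With the function from this article, a call could look like group_across_mailboxes(imap, ['Sent', 'Inbox'], separate_emails_based_on_thread).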

Download attachments

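import email
import os

# Fetch the data of an email
status, msg_data = imap.fetch(email_id, "(RFC822)")
raw_email = msg_data[0][1]
email_message = email.message_from_bytes(raw_email)

# Make sure the target directory exists
os.makedirs("attachments", exist_ok=True)

for part in email_message.walk():
    # Skip container parts and parts that are not attachments
    if part.get_content_maintype() == 'multipart':
        continue
    if part.get('Content-Disposition') is None:
        continue

    filename = part.get_filename()
    if not filename:
        continue  # e.g. inline parts without a file name

    # Use only the base name so a crafted file name cannot escape the directory
    att_path = os.path.join("attachments", os.path.basename(filename))
    if not os.path.isfile(att_path):
        with open(att_path, 'wb') as fp:
            fp.write(part.get_payload(decode=True))
        print('Downloaded file:', filename)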
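
An alternative to walking every part by hand is to parse the raw bytes with the modern email policy and let iter_attachments pick out the attachments. The sketch below assumes the raw message bytes are already available; save_attachments and its parameters are hypothetical names.

```python
import email
import os
from email import policy


def save_attachments(raw_email: bytes, target_dir: str) -> list:
    """Write every named attachment of a raw message into target_dir."""
    # policy.default yields an EmailMessage, which provides iter_attachments()
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        filename = part.get_filename()
        payload = part.get_payload(decode=True)
        if not filename or payload is None:
            continue  # skip unnamed parts and nested containers
        # Keep only the base name so the file stays inside target_dir
        path = os.path.join(target_dir, os.path.basename(filename))
        with open(path, "wb") as fp:
            fp.write(payload)
        saved.append(path)
    return saved
```

The function returns the paths it wrote, which makes it easy to log or post-process the downloaded files.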

Get plain text and HTML text data

# Print the common headers once
print('To:\t\t', email_message['To'])
print('From:\t', email_message['From'])
print('Subject:', email_message['Subject'])
print('Date:\t', email_message['Date'])

if email_message.is_multipart():
    for part in email_message.walk():
        # Only consider body parts, not attachments
        if part.get('Content-Disposition') is not None:
            continue
        # Decode with the charset declared by the part, falling back to UTF-8
        charset = part.get_content_charset() or 'utf-8'
        if part.get_content_type() == 'text/plain':
            print('Body:\t', part.get_payload(decode=True).decode(charset, errors='replace'))
        elif part.get_content_type() == 'text/html':
            print('HTML Body:\t', part.get_payload(decode=True).decode(charset, errors='replace'))
else:
    print('Thread-Index:\t', email_message['Thread-Index'])
    # Decode the bytes payload instead of printing its bytes representation
    charset = email_message.get_content_charset() or 'utf-8'
    text = email_message.get_payload(decode=True).decode(charset, errors='replace')
    print("text: ", text)
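
For comparison, the standard library can also locate the plain text and HTML bodies itself when the message is parsed with the modern email policy. The sketch below shows that approach; extract_bodies is a hypothetical helper, and it assumes the raw message bytes are available.

```python
import email
from email import policy


def extract_bodies(raw_email: bytes) -> dict:
    """Return the plain text and HTML bodies found in a raw message."""
    # policy.default yields an EmailMessage, which provides get_body()
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    bodies = {}
    for kind in ("plain", "html"):
        part = msg.get_body(preferencelist=(kind,))
        if part is not None:
            # get_content() decodes the text using the part's declared charset
            bodies[kind] = part.get_content()
    return bodies
```

The result has a "plain" key, an "html" key, both, or neither, depending on what the message contains.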