What is IMAP
IMAP stands for Internet Message Access Protocol. It's a protocol used for retrieving and managing email messages stored on a mail server. IMAP is designed to allow users to access their email messages from multiple devices while keeping the messages stored on the server.
Unlike POP3 (Post Office Protocol version 3), which downloads email messages to a local device and typically removes them from the server, IMAP maintains messages on the server.
Overview of how IMAP works:
Client Initiation: An email client (receiver) initiates an IMAP connection to an IMAP server (incoming mail server) on a designated port (usually port 143, or port 993 for connections secured with SSL/TLS).
Handshake: The client and server perform a handshake to establish a connection. The client can ask which features the server supports using the IMAP CAPABILITY command, and then identifies itself using the LOGIN (or AUTHENTICATE) command.
Mailbox Selection: The client selects a mailbox (folder - Sent, Inbox, etc.) to retrieve email messages.
Listing Messages: The client can request a list of email messages in the selected mailbox using the SEARCH command. The server responds with a list of message identifiers.
Message Retrieval: The client can retrieve specific email messages by sending the FETCH command with the desired message identifier. The server sends the email content, including headers and body.
Synchronization: IMAP maintains synchronization between the server and the client. For example, if a message is marked as read on one device, the change is reflected across all devices.
Closing Connection: After the client is done with the mailbox, it can close the IMAP connection using the LOGOUT command.
IMAP is particularly useful for users who access their email from multiple devices or prefer to keep their messages stored on the server.
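The steps above map fairly directly onto Python's imaplib module. The following is a minimal sketch rather than a complete client: run_imap_session is a hypothetical helper name, and it assumes it is handed an already-connected object that behaves like imaplib.IMAP4_SSL.

```python
def run_imap_session(conn, username, password, mailbox="INBOX", limit=5):
    """Walk through the IMAP steps on an already-connected client.

    `conn` is assumed to behave like imaplib.IMAP4_SSL (hypothetical helper).
    Returns the newest `limit` messages as raw RFC 822 bytes.
    """
    # Handshake: ask the server what it supports, then authenticate.
    conn.capability()
    conn.login(username, password)
    try:
        # Mailbox selection (read-only, so messages are not marked as read).
        status, _ = conn.select(mailbox, readonly=True)
        if status != "OK":
            raise RuntimeError(f"could not select {mailbox!r}")
        # Listing messages: SEARCH returns space-separated message numbers.
        status, data = conn.search(None, "ALL")
        message_ids = data[0].split() if status == "OK" else []
        # Message retrieval: FETCH the full message for the newest few.
        messages = []
        for message_id in message_ids[-limit:]:
            status, msg_data = conn.fetch(message_id, "(RFC822)")
            if status == "OK" and msg_data and isinstance(msg_data[0], tuple):
                messages.append(msg_data[0][1])
        return messages
    finally:
        # Closing connection: LOGOUT is sent even if a step above failed.
        conn.logout()


# Usage (needs a real server and credentials, so it is left commented out):
#   import imaplib
#   raw_messages = run_imap_session(imaplib.IMAP4_SSL("imap.example.com"), user, pw)
```

Passing the connection in, instead of creating it inside the function, keeps the sketch easy to try against a stub object without any network access.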
Receiving email using IMAP
IMAP server setting
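import imaplib
# IMAP server settings
# Gmail's IMAP host is imap.gmail.com (smtp.gmail.com is for sending mail)
IMAP_SERVER = 'imap.gmail.com'
USERNAME = 'example@mail.com'
# Gmail generally expects an app password here, not the account password
PASSWORD = 'example_password'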
# Connect to the IMAP server
imap = imaplib.IMAP4_SSL(IMAP_SERVER)
imap.login(USERNAME, PASSWORD)
# list all the mailboxes - Inbox, Sent, "[Gmail]/All Mail", etc
print(imap.list())
Search for all emails
# Select the mailbox (All Mail in this case)
mailbox = '"[Gmail]/All Mail"'
imap.select(mailbox)
# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
email_ids = email_ids[0].split()
# Reverse the ids so that the latest email comes first,
# then keep the latest 10 emails.
if email_ids:
    email_ids = email_ids[::-1][:10]
# Create a dictionary to store emails and their thread emails
email_threads_map = {}
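The search comment above mentions that the search can be refined using criteria. As a rough sketch of what that could look like, the hypothetical helper build_search_criteria below assembles a criteria string from a few standard IMAP search keys (UNSEEN, SINCE, FROM). The helper name and its parameters are assumptions made for illustration, and it does not escape quotes inside the sender value.

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend
# on the process locale (strftime("%b") would).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_search_criteria(since=None, sender=None, unseen=False):
    """Build an IMAP SEARCH criteria string (hypothetical helper).

    Several keys listed together are combined with AND by the server.
    """
    parts = []
    if unseen:
        parts.append("UNSEEN")
    if since is not None:
        # IMAP dates look like 05-Jan-2024
        parts.append(f"SINCE {since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}")
    if sender:
        parts.append(f'FROM "{sender}"')
    return "(" + " ".join(parts) + ")" if parts else "ALL"


print(build_search_criteria(since=date(2024, 1, 5), unseen=True))
```

The result can be passed in place of "ALL", for example imap.search(None, build_search_criteria(unseen=True)).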
Fetch and process emails
Among its other headers, an email has Message-ID and References. Message-ID is the unique ID of an email, and References is the field that stores the Message-IDs of the earlier emails in the conversation being replied to. Here is an example:
If an email R1 is sent as a reply to the email M1, then both M1 and R1 have their own unique Message-ID, but only R1 has a References field, which stores the Message-ID of M1.
In the same way, if another email R2 is sent as a reply to R1, it has the Message-IDs of both M1 and R1 in its References field. In other words, the References field of a reply is a copy of the References field of the email being replied to, with that email's Message-ID added at the end.
M1 -
    message-id: M1_id
    references: None
R1 -
    message-id: R1_id
    references: [M1_id]
R2 -
    message-id: R2_id
    references: [M1_id, R1_id]
From this example, we can see that M1_id is always the first entry in the References of every reply in the thread. Using this idea, we can determine that the emails M1, R1, and R2 all belong to the same thread, where M1 is the parent email and R1 and R2 are reply emails.
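To make the M1, R1, R2 example concrete, here is a small self-contained sketch that builds the three messages in memory and resolves each one to the Message-ID of the first email in its thread. The function names thread_root_id and make_message and the example.com IDs are made up for illustration.

```python
from email.message import Message


def thread_root_id(msg):
    """Return the Message-ID of the first email in msg's thread."""
    references = msg.get("References")
    if references:
        # The first entry in References is the original (parent) email.
        return references.split()[0]
    # No References header: this email starts its own thread.
    return msg.get("Message-ID")


def make_message(message_id, references=None):
    """Build a bare message carrying only the two headers of interest."""
    msg = Message()
    msg["Message-ID"] = message_id
    if references:
        msg["References"] = " ".join(references)
    return msg


m1 = make_message("<M1_id@example.com>")
r1 = make_message("<R1_id@example.com>", ["<M1_id@example.com>"])
r2 = make_message("<R2_id@example.com>",
                  ["<M1_id@example.com>", "<R1_id@example.com>"])

# All three messages resolve to the same root, so they share one thread.
roots = {thread_root_id(m) for m in (m1, r1, r2)}
print(roots)
```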
import email
def separate_emails_based_on_thread(imap, email_ids, email_threads_map) -> dict:
    # Fetch and process emails
    for email_id in email_ids:
        status, msg_data = imap.fetch(email_id, "(RFC822)")
        raw_email = msg_data[0][1]
        msg = email.message_from_bytes(raw_email)
        # Extract Message-ID and References
        message_id = msg.get("Message-ID")
        references = msg.get("References")
        # Use Message-ID as thread ID
        thread_id = message_id
        # For a reply, the first entry in References is the parent email's Message-ID
        if references:
            reference_ids = references.split()
            thread_id = reference_ids[0]
        if not email_threads_map.get(thread_id):
            email_threads_map[thread_id] = []
        email_threads_map[thread_id].append(msg)
    return email_threads_map
Finally, after putting all the code together, we can fetch the emails and separate them according to their thread.
Final code to receive emails
import email
import imaplib
# IMAP server settings
IMAP_SERVER = 'imap.gmail.com'
USERNAME = 'example@mail.com'
PASSWORD = 'example_password'
# Connect to the IMAP server
imap = imaplib.IMAP4_SSL(IMAP_SERVER)
imap.login(USERNAME, PASSWORD)
# list all the mailboxes - Inbox, Sent, "[Gmail]/All Mail", etc
print(imap.list())
# Select the mailbox (All Mail in this case)
mailbox = '"[Gmail]/All Mail"'
imap.select(mailbox)
# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
email_ids = email_ids[0].split()
# get the latest 10 emails.
if email_ids:
    email_ids = email_ids[::-1][:10]
# Create a dictionary to store emails and their thread emails
email_threads_map = {}
# Fetch and process emails (uses separate_emails_based_on_thread defined earlier)
email_threads_map = separate_emails_based_on_thread(imap, email_ids, email_threads_map)
# Logout and close the connection
imap.logout()
# Print the list of emails and their related thread emails
for thread_id, thread_emails in email_threads_map.items():
    print(f"Thread ID: {thread_id}\n")
    for msg in thread_emails:
        print(f"Subject: {msg['Subject']}")
        print(f"In-Reply-To: {msg.get('In-Reply-To')}")
    print("=" * 40)
A small change is needed for Outlook when fetching and processing emails
In Outlook, we do not get all emails from a single mailbox. We have to fetch the Inbox and Sent emails separately. Here you can see how to do it.
# Create a dictionary to store emails and their thread emails
email_threads_map = {}
# Select the mailbox (Sent in this case)
mailbox = 'Sent'
imap.select(mailbox)
# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
sent_email_ids = email_ids[0].split()
if sent_email_ids:
    sent_email_ids = sent_email_ids[::-1][:10]
email_threads_map = separate_emails_based_on_thread(imap, sent_email_ids, email_threads_map)
# Select the mailbox (inbox in this case)
mailbox = 'Inbox'
imap.select(mailbox)
# Search for all emails (you can refine this search using criteria)
status, email_ids = imap.search(None, "ALL")
inbox_email_ids = email_ids[0].split()
if inbox_email_ids:
    inbox_email_ids = inbox_email_ids[::-1][:10]
email_threads_map = separate_emails_based_on_thread(imap, inbox_email_ids, email_threads_map)
# Logout and close the connection
imap.logout()
Download attachments
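import email
import os
# fetch data of an email
status, msg_data = imap.fetch(email_id, "(RFC822)")
raw_email = msg_data[0][1]
email_message = email.message_from_bytes(raw_email)
# make sure the target directory exists
os.makedirs("attachments", exist_ok=True)
for part in email_message.walk():
    # skip container parts; they only hold other parts
    if part.get_content_maintype() == 'multipart':
        continue
    # parts without a Content-Disposition header are not attachments
    if part.get('Content-Disposition') is None:
        continue
    filename = part.get_filename()
    # inline parts may have a Content-Disposition but no filename
    if not filename:
        continue
    att_path = os.path.join("attachments", filename)
    if not os.path.isfile(att_path):
        with open(att_path, 'wb') as fp:
            fp.write(part.get_payload(decode=True))
        print('Downloaded file:', filename)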
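Attachment filenames come from the sender, so it may be worth cleaning them before writing to disk. The following is a hedged sketch of one way to do that: save_attachments is a hypothetical helper, not part of the email package, and it keeps only the last path component of each filename so that a name such as "../notes.bin" stays inside the target directory.

```python
import os


def save_attachments(msg, directory):
    """Save every attachment part of msg into directory; return saved paths.

    Hypothetical helper: msg is a parsed email message object.
    """
    os.makedirs(directory, exist_ok=True)
    saved = []
    for part in msg.walk():
        # Only parts explicitly marked as attachments are written out.
        if part.get_content_disposition() != "attachment":
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # Drop any directory components supplied by the sender.
        safe_name = os.path.basename(filename.replace("\\", "/"))
        if not safe_name:
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        path = os.path.join(directory, safe_name)
        with open(path, "wb") as fp:
            fp.write(payload)
        saved.append(path)
    return saved
```

Unlike the loop above, this sketch overwrites an existing file with the same name; whether to skip, overwrite, or rename is a design choice left to the caller.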
Get plain text and HTML text data
if email_message.is_multipart():
    print('To:\t\t', email_message['To'])
    print('From:\t', email_message['From'])
    print('Subject:', email_message['Subject'])
    print('Date:\t', email_message['Date'])
    for part in email_message.walk():
        # parts without a Content-Disposition header are body parts, not attachments
        if (part.get_content_type() == 'text/plain') and (part.get('Content-Disposition') is None):
            print('Body:\t', part.get_payload(decode=True).decode())
        elif (part.get_content_type() == 'text/html') and (part.get('Content-Disposition') is None):
            print('HTML Body:\t', part.get_payload(decode=True).decode())
else:
    print('To:\t\t', email_message['To'])
    print('From:\t', email_message['From'])
    print('Subject:', email_message['Subject'])
    print('Date:\t', email_message['Date'])
    print('Thread-Index:\t', email_message['Thread-Index'])
    # decode() assumes UTF-8; use email_message.get_content_charset() for other charsets
    text = email_message.get_payload(decode=True).decode()
    print("text: ", text)
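As an alternative to walking the parts by hand, the standard library's newer message API can pick out the body parts and decode them using each part's declared charset. The sketch below is one possible approach under that assumption. extract_bodies is a hypothetical helper name, and it parses the raw bytes with policy.default so that get_body and get_content are available.

```python
import email
from email import policy


def extract_bodies(raw_email):
    """Return (plain_text, html_text) from raw message bytes.

    Either value is None when the message has no such body part.
    """
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    plain_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    # get_content() decodes the payload using the part's own charset.
    plain = plain_part.get_content() if plain_part is not None else None
    html = html_part.get_content() if html_part is not None else None
    return plain, html
```

With the fetch code from earlier, this could be called as plain, html = extract_bodies(raw_email).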