Scraping Migros M-Budget with Python

Motivation

I recently caught myself repeatedly checking my internet consumption balance.

A perfect case for a script and an accompanying tutorial.

To keep this tutorial focused and the script short, I’ll skip practices such as exception handling, testing, continuous integration and virtual environments.

In this particular case the ISP (Internet Service Provider) in question is Migros, its mobile brand is M-Budget and the actual provider is Swisscom.

How to understand this post

Each step has a Manual and an Automated part. Manual describes how to do the step by hand, by typing and clicking in the browser. Automated shows the same step as a code snippet.

The Complete script section merges all snippets, ready to copy and paste as described in Usage.

Script ingredients

Modules

import requests
from lxml import html
from lxml import etree

Urls

All working URLs are built from a single base URL, to which the endpoint paths are appended.

BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL+'/de/users/sign_in'
LOGOUT_URL = BASE_URL+'/de/users/sign_out'
CONSUMPTION_URL = BASE_URL+'/prepaid/de/my_consumption/index'

Process overview

Visit the ISP’s login page

Manual

Launch your browser, open the login page and open the browser’s developer console by pressing F12.

Automated

session_requests = requests.session()
result = session_requests.get(LOGIN_URL)

Login

Manual

Enter your username and password and hit return.

After submitting your credentials, consider what was actually sent.

On the surface, the login seems to contain only our credentials:

  1. Username (here it’s a phone number beginning with 077)
  2. Password

However, the developer console (F12) and its Network tab reveal more.

The data is sent as a POST request, so we look for POST methods and, within them, for the Form data (bottom right in figure 1). Alongside the credentials, several hidden fields were submitted as well.

The authenticity_token is particularly interesting. It is part of a defence against an attack known as cross-site request forgery, abbreviated CSRF.

Figure 1: Post form data

But where does that cryptic csrf token value come from?

Automated

  • Fetch csrf token

    Now we know which element to look for and where:

    1. Where: in the HTML source of the login page, which the first line of the snippet parses.
    2. What: a meta tag named csrf-token holds the token as its content. The second line of the snippet extracts it.

    tree = html.fromstring(result.text)
    authenticity_token = tree.xpath("//meta[@name='csrf-token']/@content")[0]

  • Credentials

    PASSWORD = 'changeMe'
    USERNAME = 'changeMe'
    
  • Prepare our form

    We pass the csrf token along with the remaining data as a dictionary.

    Compare it with the form in figure 1:

    data = {
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'user[id]': USERNAME,
        'user[password]': PASSWORD,
        'user[reseller]': '33',
        'button': ''
    }

  • Sign-in

    We’re all set and can finally sign in by posting our form.

    result = session_requests.post(LOGIN_URL,
                                   data=data,
                                   headers=dict(referer=LOGIN_URL))
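
Hard-coding the credentials is fine for a tutorial, but they leak easily once the file is shared. A minimal sketch, assuming two hypothetical environment variables MBUDGET_USERNAME and MBUDGET_PASSWORD (names invented here, not part of the original script), reads them at run time and builds the same form dictionary as above:

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from the environment.

    MBUDGET_USERNAME and MBUDGET_PASSWORD are hypothetical variable
    names chosen for this sketch.
    """
    try:
        return env['MBUDGET_USERNAME'], env['MBUDGET_PASSWORD']
    except KeyError as missing:
        # Stop early with a readable message instead of a traceback.
        raise SystemExit(f'Please set the environment variable {missing}')


def build_login_form(authenticity_token, username, password):
    """Build the login form; the field names are the ones seen in figure 1."""
    return {
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'user[id]': username,
        'user[password]': password,
        'user[reseller]': '33',
        'button': '',
    }
```

With these two helpers the sign-in call could pass data=build_login_form(authenticity_token, *load_credentials()) instead of the hard-coded constants.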

Get the consumption values

Manual

There are multiple ways of finding the credit values:

  1. By searching for a value (here 500) in the browser’s developer console, as shown in figure 2
  2. By hovering over the element (in the browser, not the dev console), right-clicking and choosing Inspect Element (Q)

Figure 2: Searching for values and displaying css properties

We’re interested in all credits (10, 0 and 500), which sum up to our balance of 510.

Automated

Here’s how we script the balance part, beginning with the XPath query on the third line of the snippet:

First we focus on each parent div with the class circle__inside and extract the text of its first span element (span[1]). Then we loop over the extracted values (circles) and append each circle’s credit value to our balance list. Finally we print the sum of the balance list.

result = session_requests.get(CONSUMPTION_URL)
tree = etree.HTML(result.text)
circles = tree.xpath('.//div[@class="circle__inside"]//span[1]/text()')
balance = []
for credit in circles:
    balance.append(int(credit))
print("Your current balance is:", sum(balance), "MB")
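
To try the XPath expression without logging in, it can be run against a small HTML string. The markup below is invented for this sketch and only assumes that each circle is a div with the class circle__inside whose first span holds the credit; the real page may differ.

```python
from lxml import etree

# Invented sample markup; the real page is only assumed to look similar.
SAMPLE_HTML = """
<html><body>
  <div class="circle__inside"><span>10</span><span>MB</span></div>
  <div class="circle__inside"><span>0</span><span>MB</span></div>
  <div class="circle__inside"><span>500</span><span>MB</span></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)
# Same query as in the script: text of the first span inside each circle.
values = tree.xpath('.//div[@class="circle__inside"]//span[1]/text()')
total = sum(int(value) for value in values)
print(values, total)
```

The three sample credits add up to 510, the balance from the manual step. Note that int() raises a ValueError as soon as a circle contains non-numeric text, so a changed page layout shows up as an error rather than as a wrong sum.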

Logout

We’re done, so we log out properly.

Manual

  1. Click on the profile icon in the upper right corner
  2. Click sign-out (Abmelden)

Automated

response = session_requests.get(LOGOUT_URL)
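
Since the script skips exception handling, an error while parsing the balance would leave the session signed in. One possible way around that, sketched here with a hypothetical helper named signed_in, is a context manager that always requests the logout URL on exit. FakeSession is a stand-in so the sketch runs offline; in the real script a requests session would be passed instead.

```python
from contextlib import contextmanager


@contextmanager
def signed_in(session, login, logout_url):
    """Call login(session), then always request logout_url on exit."""
    login(session)
    try:
        yield session
    finally:
        # Runs even if the code inside the with-block raises.
        session.get(logout_url)


class FakeSession:
    """Stand-in for a requests session that only records visited URLs."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


session = FakeSession()
try:
    with signed_in(session, lambda s: s.get('LOGIN'), 'LOGOUT'):
        raise ValueError('parsing the balance failed')
except ValueError:
    pass
print(session.visited)
```

The recorded list ends with the logout request even though the body of the with-block failed.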

Complete script

import requests
from lxml import html
from lxml import etree
BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL+'/de/users/sign_in'
LOGOUT_URL = BASE_URL+'/de/users/sign_out'
CONSUMPTION_URL = BASE_URL+'/prepaid/de/my_consumption/index'
PASSWORD = 'changeMe'
USERNAME = 'changeMe'


def main():
    session_requests = requests.session()
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = tree.xpath("//meta[@name='csrf-token']/@content")[0]
    data = {
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'user[id]': USERNAME,
        'user[password]': PASSWORD,
        'user[reseller]': '33',
        'button': ''
    }
    result = session_requests.post(LOGIN_URL,
                                   data=data,
                                   headers=dict(referer=LOGIN_URL))
    result = session_requests.get(CONSUMPTION_URL)
    tree = etree.HTML(result.text)
    circles = tree.xpath('.//div[@class="circle__inside"]//span[1]/text()')
    balance = []
    for credit in circles:
        balance.append(int(credit))
    print("Your current balance is:", sum(balance), "MB")
    response = session_requests.get(LOGOUT_URL)

if __name__ == '__main__':
    main()

Your current balance is: 510 MB

Usage

  1. Store the complete script above as e.g. ~/tmp/mbudget.py
  2. Change the Credentials in ~/tmp/mbudget.py
  3. Run it:
python3 ~/tmp/mbudget.py
Your current balance is: 510 MB

Conclusion

Scripting our manual steps lets us store and process the returned data further. We could schedule the script with a systemd timer or cron, or send a notification by email or SMS when a certain limit is reached.
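
As a rough sketch of the notification idea, the function below builds an email only when the balance is at or below a limit. The threshold of 100 MB, the function name and the example.org addresses are all assumptions made for this illustration; actually sending the message (for instance with smtplib) is left out.

```python
from email.message import EmailMessage

LIMIT_MB = 100  # hypothetical threshold, pick your own


def low_balance_message(balance_mb, limit_mb=LIMIT_MB,
                        sender='me@example.org', recipient='me@example.org'):
    """Return an EmailMessage if the balance reached the limit, else None."""
    if balance_mb > limit_mb:
        return None
    message = EmailMessage()
    message['Subject'] = f'M-Budget balance low: {balance_mb} MB'
    message['From'] = sender
    message['To'] = recipient
    message.set_content(f'Only {balance_mb} MB left (limit: {limit_mb} MB).')
    return message


# 510 MB is above the limit, so nothing would be sent.
print(low_balance_message(510))
```

For scheduling, a crontab entry such as `0 * * * * python3 ~/tmp/mbudget.py` would run the script once per hour, assuming it is stored as described in Usage.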

And of course: No more clicking, calculating and repeating.