Scraping Migros M-Budget with Python

7 April, 2020

Table of Contents

Motivation
How to understand this post
Script ingredients
- Modules
- URLs
Process overview
Complete script
Usage
Conclusion

Motivation

I recently caught myself repeatedly checking my internet consumption balance.

A perfect case for a script and an accompanying tutorial.

To keep this tutorial focused and the script short, I’ll skip practices such as exception handling, testing, continuous integration and virtual environments.

In this case the ISP (Internet Service Provider) in question is Migros, their mobile brand is M-Budget and the actual network provider is Swisscom.

How to understand this post

Each step of the process is described twice. Manual shows how the step is done by hand, by typing and clicking in the browser. Automated shows how the same step is done with a code snippet.

The Complete script section contains all snippets merged and ready to copy and paste, as described in Usage.

Script ingredients

Modules

import requests
from lxml import html
from lxml import etree

URLs

All endpoints are built from a single base URL.

BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL+'/de/users/sign_in'
LOGOUT_URL = BASE_URL+'/de/users/sign_out'
CONSUMPTION_URL = BASE_URL+'/prepaid/de/my_consumption/index'
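The endpoints are built with plain string concatenation, and a quick check shows why that is a reasonable choice here: the base URL has a path component (/eCare), and joining it with a path that starts with a slash would drop that component. The snippet below is only an illustration using the standard library; joined and concatenated are names made up for this comparison.

```python
from urllib.parse import urljoin

BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'

# urljoin treats a path starting with '/' as absolute, so '/eCare' is replaced
joined = urljoin(BASE_URL, '/de/users/sign_in')

# Concatenation keeps the '/eCare' prefix, which is what the script relies on
concatenated = BASE_URL + '/de/users/sign_in'

print(joined)
print(concatenated)
```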

Process overview

Visit the ISP’s login page

Manual

Launch your browser, open the login page and open the browser’s developer console by pressing F12 on your keyboard.

Automated

session_requests = requests.session()
result = session_requests.get(LOGIN_URL)

Login

Manual

Enter your username and password and hit return.

After submitting your credentials, take a look at what was actually sent.

On the surface, the login seems to contain only our credentials:

Username (here it’s a phone number beginning with 077)
Password

However, the developer console (F12 on your keyboard) and its Network tab reveal more.

The data is sent as a POST request, so we look for POST methods and, within them, for the Form data (on the bottom right in figure 1). Alongside the credentials, some hidden fields were submitted as well.

The authenticity_token is particularly interesting. It is a safeguard against an attack known as cross-site request forgery ¹, abbreviated CSRF.

But where does that cryptic csrf token value come from?

Finding the csrf token

Remember the login page? https://selfcare.m-budget.migros.ch/eCare/de/users/sign_in

Log out, visit it again and look at the source code (right-click: View page source).
```
<meta name="csrf-token" content="mX12K..." />
```

Automated

Fetch csrf token

Now we know where to look and what to look for:
1. Where: in the HTML source text, which the first line parses.
2. What: a meta tag named csrf-token holds the token as its content, which the second line extracts.

tree = html.fromstring(result.text)
authenticity_token = tree.xpath("//meta[@name='csrf-token']/@content")[0]
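To try the token lookup without touching the live site, the same idea can be reproduced on a sample page. The following is a sketch of an alternative that uses only the standard library's html.parser instead of lxml, so it is not the approach used in the script. SAMPLE_PAGE, its token value, CsrfTokenParser and extract_csrf_token are all made up for this illustration.

```python
from html.parser import HTMLParser

# Shortened stand-in for the login page source; the token value is invented
SAMPLE_PAGE = '''
<html><head>
<meta name="viewport" content="width=device-width" />
<meta name="csrf-token" content="mX12K-sample" />
</head><body></body></html>
'''


class CsrfTokenParser(HTMLParser):
    """Remember the content of the first <meta name="csrf-token"> tag."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attributes = dict(attrs)
        if (tag == 'meta' and attributes.get('name') == 'csrf-token'
                and self.token is None):
            self.token = attributes.get('content')


def extract_csrf_token(page_source):
    """Return the token, or fail with a clear message if the tag is missing."""
    parser = CsrfTokenParser()
    parser.feed(page_source)
    if parser.token is None:
        raise ValueError('no csrf-token meta tag found in page source')
    return parser.token


print(extract_csrf_token(SAMPLE_PAGE))
```

Unlike indexing an empty XPath result with [0], which raises an IndexError, this helper reports a missing tag with a descriptive message.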

Credentials

PASSWORD = 'changeMe'
USERNAME = 'changeMe'
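Hardcoding credentials keeps the tutorial short, but it is easy to leak them by sharing the file. One possible alternative, sketched here under the assumption that you export two environment variables beforehand, is to read them at start-up. The variable names MBUDGET_USERNAME and MBUDGET_PASSWORD and the helper load_credentials are hypothetical, and the phone number is a placeholder.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables."""
    env = os.environ if env is None else env
    try:
        return env['MBUDGET_USERNAME'], env['MBUDGET_PASSWORD']
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise RuntimeError(f'Please set {missing.args[0]}') from None


# Demonstration with a plain dictionary instead of the real environment
example_env = {'MBUDGET_USERNAME': '0770000000', 'MBUDGET_PASSWORD': 'secret'}
USERNAME, PASSWORD = load_credentials(example_env)
print(USERNAME)
```

In the script itself you would call load_credentials() without arguments so that it falls back to os.environ.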

Prepare our form

We pass the CSRF token along with the remaining data as a dictionary.

Note how closely it matches the form in figure 1:

data = {
    'utf8': '\u2713',
    'authenticity_token': authenticity_token,
    'user[id]': USERNAME,
    'user[password]': PASSWORD,
    'user[reseller]': '33',
    'button': ''
}
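To see roughly what travels over the wire, the dictionary can be form-encoded by hand with the standard library. requests form-encodes a dictionary passed as data on its own, so this step is not needed in the script; it is only a sketch for comparing against the Form data shown in the browser. The token value is a placeholder.

```python
from urllib.parse import urlencode

# Placeholder values; the real token comes from the login page
data = {
    'utf8': '\u2713',
    'authenticity_token': 'mX12K-sample',
    'user[id]': 'changeMe',
    'user[password]': 'changeMe',
    'user[reseller]': '33',
    'button': ''
}

# The check mark and the square brackets are percent-encoded
encoded = urlencode(data)
print(encoded)
```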

Sign-in

We’re all set and can finally sign in by posting our form:

result = session_requests.post(LOGIN_URL,
                               data=data,
                               headers=dict(referer=LOGIN_URL))

Get the consumption values

Manual

There are multiple ways of finding the credit values:

By searching for a value (here 500) in the browser’s developer console as shown below in figure 2
By hovering over the element with the mouse (in the browser, not the dev console), right-clicking and choosing Inspect element (Q)

Figure 2: Searching for values and displaying css properties

We’re interested in all credits (10, 0, 500) in order to sum them up as our balance of 510.

Automated

Here’s how we script the balance part, starting at the XPath query:

First we focus on the parent div with the class circle__inside in order to extract the text from its first [1] span element. Then we loop through each match (circles) and append each circle’s credit value to our balance list. Finally we print the sum of the balance list.

result = session_requests.get(CONSUMPTION_URL)
tree = etree.HTML(result.text)
circles = tree.xpath('.//div[@class="circle__inside"]//span[1]/text()')
balance = []
for credit in circles:
    balance.append(int(credit))
print("Your current balance is:", sum(balance), "MB")
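The summing step can be tried on its own, without a login. The sketch below feeds in the three example credits as strings, the way the XPath query would return them, and additionally skips entries that are not plain numbers. That tolerance is my own assumption about what the page might contain, not something the script above does, and sum_credits is a made-up helper name.

```python
def sum_credits(raw_values):
    """Add up all values that consist of digits, ignoring everything else."""
    total = 0
    for value in raw_values:
        text = value.strip()
        if text.isdigit():
            total += int(text)
    return total


# The example credits from above, plus a stray non-numeric text node
circles = ['10', ' 0 ', '500', 'MB']
print("Your current balance is:", sum_credits(circles), "MB")
```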

Logout

We’re done, so we log out properly.

Manual

Click on the profile icon in the upper right corner
Click sign-out (Abmelden)

Automated

response = session_requests.get(LOGOUT_URL)

Complete script

import requests
from lxml import html
from lxml import etree
BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL+'/de/users/sign_in'
LOGOUT_URL = BASE_URL+'/de/users/sign_out'
CONSUMPTION_URL = BASE_URL+'/prepaid/de/my_consumption/index'
PASSWORD = 'changeMe'
USERNAME = 'changeMe'


def main():
    session_requests = requests.session()
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = tree.xpath("//meta[@name='csrf-token']/@content")[0]
    data = {
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'user[id]': USERNAME,
        'user[password]': PASSWORD,
        'user[reseller]': '33',
        'button': ''
    }
    result = session_requests.post(LOGIN_URL,
                                   data=data,
                                   headers=dict(referer=LOGIN_URL))
    result = session_requests.get(CONSUMPTION_URL)
    tree = etree.HTML(result.text)
    circles = tree.xpath('.//div[@class="circle__inside"]//span[1]/text()')
    balance = []
    for credit in circles:
        balance.append(int(credit))
    print("Your current balance is:", sum(balance), "MB")
    response = session_requests.get(LOGOUT_URL)

if __name__ == '__main__':
    main()

Your current balance is: 510 MB

Usage

Store the complete script above as e.g. ~/tmp/mbudget.py
Change the Credentials in ~/tmp/mbudget.py
Run it:

python3 ~/tmp/mbudget.py

Your current balance is: 510 MB

Conclusion

Scripting our manual steps now allows us to store and process the returned data further. We could schedule the script using a systemd timer or cron, or send a notification by email or SMS when a certain limit is reached.
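As a starting point for the notification idea, the limit check itself could look like the sketch below. The function name low_balance_message and the default limit of 100 MB are assumptions for illustration. Actually sending the text by email or SMS is left out.

```python
def low_balance_message(balance_mb, limit_mb=100):
    """Return a warning text if the balance is at or below the limit, else None."""
    if balance_mb <= limit_mb:
        return f"Only {balance_mb} MB left (limit: {limit_mb} MB)"
    return None


print(low_balance_message(510))
print(low_balance_message(80))
```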

And of course: No more clicking, calculating and repeating.

¹ https://en.wikipedia.org/wiki/Cross-site_request_forgery
