Scraping Migros M-Budget with Python
Motivation
I recently caught myself repeatedly checking my internet consumption balance.
A perfect case for a script and an accompanying tutorial.
To keep this tutorial focused and the script short, I’ll skip various practices such as exception handling, testing, continuous integration and virtual environments.
In this particular case the ISP (Internet Service Provider) in question is Migros, their mobile brand is M-Budget and the actual provider is Swisscom.
How to understand this post
Each section contains a Manual and an Automated part. Manual describes how the step is done by hand, by typing and clicking the mouse. Automated shows how we do the same with a code snippet.
The Complete script contains all snippets merged and ready for copy/paste, as described in Usage.
Script ingredients
Modules
import requests
from lxml import html
from lxml import etree
Urls
Our working URLs are built upon one URL which serves as base URL in order to build up all endpoints.
BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL+'/de/users/sign_in'
LOGOUT_URL = BASE_URL+'/de/users/sign_out'
CONSUMPTION_URL = BASE_URL+'/prepaid/de/my_consumption/index'
Process overview
Visit the ISP’s login page
Manual
Launch your browser, open the login page and open the browser’s developer console by pressing F12 on your keyboard.
Automated
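The step above can be sketched as follows. A session object keeps cookies between requests, much like the browser does, and a GET request fetches the login page. The names session_requests and result match the later snippets; the helper open_login_page is hypothetical and only added here for illustration.

```python
import requests

LOGIN_URL = 'https://selfcare.m-budget.migros.ch/eCare/de/users/sign_in'


def open_login_page(session, url=LOGIN_URL):
    """Fetch the login page through the session so that cookies are kept.

    Hypothetical helper, not part of the original script.
    """
    return session.get(url)


# A session persists cookies across all following requests.
session_requests = requests.session()
```

Calling `result = open_login_page(session_requests)` then performs the actual request; `result.text` holds the page source that the Login section works with.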
Login
Manual
Enter your username and password and hit return.
After submitting your credentials, consider what you just submitted.
On the surface, our login seems to contain only our credentials:
- Username (here it’s a phone number beginning with 077)
- Password
However, the developer console (F12 on your keyboard) and its Network tab reveal more.
The data is sent as a POST request, so we look for POST methods and, within them, for the Form data (on the bottom right in figure 1). Alongside the credentials mentioned above, several hidden fields were submitted as well.
The authenticity_token is particularly interesting. It is part of a security measure against an attack known as cross-site request forgery, abbreviated as csrf.
But where does that cryptic csrf token value come from?
Finding the csrf token
Remember the login page? https://selfcare.m-budget.migros.ch/eCare/de/users/sign%5Fin
Log out, visit it again and look at the source code (right-click, then View page source). It contains the following meta tag:
<meta name="csrf-token" content="mX12K..." />
Automated
Fetch csrf token
Now we know what to look for and where:
- Where: in the HTML source text (line 1)
- What: a meta tag named csrf-token holds the token as its content (line 2)
tree = html.fromstring(result.text)
authenticity_token = tree.xpath("//meta[@name='csrf-token']/@content")[0]
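To try the XPath expression without logging in, here is a minimal offline sketch. It assumes a shortened, hypothetical page containing only the meta tag shown above; the function name extract_csrf_token is made up for this example.

```python
from lxml import html

# Hypothetical, shortened login page. The token value is the truncated
# placeholder from the text, not a real token.
SAMPLE_PAGE = """
<html><head>
<meta name="csrf-token" content="mX12K..." />
</head><body></body></html>
"""


def extract_csrf_token(page_source):
    """Return the content of the csrf-token meta tag, or None if missing."""
    tree = html.fromstring(page_source)
    matches = tree.xpath("//meta[@name='csrf-token']/@content")
    return matches[0] if matches else None


print(extract_csrf_token(SAMPLE_PAGE))
```

Unlike the two-line snippet, this variant returns None instead of raising an IndexError when the tag is absent, which can help while experimenting.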
Prepare our form
We pass the csrf token along with the remaining data as a dictionary.
Note how closely it mirrors the form data in figure 1:
data = {
    'utf8': '\u2713',
    'authenticity_token': authenticity_token,
    'user[id]': USERNAME,
    'user[password]': PASSWORD,
    'user[reseller]': '33',
    'button': ''
}
Sign-in
We’re all set and can finally sign in by posting our form:
result = session_requests.post(LOGIN_URL,
                               data=data,
                               headers=dict(referer=LOGIN_URL))
Get the consumption values
Manual
There are multiple ways of finding the credit values:
- By searching for a value (here 500) in the browser’s developer console, as shown below in figure 2
- By hovering the mouse over the element (in the browser, not the dev console), right-clicking and choosing Inspect element (q)
We’re interested in all credits (10, 0 and 500), which sum up to our balance of 510.
Automated
Here’s how we script the balance part.
First we focus on the parent div elements with the class circle__inside in order to extract the text from their first span element (index [1]).
Then we loop through each of these circles and append each circle’s credit value to our balance list. Finally we print the balance.
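The description above can be sketched as follows. The sample markup is hypothetical and only modelled on that description (a div with the class circle__inside whose first span holds the credit); the real consumption page will contain more markup, and the function name read_balance is made up for this example.

```python
from lxml import html

# Hypothetical markup: three circles with the credits 10, 0 and 500.
SAMPLE_PAGE = """
<html><body>
<div class="circle__inside"><span>10</span><span>MB</span></div>
<div class="circle__inside"><span>0</span><span>MB</span></div>
<div class="circle__inside"><span>500</span><span>MB</span></div>
</body></html>
"""


def read_balance(page_source):
    """Collect the credit value of every circle into a list of integers."""
    tree = html.fromstring(page_source)
    circles = tree.xpath("//div[@class='circle__inside']")
    balance = []
    for circle in circles:
        # XPath indices are 1-based, so span[1] is the first span child.
        credit = circle.xpath("span[1]/text()")[0]
        balance.append(int(credit))
    return balance


balance = read_balance(SAMPLE_PAGE)
print("Your current balance is: %d MB" % sum(balance))
```

In the real script, the page source would come from a GET request to CONSUMPTION_URL made through the logged-in session rather than from SAMPLE_PAGE.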
Logout
We’re done, so we log out properly.
Manual
- Click on the profile icon in the upper right corner
- Click sign-out (Abmelden)
Automated
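A minimal sketch of the logout step, assuming that a plain GET request to the sign-out URL through the same session is enough to end it. That assumption is not verified here; the site may expect a different HTTP method. The helper log_out is hypothetical.

```python
LOGOUT_URL = 'https://selfcare.m-budget.migros.ch/eCare/de/users/sign_out'


def log_out(session, url=LOGOUT_URL):
    """Request the sign-out URL through the logged-in session.

    Assumption: a plain GET ends the session on the server side.
    """
    return session.get(url)
```

With the session from the login step this would be called as `log_out(session_requests)`.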
Complete script
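The following is a sketch that merges the snippets from the sections above into one file. The consumption and logout parts rest on assumptions: the XPath follows the description in the consumption section, and the logout is assumed to work with a plain GET. The helper functions and the empty-credentials check at the bottom are additions for this sketch, not necessarily how the original script was laid out.

```python
import requests
from lxml import html

# Credentials: placeholders, replace them with your own.
USERNAME = ''
PASSWORD = ''

BASE_URL = 'https://selfcare.m-budget.migros.ch/eCare'
LOGIN_URL = BASE_URL + '/de/users/sign_in'
LOGOUT_URL = BASE_URL + '/de/users/sign_out'
CONSUMPTION_URL = BASE_URL + '/prepaid/de/my_consumption/index'


def get_csrf_token(page_source):
    """Return the content of the csrf-token meta tag."""
    tree = html.fromstring(page_source)
    return tree.xpath("//meta[@name='csrf-token']/@content")[0]


def get_balance(page_source):
    """Collect the credit value of every circle on the consumption page."""
    tree = html.fromstring(page_source)
    balance = []
    for circle in tree.xpath("//div[@class='circle__inside']"):
        balance.append(int(circle.xpath("span[1]/text()")[0]))
    return balance


def main():
    session_requests = requests.session()

    # Visit the login page and fetch the csrf token.
    result = session_requests.get(LOGIN_URL)
    authenticity_token = get_csrf_token(result.text)

    # Prepare the form and sign in.
    data = {
        'utf8': '\u2713',
        'authenticity_token': authenticity_token,
        'user[id]': USERNAME,
        'user[password]': PASSWORD,
        'user[reseller]': '33',
        'button': ''
    }
    session_requests.post(LOGIN_URL, data=data,
                          headers=dict(referer=LOGIN_URL))

    # Get the consumption values and print their sum.
    result = session_requests.get(CONSUMPTION_URL,
                                  headers=dict(referer=CONSUMPTION_URL))
    balance = get_balance(result.text)
    print('Your current balance is: %d MB' % sum(balance))

    # Log out (assumption: a plain GET is sufficient).
    session_requests.get(LOGOUT_URL)


if __name__ == '__main__':
    if USERNAME and PASSWORD:
        main()
    else:
        print('Set USERNAME and PASSWORD at the top of the script first.')
```

The check at the bottom only prevents a pointless login attempt with empty credentials; once both values are filled in, the script runs all steps in order.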
Your current balance is: 510 MB
Usage
- Store the complete script above as e.g. ~/tmp/mbudget.py
- Change the credentials in ~/tmp/mbudget.py
- Run it:
python3 ~/tmp/mbudget.py
Your current balance is: 510 MB
Conclusion
Scripting our manual steps now allows us to store and process the returned data further. We could schedule the script using a systemd timer or cron, or send a notification by email or SMS when a certain limit is reached.
And of course: No more clicking, calculating and repeating.