Alberta In-situ Oilsands Production Analysis using Python - Part I

Data extraction from AER website

python
web scraping
Author

Farshad Tabasinejad

Published

February 26, 2023

Disclaimer

The information provided in this blog post is for educational purposes only. Commercial use of this information is prohibited. For information on the copyright and permission to reproduce, please visit the AER Copyright and Disclaimer page at AER Copyright and Disclaimer.

The author assumes no responsibility for any damage or loss resulting from the use of the information provided in this blog post.

Introduction

The Alberta Energy Regulator (AER) provides monthly injection and production data for all in-situ oilsands projects in Alberta on its website at Alberta Energy Regulator. The data is available in several spreadsheets and can be downloaded from the ST53: Alberta In Situ Oil Sands Production Summary. The annual data is reported in separate files in .xls format. Each file contains six different sheets with information about bitumen, water, steam, wells, SOR, and WSR.

Downloading the data

To download the data using Python, the requests and BeautifulSoup libraries are used. The following code downloads all the .xls files from the AER website and saves them in the current directory:

import requests
from bs4 import BeautifulSoup
# get the html
url = 'https://www.aer.ca/providing-information/data-and-reports/statistical-reports/st53'
r = requests.get(url)
data = r.text
# parse the html
soup = BeautifulSoup(data, 'html.parser')
# find all the hyperlinks
links = soup.findAll('a', href = True) 
# get the list of xls files only
xls_links = [link for link in links if link.get('href').endswith('xls')]
# download the xls files and save them in the current directory
for link in xls_links:
    xls_url = link.get('href')
    xls_name = xls_url.split('/')[-1]
    xls = requests.get(xls_url)
    with open(xls_name, 'wb') as f:
        f.write(xls.content)

Listing all the saved xls files

import os
# filter only .xls files
xls_files = [file for file in os.listdir(os.getcwd()) if file.endswith('.xls')]
xls_files
['ST53_2010-12.xls',
 'ST53_2011-12.xls',
 'ST53_2012-12.xls',
 'ST53_2013-12.xls',
 'ST53_2014-12.xls',
 'ST53_2016-12.xls',
 'ST53_2017-12.xls',
 'ST53_2018-12.xls',
 'ST53_2019-12.xls',
 'ST53_2020-12.xls',
 'ST53_2021-12.xls',
 'ST53_2022-12.xls',
 'ST53_Current.xls']

In the next post, I will show how to read the data from the .xls files and prepare the datasets for further analysis. This analysis will provide insights into the production trends and patterns of in-situ oilsands projects in Alberta.