Queensland Fuel Price

Author

Nana, Jia Vei Yar, Yawen

Published

July 9, 2025

Introduction

This Report will focus on Fuel price by Fuel types, Suburbs and change by months

  • Modules that used: pandas, seaborn
  • Dataset of Queensland Fuel Price (Fuel prices by the pump in Queensland)
  • Data from Data from the Queensland Government’s Open Data Portal
  • Investigation includes Fuel price corolation by types, date and suburbs

Average Fuel Prices by Fuel Types

Show code
# Set up code
import pandas as pd
df=pd.read_csv("../../data_sources/qld_fuel.csv")
import seaborn as sns
fp=df["Price"]
ft=df["Fuel_Type"]
import matplotlib.pyplot as plt
sns.barplot(data=df, x=ft, y=fp, errorbar=None, width=0.6, palette="muted")

plt.xticks(rotation=45, ha='right')

plt.title("Average Fuel Prices by Type in QLD")
plt.xlabel("Fuel Type")
plt.ylabel("Price ($/L)")

plt.tight_layout()
plt.show()

Fuel Price by Suburbs

Show code
df["TransactionDateutc"] = pd.to_datetime(df["TransactionDateutc"])
df["Price"] = df["Price"] / 100 

# removing outliers 
df = df[df["Price"] < 50] 
df = df[df["Price"] > 10] 

# summary stats 
summary_stats = df.groupby("Site_Suburb")["Price"].agg(min_price = "min", max_price = "max")

summary_stats["price_range"] = summary_stats["max_price"] - summary_stats["min_price"]

# Top 3 and bottom 3 suburbs 

top3_change = summary_stats.sort_values(by="price_range", ascending= False). head(3)
bottom3_change = summary_stats.sort_values(by="price_range", ascending= True). head(3)

selected_suburbs = top3_change.index.tolist() + bottom3_change.index.tolist()
df_selected = df[df["Site_Suburb"].isin(selected_suburbs)]

df_selected["Site_Suburb"] = df_selected["Site_Suburb"].str.strip().str.title()

# Plot
sns.catplot(data = df_selected, x = "Site_Suburb", y = "Price", kind = "box")
plt.xlabel ("Site_Suburb")
plt.ylabel ("Price")
plt.savefig ("tb.png")
plt.show()

Price by Time

Show code
df.describe()

# remove nulls
df[df["X_id"].notna()]
df_full_X_id = df[df["X_id"].notna()]
df_full_X_id.shape


df["Site_Suburb"].count()
df["Site_Suburb"].unique()
df["Site_Suburb"].value_counts()

df["Fuel_Type"].count()
df["Fuel_Type"].unique()
df["Fuel_Type"].value_counts()


df["TransactionDateutc"] = pd.to_datetime(df["TransactionDateutc"])
df["date_only"] = df["TransactionDateutc"].dt.date




gb = df.groupby("date_only")
avg_price_by_date = gb["Price"].agg("mean").reset_index()
avg_price_by_date.to_csv("data/avg_price_by_date.csv")  # export subset

df["TransactionDateutc"].dtypes

df["year_month"] = df["TransactionDateutc"].dt.to_period("M")
gb_year_month = df.groupby("year_month")
avg_price_by_year_month = gb_year_month["Price"].agg("mean").reset_index()
avg_price_by_year_month.to_csv("data/avg_price_by_year_month.csv")
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt



type("year_month")
type("date_only")

avg_price_by_year_month["year_month"] = avg_price_by_year_month["year_month"].dt.to_timestamp()
sns.relplot(data = avg_price_by_date, x = "date_only", y = "Price", hue = "Price")
sns.lineplot(data =avg_price_by_year_month, x = "year_month", y = "Price", color = "red")