This repository contains the code for the Skol Network website, accessible at skolnetwork.com. The website is hosted in an AWS S3 bucket and distributed globally via AWS CloudFront for faster loading times. GitHub Actions manage the Python web scraping, deployment, and cache-invalidation processes.
The Skol Network website is dedicated to providing statistical data, news, articles, and updates on the Minnesota Vikings and the NFC North teams, organized into several dedicated sections.
The frontend is built using HTML and Tailwind CSS, ensuring a responsive and clean user experience.
Data for the website is automatically updated by scraping content from relevant sports websites using Selenium and Pandas. The scraping scripts are run as part of a scheduled GitHub Action. The scraped data is stored as JSON files, which are then used to dynamically populate the website content.
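The extract-tables-and-save-JSON flow described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual script: the sample HTML stands in for a live page that the real scripts fetch with Selenium, and the table id and output path are placeholders.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample HTML standing in for a page fetched with Selenium;
# the real scripts scrape live sports websites instead.
SAMPLE_HTML = """
<table id="NFC">
  <tr><th>Team</th><th>Wins</th></tr>
  <tr><td>Vikings</td><td>13</td></tr>
  <tr><td>Packers</td><td>9</td></tr>
</table>
"""

def scrape_table_to_json(html: str, table_id: str, out_path: str) -> pd.DataFrame:
    """Extract the table whose id attribute matches table_id and save it as JSON."""
    # read_html returns a list of matching tables; attrs narrows the match by id.
    df = pd.read_html(StringIO(html), attrs={"id": table_id})[0]
    # orient="records" yields a list of row objects, easy to consume from the frontend.
    df.to_json(out_path, orient="records")
    return df

if __name__ == "__main__":
    print(scrape_table_to_json(SAMPLE_HTML, "NFC", "NFC.json"))
```

In the real pipeline the HTML would come from Selenium's `driver.page_source` after the page has rendered, but the table-by-id extraction step is the same.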
Scraping is handled by `scrape.py` and `Lscrape.py` (scripts prefixed with `L` are built to run locally, not from the repo), with tables extracted by ID and saved to JSON files. For example, the NFC and AFC divisions are saved as `NFC.json` and `AFC.json`, respectively.

The scraped JSON files are further processed by `clean.py` and `Lclean.py` to ensure data quality and consistency; these scripts apply transformations such as removing injured-reserve players from the injury report or aggregating division data.

Deployment of the Skol Network website is automated using GitHub Actions.
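The cleaning transformation described above (dropping injured-reserve players) might look like the following sketch. The field names `"Status"` and `"Injured Reserve"` are assumptions for illustration; the real `clean.py` / `Lclean.py` scripts may use different keys and apply additional transformations.

```python
import json

def remove_injured_reserve(records: list[dict]) -> list[dict]:
    """Drop players whose status marks them as on injured reserve.

    The "Status" key and "Injured Reserve" value are assumed field names,
    not necessarily those used by the repository's actual scripts.
    """
    return [r for r in records if r.get("Status") != "Injured Reserve"]

def clean_file(in_path: str, out_path: str) -> None:
    # Load the scraped JSON, filter it, and write the cleaned copy back out.
    with open(in_path) as f:
        records = json.load(f)
    with open(out_path, "w") as f:
        json.dump(remove_injured_reserve(records), f, indent=2)

if __name__ == "__main__":
    report = [
        {"Player": "A. Example", "Status": "Questionable"},
        {"Player": "B. Example", "Status": "Injured Reserve"},
    ]
    print(remove_injured_reserve(report))
```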
To run the scraping scripts locally or modify the setup, you'll need the following dependencies (specified in `requirements.txt`):

- `requests`
- `beautifulsoup4`
- `selenium`
- `pandas`
- `lxml`

(`json` is part of the Python standard library and does not need to be installed separately.)
Make sure to install these dependencies using:

```
pip install -r requirements.txt
```
To run the scraping script manually:

```
python scrape.py
```
The script will generate JSON files in the `scripts/scraped_data` directory, which are then used by the frontend.
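The S3 upload and CloudFront cache invalidation mentioned earlier are driven by GitHub Actions; a minimal workflow for an S3-hosted site might look like the sketch below. The bucket name, distribution ID, region, and secret names are placeholders, not the repository's actual values.

```yaml
name: Deploy site

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Credentials are supplied via repository secrets; names are placeholders.
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      # Upload the site to the (placeholder) S3 bucket.
      - name: Sync to S3
        run: aws s3 sync . s3://example-bucket --delete

      # Invalidate the CloudFront cache so visitors get the fresh build.
      - name: Invalidate CloudFront cache
        run: aws cloudfront create-invalidation --distribution-id EXAMPLE_ID --paths "/*"
```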