Link: https://youtu.be/1Rlz3VlKcsk
This repository contains code to build a web application for searching transcriptions using Weaviate. The main intent is to give users an interface to an offline (or online, if deployed) vector database that they can use to run searches independently. The backend uses Django for the API and a separate Dockerized sentence-embedding API to vectorize text. It also includes code for downloading YouTube videos and transcribing them (this lives in a Google Colab notebook for faster speed).
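To make the idea of vector search concrete, here is a minimal sketch of what happens conceptually when a query is matched against transcript segments. It is not the code in this repository: the real application delegates embedding to the sentence-embedding API and storage and ranking to Weaviate. The `embed`, `cosine`, and `search` functions and the sample segments are hypothetical, and `embed` is a toy bag-of-words stand-in for a real sentence-embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for the sentence-embedding API: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, segments, top_k=3):
    """Rank transcript segments by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical transcript segments, as if taken from lecture videos.
segments = [
    "gradient descent updates weights to reduce loss",
    "mitochondria produce energy inside cells",
    "backpropagation computes the gradient of the loss",
]

results = search("how is the gradient of the loss computed", segments, top_k=2)
for score, segment in results:
    print(f"{score:.3f}  {segment}")
```

A real sentence-embedding model captures meaning rather than word overlap, so it can match a query to a segment even when they share no words. The ranking step, however, follows the same pattern: vectorize the query, compare it with stored vectors, and return the closest matches.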
Inspiration for this comes from https://www.hubermantranscripts.com/ and many aspects are similar. However, I bring my own spin by reverse engineering how it is built so that other people (with development backgrounds) can spin this up. At its foundation, this application acts as a database that lets users easily search through content. For my use case and demonstration, the target user is a student who wants to quickly find the transcribed lectures they need.
This system design reflects what has been built so far and what still needs to be done. It covers the features and how the components talk to each other.
Use this Google Colab notebook to download and transcribe videos from YouTube, which you can then use to populate the database: https://colab.research.google.com/drive/14b3hoXzVUB1BwA1PLi59hpPzcPQin1l4?usp=sharing