
# Lecture Video Analysis Platform  
  
This project provides a prototype pipeline for analyzing lecture videos that contain both **slides** and **spoken narration**.  
The system extracts text from slides, processes it into embeddings, and stores structured information for downstream tasks such as semantic search or topic classification.  
  
---  
  
## 📌 Key Components  
  
- **Frontend**: A minimal user interface built with [Streamlit](https://streamlit.io/) for uploading videos and displaying processed lectures.  
- **Backend API**: Implemented with [FastAPI](https://fastapi.tiangolo.com/) to orchestrate processing and provide endpoints.  
- **PostgreSQL (App Database)**: Stores metadata and structured information about lectures, slides, and extracted text.  
- **PostgreSQL (Vector Database)**: Stores vector embeddings of extracted text for efficient similarity search and retrieval.  
  
---  
  
## 🔄 Processing Pipeline  
  
1. **Video Input**
   - A lecture video is provided as input.
   - The system generates smart **screenshots of all slides**.

2. **OCR (Optical Character Recognition)**
   - Each screenshot is processed with OCR to extract the raw slide text.

3. **Text Classification**
   - The extracted text is categorized into the predefined classes ["Information Technology", "Economics", "Healthcare", "History", "Education"] using zero-shot classification with a Transformer model.

4. **Text Summarization**
   - The extracted text is summarized by a Transformer model.

5. **Audio Embedding Generation**
   - Audio embeddings are generated from the video's audio track.

6. **Text Embedding Generation**
   - The text is passed through an **embedding model** (e.g., sentence-transformers).
   - The resulting embedding vectors capture its semantic meaning.

7. **Database Storage**
   - **PostgreSQL**: Stores metadata (lecture ID, slide number, OCR text, classification labels).
   - **PostgreSQL (pgvector)**: Stores both **text and audio embedding vectors** for semantic search and similarity queries.
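
This README does not say how the "smart" screenshots in step 1 are chosen. One common approach, sketched here purely as an assumption, is to compare sampled frames and keep a frame only when it differs enough from the last kept slide. Frames are modeled as flat lists of grayscale pixel values (in practice they would come from a video decoder), and `select_slide_frames`, `mean_abs_diff` and the threshold of `12.0` are hypothetical names and values to tune.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical: a flattened grayscale frame, pixel values 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_slide_frames(frames: List[Frame], threshold: float = 12.0) -> List[int]:
    """Return the indices of frames that start a new slide.

    The first frame is always kept. Later frames are kept only when they differ
    from the last *kept* frame by more than `threshold`, so small changes
    (cursor movement, compression noise) do not produce duplicate screenshots.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Example: slide A three times (once with slight noise), then slide B twice.
slide_a = [200] * 16
slide_a_noisy = [202] * 16
slide_b = [40] * 16
keyframes = select_slide_frames([slide_a, slide_a_noisy, slide_a, slide_b, slide_b])
print(keyframes)  # [0, 3]
```

Comparing against the last *kept* frame rather than the previous frame prevents a slow fade or animation from slipping through as a series of tiny differences.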
  
```mermaid  
flowchart TD  
 A[🎥 Lecture Video] --> B[🖼️ Generate Slide Screenshots]
 B --> C[🔍 OCR: Extract Text]
 C --> D[🧩 Text Classification]
 C --> E[📰 Text Summarization]
 C --> F[🔢 Text Embedding Model]
 F --> G[(🗄️ PostgreSQL - Vectors/Embeddings)]
 A --> I[🔉 Audio Analysis]
 I --> H[🔢🎧 Audio Embedding Model]
 H --> G
 D --> J[(🗄️ PostgreSQL - Metadata)]
 E --> J
```
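
The flowchart can also be read as a function: the OCR text fans out to classification, summarization and text embedding, while the audio branch runs independently. The sketch below is a hypothetical orchestration, not the project's backend code. `analyze_lecture`, `LectureAnalysis` and the injected callables are illustrative names, and the real OCR and Transformer models would be passed in as those callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Candidate labels taken from step 3 of the pipeline description.
TOPIC_LABELS = ["Information Technology", "Economics", "Healthcare", "History", "Education"]


@dataclass
class LectureAnalysis:
    text: str
    summary: str
    topics: Dict[str, float]  # label -> probability
    text_embedding: List[float]
    audio_embedding: List[float] = field(default_factory=list)


def analyze_lecture(
    slide_images: List[bytes],
    audio: bytes,
    ocr: Callable[[bytes], str],
    classify: Callable[[str, List[str]], Dict[str, float]],
    summarize: Callable[[str], str],
    embed_text: Callable[[str], List[float]],
    embed_audio: Callable[[bytes], List[float]],
) -> LectureAnalysis:
    """Run the pipeline steps in flowchart order using injected model callables."""
    # OCR every slide screenshot and join the results into one lecture text.
    text = "\n".join(ocr(img) for img in slide_images)
    # Classification, summarization and text embedding all consume the OCR text.
    topics = classify(text, TOPIC_LABELS)
    summary = summarize(text)
    text_embedding = embed_text(text)
    # The audio branch is independent; a video without sound gets no audio embedding.
    audio_embedding = embed_audio(audio) if audio else []
    return LectureAnalysis(text, summary, topics, text_embedding, audio_embedding)


# Usage with stand-in callables instead of real models.
result = analyze_lecture(
    slide_images=[b"slide-1", b"slide-2"],
    audio=b"",
    ocr=lambda img: img.decode(),
    classify=lambda text, labels: {label: 1.0 / len(labels) for label in labels},
    summarize=lambda text: text[:7],
    embed_text=lambda text: [float(len(text))],
    embed_audio=lambda audio: [0.0],
)
```

With Hugging Face `transformers`, `classify` could wrap a `pipeline("zero-shot-classification")`, whose result holds parallel `labels` and `scores` lists that map directly onto the label-to-probability dictionary and, from there, onto `LectureTopic.topic_probability`.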
---  
  
## 🚀 Future Extensions  
  
- Integration of **speech-to-text (ASR)** for the lecturer's spoken content.  
- Audio analysis: detecting the speaker's gender and the spoken language.  
- Advanced semantic search across both **slide text** and **spoken transcript**.  
- Frontend dashboard for querying and visualizing lecture analysis results.  

---  
  
## 🛠️ Tech Stack  
- **Frontend**: Streamlit  
- **Backend**: Python FastAPI  
- **Databases**: PostgreSQL (with the `pgvector` vector extension)  
- **Machine Learning**: OCR (e.g., Tesseract), Transformer models (zero-shot classification, summarization), embedding models (e.g., SentenceTransformers)  
- **Deployment**: Docker  
  
---  
  
## ✅ Requirements  
```bash
pip install -r requirements.txt  
```  
## DB structure  
  
```python  
from sqlalchemy import Column, Float, ForeignKey, Integer, String, Text
from sqlalchemy.orm import relationship

from utils.postgres_database import Base  # project's declarative base


# Association table for the M:N relation between lectures and topics
class LectureTopic(Base):
    __tablename__ = 'lecture_topics'
    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    lecture_id = Column(Integer, ForeignKey('lectures.id'))
    topic_id = Column(Integer, ForeignKey('topics.id'))
    topic_probability = Column(Float, default=0.0)  # classifier score for this topic

    lecture = relationship("Lecture", back_populates="topics")
    topic = relationship("Topic", back_populates="lectures")


# Model for table lectures
class Lecture(Base):
    __tablename__ = 'lectures'
    id = Column(Integer, primary_key=True, index=True, autoincrement=True)  # indexed for faster lookups
    text = Column(Text, nullable=False)
    summary = Column(Text, nullable=False)
    duration_seconds = Column(Integer, nullable=False)
    lecturer = Column(String(20), nullable=True)  # "F", "M", or NULL (a video may have no sound)
    voice_language = Column(String(20), nullable=True)  # e.g. "eng"
    slides_language = Column(String(20), nullable=False)

    # 1:N relation on table "images"
    images = relationship('Image', back_populates='lecture')

    # 1:N relation on table "visual_doc_understanding_data"
    visual_doc_understanding_data = relationship('VisualDocUnderstandingData', back_populates='lecture')

    # 1:N relation on association table "lecture_topics"
    topics = relationship('LectureTopic', back_populates='lecture')


# Model for table images
class Image(Base):
    __tablename__ = 'images'
    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    image_url = Column(String(512), nullable=False)
    image_type = Column(String(256))
    lecture_id = Column(Integer, ForeignKey("lectures.id"))
    lecture = relationship('Lecture', back_populates='images')


# Model for table visual_doc_understanding_data
class VisualDocUnderstandingData(Base):
    __tablename__ = "visual_doc_understanding_data"
    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    vdu_output = Column(Text, nullable=True)
    vdu_model = Column(String(128), nullable=True)
    lecture_id = Column(Integer, ForeignKey("lectures.id"))
    lecture = relationship('Lecture', back_populates='visual_doc_understanding_data')


# Model for table topics
class Topic(Base):
    __tablename__ = 'topics'
    id = Column(Integer, primary_key=True, index=True, autoincrement=True)
    topic_category = Column(String(256), nullable=False, unique=True)
    lectures = relationship('LectureTopic', back_populates='topic')

```  
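
`LectureTopic` is an association object rather than a bare link table, so every lecture/topic pair also carries `topic_probability`. The sketch below illustrates the join that the `topics`/`lectures` relationships express. It uses the standard-library `sqlite3` module as an in-memory stand-in for PostgreSQL (an assumption for illustration only; the project itself uses SQLAlchemy on Postgres), keeps only the columns it needs, and inserts made-up probabilities. `top_topics` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stand-in for PostgreSQL, mirroring the column names of the models above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lectures (id INTEGER PRIMARY KEY, summary TEXT NOT NULL);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, topic_category TEXT NOT NULL UNIQUE);
    CREATE TABLE lecture_topics (
        id INTEGER PRIMARY KEY,
        lecture_id INTEGER REFERENCES lectures(id),
        topic_id INTEGER REFERENCES topics(id),
        topic_probability REAL DEFAULT 0.0
    );
    """
)
conn.execute("INSERT INTO lectures (id, summary) VALUES (1, 'Intro to databases')")
conn.executemany(
    "INSERT INTO topics (id, topic_category) VALUES (?, ?)",
    [(1, "Information Technology"), (2, "Education"), (3, "Economics")],
)
# One row per (lecture, topic) pair, carrying the classifier's probability (example values).
conn.executemany(
    "INSERT INTO lecture_topics (lecture_id, topic_id, topic_probability) VALUES (?, ?, ?)",
    [(1, 1, 0.81), (1, 2, 0.15), (1, 3, 0.04)],
)


def top_topics(conn, lecture_id, limit=2):
    """Return (topic_category, probability) pairs for a lecture, most likely first."""
    rows = conn.execute(
        """
        SELECT t.topic_category, lt.topic_probability
        FROM lecture_topics AS lt
        JOIN topics AS t ON t.id = lt.topic_id
        WHERE lt.lecture_id = ?
        ORDER BY lt.topic_probability DESC
        LIMIT ?
        """,
        (lecture_id, limit),
    )
    return rows.fetchall()


print(top_topics(conn, 1))  # [('Information Technology', 0.81), ('Education', 0.15)]
```

Storing the probability on the association row lets one lecture belong to several topics with different confidence, instead of forcing a single label.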
  
# Installation guide  
This project consists of a **FastAPI** backend and a **Postgres** database with the **pgvector** extension.    
It uses two separate databases:  
- `lectures_app_db` – for relational data (primarily lecture data and metadata),  
- `lectures_vector_db` – for vector data (lecture embeddings).  
## 🚀 How to Run the Project  
  
### 1. Clone the repository  
```bash  
git clone <url>
cd <repo>
```  
  
### 2. Create a .env file  
The `.env` file stores sensitive credentials (user, password).  
⚠️ Do not commit this file to Git. 
  
Example `.env`:  
```

BACKEND_URL=http://backend:8000  
  
DB_TYPE=postgresql  
DB_HOST=db  
DB_PORT=5432  
DB_NAME=lectures_app_db  
DB_USER=postgres  
DB_PASSWORD=<your_password>  
  
VECTOR_DB_NAME=lectures_vector_db  
```  
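
The README does not show how the backend turns these variables into a connection string. A minimal sketch, assuming the variables are present in the process environment, could look like this; `build_db_url` is a hypothetical helper and the password is a placeholder. The resulting URL follows the `dialect://user:password@host:port/database` form that SQLAlchemy accepts.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_db_url(env: Mapping[str, str] = os.environ, db_name_key: str = "DB_NAME") -> str:
    """Build a SQLAlchemy-style connection URL from the .env variables.

    Pass db_name_key="VECTOR_DB_NAME" to connect to the vector database instead.
    """
    # Escape the password so characters such as '@' or '/' cannot break the URL.
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"{env['DB_TYPE']}://{env['DB_USER']}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env[db_name_key]}"
    )


example_env = {
    "DB_TYPE": "postgresql",
    "DB_HOST": "db",
    "DB_PORT": "5432",
    "DB_NAME": "lectures_app_db",
    "VECTOR_DB_NAME": "lectures_vector_db",
    "DB_USER": "postgres",
    "DB_PASSWORD": "s3cret@pw",  # placeholder value
}
print(build_db_url(example_env))
# postgresql://postgres:s3cret%40pw@db:5432/lectures_app_db
```

`DB_HOST=db` refers to the database service by its name on the Docker Compose network; a script running directly on the host would use `localhost` instead, since Postgres is exposed on port 5432.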
  
### 3. Start with Docker Compose  
```bash  
docker compose up --build
```  
- The backend runs at http://localhost:8000  
- The frontend runs at http://localhost:8501  
- Postgres is exposed on port 5432  
  
### 4. Database initialization  
On the very first run:  

- the databases `lectures_app_db` and `lectures_vector_db` are created,  
- the pgvector extension is enabled,  
- all necessary tables are created.  

This is handled by the `init.sql` file.  
  
## 📂 Project Structure  
```text
project-root/
├─ docs/
├─ resources/
├─ src/                    # FastAPI backend application
│  ├─ app.py
│  ├─ config/
│  └─ ...
├─ streamlit-frontend/
│  ├─ Dockerfile           # Docker image for frontend
│  ├─ frontend.py
│  └─ requirements.txt
├─ init.sql                # SQL initialization of databases and tables
├─ Dockerfile              # Docker image for backend
├─ docker-compose.yml      # orchestrates backend + DB
├─ requirements.txt
└─ .env                    # credentials (not tracked by Git)
```
## Architecture  
```mermaid  
flowchart LR
    subgraph Client["Client (browser / API client)"]
        A[HTTP Request]
    end

    subgraph Backend["FastAPI Backend (Uvicorn)"]
        B[REST API Endpoints]
    end

    subgraph Database["Postgres + pgvector"]
        C1[(lectures_app_db)]
        C2[(lectures_vector_db)]
    end

    A --> B
    B --> C1
    B --> C2

    
```  
## 🛠️ Useful Commands  
Stop and remove containers  
```bash  
docker compose down
```  
Stop and remove containers and the database volume  
(use this if you want to reinitialize the DB from init.sql)  
```bash  
docker compose down -v
```  
Connect to the database (replace `<DB_USER>` with the user from `.env`):  
```bash  
docker exec -it postgres_db psql -U <DB_USER> -d lectures_app_db
```  

  
## 🔒 Security Notes  
Credentials (DB_USER, DB_PASSWORD) are stored in .env or injected via CI/CD environment variables.  
  
The .env file must not be committed to the repository.  
  
# Postgres vector database  
Examples of working with the vector database from Python and SQL.  
  
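```python
import psycopg2

# `lecture_text` (str), `embedding` and `audio_embedding` (NumPy arrays) are
# assumed to come from the earlier pipeline steps.
conn = psycopg2.connect("dbname=lectures_vector_db user=myuser password=mypassword host=localhost")

insert_query = """
INSERT INTO lecture_repre (title, content, embedding, audio_embedding)
VALUES (%s, %s, %s, %s)
"""

try:
    # `with conn` commits on success and rolls back on error;
    # it does not close the connection, hence the `finally` block.
    with conn:
        with conn.cursor() as cursor:
            cursor.execute(
                insert_query,
                ("Lecture Title", lecture_text, embedding.tolist(), audio_embedding.tolist()),
            )
finally:
    conn.close()
```  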
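
The insert above passes embeddings as Python lists and relies on psycopg2 and pgvector converting them. An alternative is to send vectors as text in pgvector's `'[x,y,z]'` input format, which also works for the query vector in a similarity search. The helper below is a hypothetical sketch of that approach; the dimension of 768 and the `query_embedding`/`cursor` names are assumptions. The `pgvector` Python package also offers `register_vector` for psycopg2, which handles the conversion for you.

```python
import math
from typing import Optional, Sequence


def to_vector_literal(values: Sequence[float], expected_dim: Optional[int] = None) -> str:
    """Format numbers in pgvector's text input format, e.g. '[0.1,0.2,0.3]'."""
    if expected_dim is not None and len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("pgvector rejects NaN and infinite values")
    # repr() gives the shortest string that round-trips each Python float.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Hypothetical search query; `cursor` and `query_embedding` are assumed to exist.
SEARCH_QUERY = """
SELECT id, title, content
FROM lecture_repre
ORDER BY embedding <-> %s::vector
LIMIT %s
"""
# cursor.execute(SEARCH_QUERY, (to_vector_literal(query_embedding, expected_dim=768), 5))

print(to_vector_literal([0.1, 1, 2.5]))  # [0.1,1.0,2.5]
```

Checking the dimension on the client side turns a mismatched embedding model into a clear Python error rather than a database error halfway through a batch.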
  
Searching for the five lectures closest to a query vector:  
```sql
SELECT id, title, content
FROM lecture_repre
ORDER BY embedding <-> '[0.1, 0.2, ..., 0.768]' -- query vector; <-> is L2 distance, <=> is cosine distance
LIMIT 5;
```  
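
The choice of operator changes which lectures come back. pgvector's `<->` orders by Euclidean (L2) distance, `<=>` by cosine distance (1 minus cosine similarity), and `<#>` by negative inner product. The plain-Python sketch below reimplements the first two only to illustrate their ranking behavior; the function names and the two-dimensional vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, as computed by pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, as computed by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(query, candidates: Dict[str, Sequence[float]], distance) -> List[str]:
    """Rank candidate names from closest to farthest, like ORDER BY <distance>."""
    return sorted(candidates, key=lambda name: distance(query, candidates[name]))


query = [1.0, 0.0]
candidates = {
    "near_but_other_direction": [0.5, 0.5],  # L2 distance ~0.71, cosine distance ~0.29
    "same_direction_but_far": [4.0, 0.0],    # L2 distance 3.0, cosine distance 0.0
}
print(nearest(query, candidates, l2_distance))
# ['near_but_other_direction', 'same_direction_but_far']
print(nearest(query, candidates, cosine_distance))
# ['same_direction_but_far', 'near_but_other_direction']
```

If the embedding model outputs unit-length vectors, both operators produce the same ranking, because for unit vectors the squared L2 distance equals twice the cosine distance. An approximate index (IVFFlat or HNSW) only accelerates the operator matching the operator class it was built with, e.g. `vector_l2_ops` for `<->` or `vector_cosine_ops` for `<=>`.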
  
