Revise README to reflect YouTube Video Classifier features and setup instructions

2025-10-13 00:32:25 +00:00 · 2025-07-12 03:05:55 +00:00
parent 89314f9c74
commit fa6007c1f3
1 changed file with 210 additions and 181 deletions
--- a/README.md
+++ b/README.md
@@ -1,185 +1,214 @@
-# AutoDelete YouTube Videos 🗑️🎬
+# YouTube Video Classifier

-This script automates the process of deleting videos from your YouTube "Watch Later" playlist using PyAutoGUI.
+An AI-powered tool that automatically classifies YouTube videos in your "Watch Later" playlist based on their titles and thumbnails using vision-language models through Ollama.
+
+## Features ✨
+
+- 🤖 **AI-Powered Classification**: Uses Ollama with Qwen2.5-VL and fallback models to analyze video titles and thumbnails
+- 🔄 **Robust LLM Integration**: Automatic fallback between models with increasing timeouts for reliability
+- 📊 **Comprehensive CSV Storage**: Saves detailed video information including classifications, metadata, and thumbnails
+- 🌐 **Multi-language Detection**: Automatically detects video language using AI
+- 🏷️ **Smart Tagging**: Generates detailed sub-tags for better content organization
+- 🎯 **Smart Categories**: Uses existing classifications or creates new ones automatically
+- 🖥️ **Browser Automation**: Selenium-based interaction with YouTube for reliable data extraction
+- 🎨 **Beautiful Logging**: Rich console output with colors and emojis for better UX
+- ⌨️ **Easy Control**: Press 'q' at any time to safely quit the process
+
+## Quick Start
+
+### Prerequisites
+- Python 3.11.10+
+- Ollama installed locally
+- Chrome or Chromium browser
+
+### Setup
+
+1. **Install Ollama**: Download from [https://ollama.ai](https://ollama.ai)
+
+2. **Pull Required Models**:
+   ```bash
+   ollama pull qwen2.5vl:7b
+   ollama pull gemma2:2b
+   ```
+
+3. **Start Ollama Service**:
+   ```bash
+   ollama serve
+   ```
+
+4. **Clone and Setup Project**:
+   ```bash
+   git clone <repository-url>
+   cd youtube-video-classifier
+   
+   # Create virtual environment
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   
+   # Install dependencies
+   pip install -r requirements.txt
+   ```
+
+5. **Configure Settings** (optional):
+   Edit `config.ini` to customize your setup
+
+6. **Run the Classifier**:
+   ```bash
+   python script.py
+   ```
+
+## How It Works 🔄
+
+1. **Browser Initialization**: Opens Chrome/Chromium and navigates to your YouTube "Watch Later" playlist
+2. **Video Detection**: Finds and extracts information from playlist videos using Selenium
+3. **Data Extraction**: Captures video title, thumbnail, channel info, duration, and upload date
+4. **AI Analysis**: Uses Ollama models to:
+   - Classify the video into categories
+   - Detect the primary language
+   - Generate detailed sub-tags
+5. **Smart Fallback**: If the primary model fails or times out, the script automatically switches to the fallback model
+6. **Data Storage**: Saves all information to CSV with base64-encoded thumbnails
+7. **Playlist Management**: Removes processed videos from "Watch Later" playlist
+8. **Continuous Processing**: Continues until all videos are processed or user quits
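Reviewer note on the AI Analysis step: the README does not show how the title and thumbnail reach the model. A minimal sketch of a request body for Ollama's `/api/generate` endpoint is given below. The function name `build_generate_payload` and the prompt wording are hypothetical and not taken from `script.py`; the placeholder bytes stand in for a real thumbnail.

```python
import base64
import json


def build_generate_payload(model: str, title: str, thumbnail: bytes) -> dict:
    """Build a JSON body for POST {ollama_host}/api/generate."""
    # Hypothetical prompt: the real script's prompt is not shown in the README.
    prompt = (
        "Classify this YouTube video. "
        f"Title: {title!r}. "
        "Answer with a category, the primary language, and a few sub-tags."
    )
    return {
        "model": model,
        "prompt": prompt,
        # Ollama accepts images as a list of base64-encoded strings.
        "images": [base64.b64encode(thumbnail).decode("ascii")],
        # Ask for one complete response instead of a token stream.
        "stream": False,
    }


payload = build_generate_payload("qwen2.5vl:7b", "Intro to Rust", b"\x89PNG placeholder")
print(json.dumps(payload, indent=2))
```

The body could then be sent with `requests.post(f"{ollama_host}/api/generate", json=payload, timeout=...)`, where `ollama_host` comes from `config.ini`.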
+
+## Configuration
+
+The `config.ini` file allows you to customize various settings:
+
+```ini
+[DEFAULT]
+# Ollama settings
+ollama_host = http://localhost:11434
+ollama_model = qwen2.5vl:7b
+ollama_fallback_model = gemma2:2b
+
+# File paths
+classifications_csv = video_classifications.csv
+playlist_url = https://www.youtube.com/playlist?list=WL
+
+# LLM timeout settings (in seconds)
+llm_primary_timeout = 60
+llm_fallback_timeout = 60
+
+# Processing settings
+enable_delete = false
+enable_playlist_creation = false
+```
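Reviewer note: one way these settings could be read is with the standard-library `configparser`. This is a sketch under assumptions: `load_settings` and the keys of the returned dictionary are illustrative names, and the actual script may load its configuration differently.

```python
import configparser

# Inline copy of part of the config shown above, so the sketch runs on its own.
SAMPLE = """
[DEFAULT]
ollama_host = http://localhost:11434
ollama_model = qwen2.5vl:7b
ollama_fallback_model = gemma2:2b
llm_primary_timeout = 60
llm_fallback_timeout = 60
enable_delete = false
"""


def load_settings(text: str) -> dict:
    """Parse the [DEFAULT] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["DEFAULT"]
    return {
        "host": section.get("ollama_host"),
        "models": [section.get("ollama_model"), section.get("ollama_fallback_model")],
        "primary_timeout": section.getint("llm_primary_timeout"),
        "fallback_timeout": section.getint("llm_fallback_timeout"),
        # getboolean understands true/false, yes/no, on/off and 1/0.
        "enable_delete": section.getboolean("enable_delete"),
    }


settings = load_settings(SAMPLE)
print(settings)
```

To read the real file, `parser.read("config.ini")` would replace `parser.read_string(text)`.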
+
+## CSV Output Format 📋
+
+The script creates a comprehensive CSV file with the following columns:
+
+- `video_title`: Title of the video
+- `video_url`: YouTube URL of the video
+- `thumbnail_url`: Path to the saved thumbnail
+- `classification`: AI-generated category
+- `language`: Detected language of the video
+- `channel_name`: Name of the YouTube channel
+- `channel_link`: URL to the channel
+- `video_length_seconds`: Duration in seconds
+- `video_date`: Upload date
+- `detailed_subtags`: AI-generated specific tags
+- `playlist_name`: Source playlist name
+- `playlist_link`: Source playlist URL
+- `image_data`: Base64-encoded thumbnail data
+- `timestamp`: When the classification was made
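Reviewer note: the column list above maps naturally onto `csv.DictWriter`. The sketch below writes one row to an in-memory buffer. The helper `append_row`, the sample values, and the placeholder video URL are assumptions for illustration; only the column names come from the README.

```python
import base64
import csv
import io
from datetime import datetime, timezone

COLUMNS = [
    "video_title", "video_url", "thumbnail_url", "classification", "language",
    "channel_name", "channel_link", "video_length_seconds", "video_date",
    "detailed_subtags", "playlist_name", "playlist_link", "image_data", "timestamp",
]


def append_row(handle, row: dict, write_header: bool) -> None:
    """Write one classification row; columns missing from `row` stay empty."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
    if write_header:
        writer.writeheader()
    writer.writerow(row)


buffer = io.StringIO()
append_row(
    buffer,
    {
        "video_title": "Intro to Rust, part 1",  # the comma forces CSV quoting
        "video_url": "https://www.youtube.com/watch?v=EXAMPLE",  # placeholder
        "classification": "Programming",
        "language": "English",
        "video_length_seconds": 754,
        "image_data": base64.b64encode(b"thumbnail bytes").decode("ascii"),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    },
    write_header=True,
)
print(buffer.getvalue())
```

Base64 text contains no commas, quotes, or newlines, so the `image_data` cell never needs escaping, although it can make rows very long.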
+
+## File Structure 📁
+
+```
+├── script.py                 # Main classification script
+├── config.ini               # Configuration settings
+├── requirements.txt         # Python dependencies
+├── video_classifications.csv # Generated results (created on first run)
+└── README.md               # This file
+```
+
+## Features in Detail
+
+### AI Classification System
+- **Primary Model**: Qwen2.5-VL 7B for high-quality vision-language analysis
+- **Fallback Model**: Gemma2 2B for faster processing when primary model is slow
+- **Timeout Management**: Automatically increases the timeout when a model fails to respond in time
+- **Continuous Retry**: Keeps trying until successful or user cancels
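Reviewer note: the fallback and timeout behaviour described in these bullets could look roughly like the sketch below. Everything here is an assumption: the function name, the `growth` factor of 1.5, and the `max_rounds` cap are illustrative (the README describes retrying until success or cancellation, while this sketch stops after a fixed number of rounds). The model call is injected as `ask` so the logic can be exercised without a running Ollama server.

```python
from typing import Callable, Optional, Sequence


def classify_with_fallback(
    models: Sequence[str],
    ask: Callable[[str, float], str],
    base_timeout: float = 60.0,
    growth: float = 1.5,
    max_rounds: int = 3,
) -> Optional[str]:
    """Try each model in order; after a full failed round, raise the timeout."""
    timeout = base_timeout
    for _ in range(max_rounds):
        for model in models:
            try:
                return ask(model, timeout)
            except TimeoutError:
                continue  # move on to the next (fallback) model
        timeout *= growth  # every model timed out: allow more time next round
    return None


# Fake model call for demonstration: only the fallback model answers,
# and only once it is given at least 90 seconds.
calls = []


def fake_ask(model: str, timeout: float) -> str:
    calls.append((model, timeout))
    if model == "gemma2:2b" and timeout >= 90:
        return "Programming"
    raise TimeoutError(f"{model} gave no answer within {timeout}s")


result = classify_with_fallback(["qwen2.5vl:7b", "gemma2:2b"], fake_ask)
print(result, calls)
```

In a real integration, `ask` would wrap the HTTP request and translate the HTTP client's timeout exception into `TimeoutError`.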
+
+### Data Extraction
+- **Video Metadata**: Title, URL, duration, upload date
+- **Channel Information**: Name and link to channel
+- **Thumbnail Capture**: Screenshots saved as base64 in CSV
+- **Playlist Context**: Source playlist name and URL
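Reviewer note: the CSV stores duration as `video_length_seconds`, so a displayed duration has to be converted at some point. A small sketch of that conversion follows, assuming the script reads a text such as `12:34` or `1:02:03` from the page; the helper name `duration_to_seconds` is hypothetical.

```python
def duration_to_seconds(text: str) -> int:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a number of seconds."""
    parts = [int(piece) for piece in text.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected duration format: {text!r}")
    seconds = 0
    for part in parts:
        # Each step to the right is a unit 60 times smaller.
        seconds = seconds * 60 + part
    return seconds


for sample in ("45", "12:34", "1:02:03"):
    print(sample, "->", duration_to_seconds(sample))
```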
+
+### Browser Automation
+- **Multiple Chrome Paths**: Automatically finds Chrome/Chromium installation
+- **WebDriver Management**: Handles chromedriver setup and fallbacks
+- **Robust Selectors**: Multiple CSS selectors for reliable element finding
+- **Error Recovery**: Graceful handling of UI changes and loading delays
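Reviewer note: "Multiple Chrome Paths" suggests the script probes several locations for a browser binary. A minimal sketch using `shutil.which` is shown below. The candidate names are assumptions (common executable names on Linux), not the list used by `script.py`, and `find_browser` is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names; the real script may check other names or full paths.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_browser(
    candidates: Iterable[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


print(find_browser() or "No Chrome/Chromium executable found on PATH")
```

The lookup function is a parameter only so that the search order can be checked without a browser installed.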
+
+### User Experience
+- **Rich Console Output**: Colored logging with emojis and status indicators
+- **Progress Tracking**: Clear indication of current processing status
+- **Safe Exit**: Press 'q' at any time to cleanly stop processing
+- **Error Reporting**: Detailed error messages for troubleshooting
+
+## Testing Your Setup
+
+Before running the main script, you can test individual components:
+
+1. **Test Ollama Connection**:
+   ```python
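+   import requests
+   # List the models known to the local Ollama server
+   response = requests.get('http://localhost:11434/api/tags', timeout=10)
+   response.raise_for_status()  # fail loudly on a non-2xx response
+   print(response.json())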
+   ```
+
+2. **Test Browser Automation**:
+   Run the script and check if Chrome opens correctly
+
+3. **Test Model Response**:
+   The script will verify model availability on startup
+
+## Troubleshooting 🔧
+
+### Common Issues
+
+**Ollama Connection Error**:
+- Ensure Ollama is running: `ollama serve`
+- Check the host URL in config.ini
+- Verify models are installed: `ollama list`
+
+**Browser Issues**:
+- Install Chrome or Chromium
+- Update chromedriver if needed
+- Check that the browser executable is on your PATH
+
+**Model Timeout**:
+- The script automatically handles timeouts with fallback
+- Consider increasing timeout values in config.ini
+- Ensure sufficient system resources
+
+**Selenium Errors**:
+- YouTube may have changed its HTML structure
+- Check for browser updates
+- Verify you're logged into YouTube
+
+### Performance Tips
+
+- **For faster processing**: Use smaller models like `gemma2:2b` as primary
+- **For better accuracy**: Use larger models like `qwen2.5vl:7b` as primary
+- **For stability**: Keep both models installed for automatic fallback
+- **For large playlists**: Consider running in smaller batches
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Test your changes thoroughly
+4. Submit a pull request
+
+## License
+
+MIT License - see LICENSE file for details

 ---

-## Requirements 📦
-
- Python **3.11.10** 🐍
-
---
-
-## Setup & Usage (English) 🇬🇧
-
-### ⚙️ Requirements
-
- Python **3.11.10** 🐍
-
-### 1️⃣ Create a Virtual Environment
-
-**With `venv`:**
-```bash
-python3.11 -m venv venv
-```
-
-**With `virtualenv`:**
-```bash
-python3.11 -m pip install virtualenv
-python3.11 -m virtualenv venv
-```
-
-### 2️⃣ Activate the Virtual Environment
-
-**On Linux/macOS:**
-```bash
-source venv/bin/activate
-```
-**On Windows:**
-```bash
-venv\Scripts\activate
-```
-
-### 3️⃣ Install Dependencies
-
-```bash
-pip install -r requirements.txt
-```
-
-### 4️⃣ Run the Script 🚀
-
-```bash
-python script.py
-```
-
---
-
-## Features & Customization 🛠️
-
-### Features
-
- 🖼️ **Image Recognition:**  
-  Uses screenshots in the `img` folder to locate and interact with UI elements (browser icon, YouTube buttons, etc).
-
- 🔍 **Region-based Search:**  
-  The script splits the screen into left and right halves to speed up image searches.
-
- 🗂️ **Automatic Tab Handling:**  
-  Automatically opens/closes tabs and navigates to your "Watch Later" playlist.
-
- ⏱️ **Timing Controls:**  
-  You can adjust `sleep` and `duration` values in the code to match your PC's speed.
-
- 🛑 **Hotkey Listener:**  
-  Press `q` at any time to safely stop the script.
-
- 🖱️ **Easy Customization:**  
-  Change the images in the `img` folder or tweak the logic in functions like `locate_img` and `change_to_not_available` to adapt to UI changes or other platforms.
-
-### Customization
-
- 🖼️ **Browser Image:**  
-  Replace the browser icon image in the `img` folder with a screenshot of your browser's icon. Make sure the filename matches the value of the `browser_img` variable in `script.py` (e.g., `brave.png` for Brave browser).
-
- 🔗 **Playlist URL:**  
-  You can change the playlist URL by editing the value of the `playlist_url` variable at the top of the `script.py` file.
-
- ⏱️ **Adjust Timing:**  
-  The script uses `sleep` and `duration` values to wait for your PC to respond. You may need to increase or decrease these values depending on your computer's speed and internet connection.
-
- 📌 **Pin Your Browser:**  
-  For the script to work, your browser must be pinned to your taskbar.
-
---
-
-# AutoDelete YouTube Videos 🗑️🎬
-
-This script automates the process of deleting videos from your YouTube "Watch Later" playlist using PyAutoGUI.
-
---
-
-## Requirements 📦
-
- Python **3.11.10** 🐍
-
---
-
-## Setup & Usage (Spanish) 🇪🇸
-
-### ⚙️ Requirements
-
- Python **3.11.10** 🐍
-
-### 1️⃣ Create a Virtual Environment
-
-**With `venv`:**
-```bash
-python3.11 -m venv venv
-```
-
-**With `virtualenv`:**
-```bash
-python3.11 -m pip install virtualenv
-python3.11 -m virtualenv venv
-```
-
-### 2️⃣ Activate the Virtual Environment
-
-**On Linux/macOS:**
-```bash
-source venv/bin/activate
-```
-**On Windows:**
-```bash
-venv\Scripts\activate
-```
-
-### 3️⃣ Install Dependencies
-
-```bash
-pip install -r requirements.txt
-```
-
-### 4️⃣ Run the Script 🚀
-
-```bash
-python script.py
-```
-
---
-
-## Features & Customization 🛠️
-
-### Features
-
- 🖼️ **Image Recognition:**  
-  Uses screenshots in the `img` folder to locate and interact with UI elements (browser icon, YouTube buttons, etc).
-
- 🔍 **Region-based Search:**  
-  The script splits the screen into left and right halves to speed up image searches.
-
- 🗂️ **Automatic Tab Handling:**  
-  Automatically opens/closes tabs and navigates to your "Watch Later" playlist.
-
- ⏱️ **Timing Controls:**  
-  You can adjust the `sleep` and `duration` values in the code to match your PC's speed.
-
- 🛑 **Hotkey Listener:**  
-  Press `q` at any time to safely stop the script.
-
- 🖱️ **Easy Customization:**  
-  Change the images in the `img` folder or tweak the logic in functions like `locate_img` and `change_to_not_available` to adapt to UI changes or other platforms.
-
-### Customization
-
- 🖼️ **Browser Image:**  
-  Replace the browser icon image in the `img` folder with a screenshot of your browser's icon. Make sure the filename matches the value of the `browser_img` variable in `script.py` (for example, `brave.png` for the Brave browser).
-
- 🔗 **Playlist URL:**  
-  You can change the playlist URL by editing the value of the `playlist_url` variable at the top of the `script.py` file.
-
- ⏱️ **Adjust Timing:**  
-  The script uses `sleep` and `duration` values to wait for your PC to respond. You may need to increase or decrease these values depending on your computer's speed and internet connection.
-
- 📌 **Pin Your Browser:**  
-  For the script to work, your browser must be pinned to your taskbar.
+**Note**: This tool is for personal use and educational purposes. Please respect YouTube's Terms of Service and rate limits.