Using autoencoders to compress video frames efficiently with minimal quality loss
This project demonstrates how neural networks can be used for video compression tasks. By training an autoencoder model to compress video frames into a low-dimensional latent space and reconstruct them, we can achieve higher compression ratios than traditional codecs such as H.264, though with a trade-off in reconstruction quality (see the results table below).
The project showcases an important application of machine learning in multimedia processing, with potential applications in video streaming, storage, and transmission for mobile devices.
The project is implemented in four stages, each building on the previous:
Extracting individual frames from source videos using OpenCV to prepare data for neural compression.
# Example frame extraction
frames = extract_frames("input_video.mp4", "extracted_frames", interval=1)
Building and training a neural network with encoder, quantizer, and decoder components.
# Autoencoder architecture
model = VideoAutoencoder(latent_dim=64, num_bits=8)
# Train model
model = train_autoencoder(model, dataloader, num_epochs=5)
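A minimal sketch of what `VideoAutoencoder` might contain is shown below. The class name and the `latent_dim` / `num_bits` parameters come from the snippet above; the specific layer sizes and the sigmoid-based quantization range are illustrative assumptions, not the project's exact architecture:

```python
import torch
import torch.nn as nn


class VideoAutoencoder(nn.Module):
    """Convolutional autoencoder: encoder -> quantizer -> decoder (sketch)."""

    def __init__(self, latent_dim=64, num_bits=8):
        super().__init__()
        self.num_bits = num_bits
        # Encoder: downsample 4x spatially into `latent_dim` channels
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_dim, 4, stride=2, padding=1),
        )
        # Decoder: mirror of the encoder, outputs pixels in [0, 1]
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def quantize(self, z):
        # Round the latent to 2**num_bits levels; the straight-through
        # trick below lets gradients flow as if this were the identity.
        levels = 2 ** self.num_bits - 1
        z = torch.sigmoid(z)  # squash latent into [0, 1] before rounding
        q = torch.round(z * levels) / levels
        return z + (q - z).detach()

    def forward(self, x):
        return self.decoder(self.quantize(self.encoder(x)))
```

Training would then minimize a pixel-wise reconstruction loss (e.g. MSE) between the input frame and the decoder output.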
Comparing our neural compression against traditional H.264 using objective quality metrics.
# Evaluate compression methods
autoencoder_results = evaluate_autoencoder(model, dataloader, device)
h264_results = evaluate_h264(frames_dir, crf=23)
Generating comprehensive visual reports to analyze compression performance.
# Create visualizations
visualizer = VideoComparisonVisualizer(original_frames, ae_frames, h264_frames)
visualizer.generate_summary_report()
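One building block of such a report is the side-by-side frame comparison; a minimal sketch with matplotlib (the function name and layout here are assumptions, not the visualizer's actual API) might look like:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for batch report generation
import matplotlib.pyplot as plt


def save_side_by_side(original, ae_frame, h264_frame, path):
    """Write a 1x3 comparison figure: original | autoencoder | H.264."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    panels = [("Original", original), ("Autoencoder", ae_frame), ("H.264", h264_frame)]
    for ax, (title, img) in zip(axes, panels):
        ax.imshow(img)
        ax.set_title(title)
        ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```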
The quantizer module simulates real-world bit constraints by reducing the precision of the latent representation. It uses a straight-through estimator technique to allow gradient flow during training, despite the non-differentiable quantization operation.
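The straight-through estimator can be expressed in a few lines of PyTorch. In the forward pass the output is the rounded value; in the backward pass the `.detach()` makes the rounding term invisible to autograd, so the gradient of the identity flows through (this is a generic sketch of the technique, assuming latents already lie in [0, 1]):

```python
import torch


def ste_quantize(z, num_bits=8):
    """Quantize z to 2**num_bits uniform levels with a straight-through gradient."""
    levels = 2 ** num_bits - 1
    q = torch.round(z.clamp(0, 1) * levels) / levels
    # Forward value is q; backward gradient is that of z (identity),
    # because the non-differentiable (q - z) term is detached.
    return z + (q - z).detach()


z = torch.rand(4, requires_grad=True)
ste_quantize(z).sum().backward()
# z.grad is all ones: rounding was bypassed in the backward pass
```

Without this trick, `torch.round` has zero gradient almost everywhere and the encoder would receive no learning signal.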
Side-by-side comparison of original (left), autoencoder (middle), and H.264 (right) compressed frames
| Metric | Autoencoder | H.264 |
|---|---|---|
| Average PSNR | 21.71 dB | 23.52 dB |
| Average SSIM | 0.5373 | 0.8966 |
| Min PSNR | 19.58 dB | 18.63 dB |
| Min SSIM | 0.4353 | 0.8451 |
| Max PSNR | 24.21 dB | 26.69 dB |
| Max SSIM | 0.6296 | 0.9264 |
| Compression Ratio | 48:1 | 24.47:1 |
PSNR and SSIM metrics across video frames
Distribution of PSNR and SSIM values for both compression methods
To run this project yourself, follow these steps:
# 1. Clone the repository
git clone https://github.com/NiharP31/ML_ViC.git
cd ML_ViC
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run the complete pipeline
python extract_frames.py --video input_video.mp4 --output extracted_frames
python frame_autoencoder.py --frames extracted_frames --epochs 10
python compression_evaluation.py
python results_visualization.py
The visualization results will be saved in the visualization_results directory, including comparison images, metrics reports, and the side-by-side video.