This GenAI Text Summarization project showcases both abstractive and extractive summarization built on transformer models. Written in Python with Hugging Face Transformers, it condenses lengthy documents into concise summaries suited to news articles, legal texts, academic papers, and content previews. The project pairs BART for natural language generation with MiniLM sentence embeddings ranked by cosine similarity for extraction.
Key Insights
The dual summarization approach lets users choose between abstractive and extractive output to suit their needs.
BART's generative model produces fluent and human-like summaries, ideal for content creation and compression.
MiniLM-based extractive summarization ensures semantic relevance by selecting top-ranking sentences via cosine similarity.
Supports document-based summarization from PDFs and .txt files, enhancing usability for professionals.
Technical Implementation
Model Architecture:
Implemented abstractive summarization using facebook/bart-large-cnn from Hugging Face Transformers (a minimal sketch follows this list).
Used sentence-transformers/all-MiniLM-L6-v2 for extractive summarization via sentence embeddings and cosine similarity (see the extractive sketch below).
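A minimal sketch of the abstractive path via the Transformers pipeline API; the sample text and generation-length bounds are illustrative, not the project's tuned settings.

```python
from transformers import pipeline

# Load the abstractive summarizer (downloads the model weights on first use)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Hugging Face Transformers provides thousands of pretrained models for "
    "tasks such as summarization, translation, and question answering. "
    "BART is a sequence-to-sequence model fine-tuned here on CNN/DailyMail."
)

# Length bounds are illustrative defaults, not the project's tuned settings
result = summarizer(article, max_length=60, min_length=20, do_sample=False)
print(result[0]["summary_text"])
```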
Natural Language Processing (NLP):
Used NLTK for text tokenization and preprocessing.
Employed Scikit-learn for calculating cosine similarity to rank sentence relevance.
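A sketch of the extractive path under these assumptions: sentences are split with NLTK, embedded with MiniLM, and scored by cosine similarity against the document centroid. Ranking against the centroid is one common criterion; the project's exact scoring may differ.

```python
import nltk
import numpy as np
from nltk.tokenize import sent_tokenize
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # required by newer NLTK releases

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def extractive_summary(text: str, top_k: int = 3) -> str:
    sentences = sent_tokenize(text)
    embeddings = model.encode(sentences)                    # (n_sentences, 384)
    doc_embedding = embeddings.mean(axis=0, keepdims=True)  # document centroid
    scores = cosine_similarity(embeddings, doc_embedding).ravel()
    # Keep the top-k highest-scoring sentences, restored to document order
    top_idx = sorted(np.argsort(scores)[::-1][:top_k])
    return " ".join(sentences[i] for i in top_idx)
```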
Interface & Deployment:
Built an interactive web interface with Streamlit and Flask for on-demand summarization.
Enabled support for PDF and .txt file uploads to allow document-based summarization.
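A minimal sketch of the Streamlit upload flow. PyPDF2 is an assumption here for PDF text extraction, since the README does not name the library, and the character cap is a crude stand-in for proper input-length handling.

```python
import streamlit as st
from PyPDF2 import PdfReader  # assumed PDF library; the project may use another
from transformers import pipeline

@st.cache_resource  # load the model once per server process
def load_summarizer():
    return pipeline("summarization", model="facebook/bart-large-cnn")

st.title("GenAI Text Summarization")
uploaded = st.file_uploader("Upload a document", type=["pdf", "txt"])

if uploaded is not None:
    if uploaded.name.lower().endswith(".pdf"):
        reader = PdfReader(uploaded)
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
    else:
        text = uploaded.read().decode("utf-8", errors="ignore")
    # Crude character cap to stay under BART's token limit; see the chunking
    # sketch under Key Learnings for a token-aware split
    result = load_summarizer()(text[:3000], max_length=130, min_length=30)
    st.subheader("Summary")
    st.write(result[0]["summary_text"])
```

Launch with `streamlit run app.py` to try the upload flow locally.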
Key Learnings
Abstractive models like BART can generate natural, human-like summaries by understanding contextual semantics.
Extractive summarization using MiniLM and cosine similarity is effective for shorter texts and preserves factual accuracy.
Transformer-based models impose hard input-length limits (facebook/bart-large-cnn accepts at most 1024 tokens), so long documents must be truncated or split into chunks; see the chunking sketch below.
Sentence embeddings are crucial for semantic similarity-based extraction and ranking.
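A sketch of one straightforward workaround for the 1024-token window: tokenize once, split on token boundaries, summarize each chunk, and join the partial summaries. The chunk size and generation lengths are illustrative; the project's handling may differ.

```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize_long(text: str, chunk_tokens: int = 900) -> str:
    # Tokenize once, then split the ids into chunks that fit BART's window,
    # leaving headroom for the special tokens the pipeline re-adds
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [
        tokenizer.decode(ids[i:i + chunk_tokens])
        for i in range(0, len(ids), chunk_tokens)
    ]
    # Summarize each chunk independently and join the partial summaries
    partials = [
        summarizer(chunk, max_length=120, min_length=30)[0]["summary_text"]
        for chunk in chunks
    ]
    return " ".join(partials)
```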