Portfolio Details

SMS/Email Spam Classifier (NLP Project)

Overview

Spam messages pose a significant challenge for communication platforms, leading to wasted time and potential security risks. This project tackles this issue by leveraging Natural Language Processing (NLP) and Machine Learning to classify SMS and email messages as Spam or Not Spam. The system preprocesses text data, extracts meaningful features, and uses a trained machine learning model to provide real-time classifications through an intuitive web interface. This project highlights my expertise in data preprocessing, feature engineering, and building end-to-end NLP applications with a focus on solving real-world problems.

Project Description

This application employs NLP techniques to clean and analyze text data and uses a Logistic Regression model for classification. It processes raw user input (SMS or email) to make predictions with high accuracy, offering an efficient tool for spam detection. The simple yet powerful web interface makes it accessible to a wide range of users.

Key Features

Real-time message classification: Enter an SMS or email, and the system instantly classifies it as Spam or Not Spam.
NLP preprocessing pipeline: Text data undergoes cleaning, tokenization, stemming, and vectorization for better model performance.
Interactive UI: The user-friendly interface built using Streamlit makes the tool accessible to non-technical users.

Data Analysis and NLP Workflow

1. Data Collection and Cleaning:

The dataset is labeled with two categories: Spam and Not Spam.
Exploratory analysis was used to understand the class distribution and how frequently spam messages occur.
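
As a rough sketch of the kind of exploratory check described above, the class balance of a labeled dataset can be counted with the standard library. The `labels` list below is made-up illustrative data, not the project's dataset, and the label names are assumptions.

```python
from collections import Counter

# Hypothetical labels for illustration only; the real dataset is not shown here.
labels = ["not_spam", "spam", "not_spam", "not_spam",
          "spam", "not_spam", "not_spam", "not_spam"]

counts = Counter(labels)
total = len(labels)

# Share of each class, useful for spotting class imbalance before training.
shares = {label: count / total for label, count in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} messages ({share:.0%})")
```

A strongly skewed split is worth knowing about early, because accuracy alone can look good on an imbalanced dataset.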

2. Preprocessing Pipeline:

Text is converted to lowercase.
The text is tokenized into words using NLTK.
Stopwords and punctuation are removed.
Words are reduced to their stems using the Porter stemmer.
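
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess` is hypothetical, `wordpunct_tokenize` is chosen here because it needs no extra NLTK data download, and the small `STOPWORDS` set is a stand-in for a full stopword list.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Assumption: a tiny inline stopword set standing in for a full stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "you", "to", "and", "of", "in"}

stemmer = PorterStemmer()

def preprocess(text):
    """Lowercase, tokenize, drop stopwords and punctuation, then stem."""
    tokens = wordpunct_tokenize(text.lower())
    # Keep only alphanumeric tokens that are not stopwords.
    kept = [t for t in tokens if t.isalnum() and t not in STOPWORDS]
    return [stemmer.stem(t) for t in kept]

tokens = preprocess("You are WINNING free prizes!!!")
print(tokens)
```

Stemming maps inflected forms such as "winning" to a shared stem, so the model sees them as the same feature.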

3. Feature Engineering:

TF-IDF Vectorization converts text into meaningful numerical features.
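
To illustrate what TF-IDF computes, here is a small pure-Python sketch using the textbook formula (term frequency times log of inverse document frequency). It is only an illustration under that assumption: libraries such as scikit-learn apply a smoothed variant with normalization, so their numbers will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Made-up, already-preprocessed messages for illustration.
docs = [
    ["win", "free", "prize", "now"],
    ["meet", "lunch", "now"],
    ["free", "entri", "win", "cash"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first message, "prize" appears in only one document while "now" appears in two, so "prize" receives the higher weight.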

4. Model Development:

A Logistic Regression model was trained on the processed data and achieved an accuracy of 95%.
The model was evaluated using accuracy and a confusion matrix.
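
A training and evaluation step along these lines might look like the sketch below, assuming scikit-learn. The messages and labels are made up for illustration, the default hyperparameters are an assumption, and the toy data will not reproduce the reported accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the project's real dataset is not shown here.
train_texts = [
    "win a free prize now", "free cash claim your reward",
    "urgent winner call now", "are we still on for lunch",
    "see you at the meeting tomorrow", "can you send me the report",
]
train_labels = ["spam", "spam", "spam", "not_spam", "not_spam", "not_spam"]
test_texts = ["claim your free prize", "lunch meeting tomorrow"]
test_labels = ["spam", "not_spam"]

# Vectorize with TF-IDF, then fit a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

predictions = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
# Rows are true classes, columns are predicted classes, in the order given.
matrix = confusion_matrix(test_labels, predictions, labels=["not_spam", "spam"])

print(f"accuracy: {accuracy:.2f}")
print(matrix)
```

Wrapping the vectorizer and classifier in one pipeline means a single new message can be classified with `model.predict([message])`, which suits a real-time interface.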

Conclusion

The SMS/Email Spam Classifier project is an excellent example of how NLP and machine learning can be combined to solve real-world challenges. By building this project, I gained hands-on experience in text preprocessing, feature extraction, and building an interactive application for end users. This project not only showcases my skills in data analysis and machine learning but also my ability to create user-friendly, impactful applications. Future enhancements may include deploying the application to cloud platforms and incorporating deep learning models for improved performance.

Code link:

Click here to access the code