Descriptive Alt Text

Laptop Data Cleaning and Transformation Using SQL

Overview
This project demonstrates advanced SQL techniques for cleaning and transforming raw data into a structured format suitable for analysis. The dataset contains information about laptops, including specifications like RAM, weight, and price.

Objective
The Laptop Data Cleaning and Transformation Using SQL project focuses on preparing raw laptop specifications data for analysis. The dataset contained several inconsistencies, such as units within numeric fields (e.g., 'GB', 'kg') and missing values. This project involved creating a clean and standardized version of the data using advanced SQL techniques. By addressing these challenges, the dataset was transformed into a reliable and structured format suitable for further exploratory analysis and insights generation.

Key Findings
1. Data Cleaning :
  • Removed inconsistent units (e.g., 'GB', 'kg') for numeric fields.
  • Converted data types to enhance usability (e.g., RAM to INTEGER, screen size to DECIMAL).
2. Data Transformation :
  • Used Common Table Expressions (CTEs) for efficient updates.
  • Standardized and normalized key attributes.
3. Exploratory Analysis :
  • Identified missing or null values.
  • Assessed dataset completeness.

4. Scalability :
  • Created a new table structure compatible with the original data.
  • Ensured data integrity through robust SQL operations.

SQL Code Snippet


                    WITH updated_ram AS (
                        SELECT `Unnamed: 0`, REPLACE(Ram, 'GB', '') AS new_Ram
                        FROM laptop_backup
                    )
                    UPDATE laptop_backup l1
                    JOIN updated_ram t
                    ON l1.`Unnamed: 0` = t.`Unnamed: 0`
                    SET l1.Ram = t.new_Ram;
                    
                    ALTER TABLE laptop_backup MODIFY COLUMN Ram INTEGER;
                            

Conclusion
The project successfully demonstrates the importance of data cleaning and transformation in the data analysis pipeline. By leveraging SQL, the raw laptop dataset was transformed into a well-structured format with consistent data types and no missing values.
Dataset link(Uncleaned) :
Click here to access Dataset
Code link :
Click here to access the code