Drop all duplicate rows across multiple columns in Python Pandas
If you're working with data in Python Pandas, you may find yourself needing to drop duplicate rows across multiple columns. This can be a tricky task, but luckily there are a few different ways to go about it. In this blog post, we'll explore how to drop all duplicate rows and how to drop duplicate rows across multiple columns in Python Pandas.
Drop duplicates method
The first and easiest way to remove duplicate rows from a Pandas DataFrame is the drop_duplicates() method, which returns a DataFrame with the duplicate rows removed.
Syntax:
DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)
By default, this method removes duplicate rows based on all columns. So if you simply need to delete all duplicate rows from the DataFrame, you can call df.drop_duplicates() without any parameters. Note that this returns a new DataFrame and leaves the original unchanged; to modify the DataFrame in place, pass the inplace parameter: df.drop_duplicates(inplace=True).
Example: Delete all duplicate rows from the DataFrame
Let's create a DataFrame with some duplicate rows.
In this DataFrame, row 5 is a duplicate of row 0, and row 6 is a duplicate of row 2. So when you call df.drop_duplicates(), the repeated rows (5 and 6) are removed and the first occurrences are kept.
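A minimal sketch of this example follows. The column names and values are hypothetical, chosen only so that row 5 repeats row 0 and row 6 repeats row 2; they are not the data from the original post.

```python
import pandas as pd

# Hypothetical data: row 5 repeats row 0, and row 6 repeats row 2.
df = pd.DataFrame({
    "name": ["Anna", "Ben", "Cara", "Dan", "Eve", "Anna", "Cara"],
    "city": ["Oslo", "Rome", "Lima", "Oslo", "Rome", "Oslo", "Lima"],
    "score": [10, 20, 30, 40, 50, 10, 30],
})

# With no arguments, rows are compared on all columns and the first
# occurrence of each duplicated row is kept.
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 2, 3, 4]

# The original DataFrame is untouched unless inplace=True is passed,
# in which case the method returns None and modifies the object itself.
df_copy = df.copy()
result = df_copy.drop_duplicates(inplace=True)
print(result)        # None
print(len(df_copy))  # 5
```

If you want the result renumbered from 0, passing ignore_index=True does that in the same call.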
However, if you need to drop duplicate rows based on specific column(s), use the subset parameter.
Your subset can be a single column if you want to identify duplicates in that one column and keep the first occurrence.
df.drop_duplicates(subset=['Your Column'])
Or it can contain multiple columns that you want to use to identify duplicates.
df.drop_duplicates(subset=['Your Column 1', 'Your Column 2'])
[Figure: the drop_duplicates() method with subset columns for removing duplicates]
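To make the effect of subset concrete, here is a small sketch on a hypothetical DataFrame with columns A, B and C (illustrative names and values). It compares a one-column subset with a two-column subset.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": [1, 1, 2, 3, 2],
    "C": [10, 20, 30, 40, 50],
})

# Duplicates judged on column A alone: the first "x" and the first "y" survive.
by_a = df.drop_duplicates(subset=["A"])
print(by_a.index.tolist())   # [0, 2]

# Duplicates judged on the (A, B) pair: only row 1 repeats an earlier pair.
by_ab = df.drop_duplicates(subset=["A", "B"])
print(by_ab.index.tolist())  # [0, 2, 3, 4]
```

Column C is ignored when deciding what counts as a duplicate, but it is still present in the result; the kept rows carry whatever value C had in the first occurrence.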
Groupby() method
The second way to drop duplicate rows across multiple columns is to use the df.groupby() method.
Let's have a look at a Pandas DataFrame that contains duplicate values in two columns (A and B), where you want to remove the duplicates while keeping the row with the maximum value in column C. This can be achieved with the groupby method.
[Figure: groupby method to remove duplicate rows]
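The original code for this step is not reproduced in the text, so the following is only a sketch of one way it might be done. The data and the extra column D are hypothetical. It shows two variants: aggregating with max(), which returns only the grouping columns and C, and selecting whole rows with idxmax(), which keeps every column.

```python
import pandas as pd

# Hypothetical data: (A, B) pairs repeat, and C differs between the repeats.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": [1, 1, 2, 2, 2],
    "C": [10, 20, 30, 25, 50],
    "D": ["a", "b", "c", "d", "e"],
})

# Variant 1: one row per (A, B) with the maximum C. Other columns are dropped.
max_c = df.groupby(["A", "B"], as_index=False)["C"].max()
print(max_c)

# Variant 2: keep the entire original row that holds the maximum C
# in each (A, B) group, then restore the original row order.
full_rows = df.loc[df.groupby(["A", "B"])["C"].idxmax()].sort_index()
print(full_rows)
```

A similar result can be had without groupby by sorting on C in descending order and then calling drop_duplicates(subset=['A', 'B']); which reads better is largely a matter of taste.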
Identify duplicate rows with df.duplicated()
You may also find it useful to know about df.duplicated(), which returns a boolean Series marking duplicate rows.
[Figure: use df.duplicated() to identify duplicate rows]
By default, for each set of duplicated values, the first occurrence is marked False and all later occurrences are marked True.
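As a quick illustration of that default, and of how the keep parameter changes it, here is a short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: rows 0, 2 and 3 are identical.
df = pd.DataFrame({
    "A": ["x", "y", "x", "x"],
    "B": [1, 2, 1, 1],
})

# Default keep='first': the first occurrence is False, later copies are True.
mask = df.duplicated()
print(mask.tolist())                        # [False, False, True, True]

# keep='last' treats the last occurrence as the one to keep.
print(df.duplicated(keep="last").tolist())  # [True, False, True, False]

# keep=False flags every row that has a duplicate anywhere.
print(df.duplicated(keep=False).tolist())   # [True, False, True, True]

# Filtering with the inverted mask gives the same rows as drop_duplicates().
print(df[~mask].equals(df.drop_duplicates()))  # True
```

The mask is handy when you want to inspect the duplicates before deleting them, for example with df[df.duplicated(keep=False)].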
By using one of these methods, you can drop duplicate rows across multiple columns in Python Pandas. With a few lines of code, you can clean up your data and make sure that each record appears only once.