How to add new columns to a Pandas dataframe
In this article, I will use examples to show you how to add columns to a dataframe in Pandas. There is more than one way of adding columns to a Pandas dataframe, so let's review the main approaches.
Create a Dataframe
As usual, let's start by creating a dataframe.
Create a simple dataframe from a list of tuples, with the column names Name, Age, City and Country and the index labels a to f.
import pandas as pd

# Creating a simple dataframe
# List of tuples
students = [
    ('Jack', 34, 'Sydney', 'Australia'),
    ('Riti', 30, 'Delhi', 'India'),
    ('Tom', 31, 'Mumbai', 'India'),
    ('Neelu', 32, 'Bangalore', 'India'),
    ('John', 16, 'New York', 'US'),
    ('Mike', 17, 'las vegas', 'US'),
]

# Create a DataFrame object
df = pd.DataFrame(
    students,
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
)
I. Add a column to Pandas Dataframe with a default value
When trying to set the entire column of a dataframe to a specific value, use one of the four methods shown below.
By declaring a new list as a column
.loc[]
.assign()
.insert()
Method I.1: By declaring a new list as a column
df['New_Column']='value' will add the new column and set all rows to that value.
In this example, we will take the dataframe df created above and add a new column with the name Course to it.
Your Dataframe before we add a new column:
# Method 1: By declaring a new list as a column
df['Course'] = 'Computer science'
df
Your Dataframe after adding a new column:
Some of you may get the following warning:
"A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"
This warning is usually the result of creating a slice of the original dataframe and then adding the new column to that slice. To avoid it, add your new column to the original dataframe first and then create the slice.
Python can do unexpected things when new objects are defined from existing ones. A slice of a dataframe may be only a stand-in for the rows stored in the original dataframe object rather than a new object created in memory, and Pandas cannot always tell which of the two you are modifying.
To avoid these issues altogether, call the dataframe's .copy() method (or use deepcopy from Python's copy module), which explicitly forces the object to be copied in memory so that changes made to the new object are not applied to the source object.
Or you can use the .loc[] indexer, as suggested by the Pandas warning message.
For more information, see the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html.
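As a minimal sketch of the copy-based approach described above, the snippet below takes an explicit copy of a slice before adding a column to it. The small dataframe and the name adults are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Small illustrative dataframe (an assumption, not the article's full dataset)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'John'], 'Age': [34, 30, 16]},
    index=['a', 'b', 'c'],
)

# Take an explicit, independent copy of the slice ...
adults = df[df['Age'] >= 18].copy()

# ... so adding a column to it is unambiguous and does not touch df
adults['Course'] = 'Computer science'

print(adults)
print(df.columns.tolist())  # the original dataframe keeps only Name and Age
```

The alternative suggested in the text, adding the column to the original dataframe first and slicing afterwards, avoids the warning as well.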
Method I.2: Using .loc[]
pandas.DataFrame.loc lets you access a group of rows and columns by label(s) or a boolean array. .loc[] is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
A list or an array of labels, e.g. ['a', 'b', 'c'].
A slice object with labels, e.g. 'a':'f'.
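The allowed inputs listed above can be sketched as follows. The four-row dataframe is an illustrative assumption; note that, unlike ordinary Python slices, a label slice includes its end label.

```python
import pandas as pd

# Illustrative dataframe with string index labels (assumed for this sketch)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'Tom', 'Neelu'], 'Age': [34, 30, 31, 32]},
    index=['a', 'b', 'c', 'd'],
)

row = df.loc['a']                 # single label: one row, returned as a Series
subset = df.loc[['a', 'c']]       # list of labels: a dataframe with those rows
span = df.loc['a':'c']            # label slice: rows a, b and c (end label included)
over_30 = df.loc[df['Age'] > 30]  # boolean array: rows where the condition is True

print(row, subset, span, over_30, sep='\n\n')
```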
df.loc[:,'New_Column'] = 'value'
You can use .loc with ':' to set a specified value for all rows.
Your Dataframe before we add a new column:
# Method 2: Using .loc[]
df.loc[:, 'Grade'] = 'A'
df
Your Dataframe after adding a new column:
.loc[] has two limitations: it mutates the dataframe in-place, and it can't be used with method chaining. If either of these is a problem, use the .assign() method.
Method I.3: Using the .assign() function
The .assign() function returns a new object with all original columns as well as the new ones. Existing columns that are re-assigned will be overwritten. The new column names are passed as keyword arguments. If the values are callable, they are computed on the dataframe and assigned to the new columns.
df = df.assign(New_Column='value')
Your Dataframe before we add a new column:
# Method 3: Using .assign() function
df = df.assign(Year='3')
df
Your Dataframe after adding a new column:
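Because .assign() returns a new object instead of changing the dataframe in-place, calls can be chained, and a callable value is evaluated on the dataframe it receives. The following is a minimal sketch of both behaviours; the small dataframe and the Is_adult column are assumptions made for illustration.

```python
import pandas as pd

# Illustrative dataframe (assumed for this sketch)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'John'], 'Age': [34, 30, 16]},
    index=['a', 'b', 'c'],
)

result = (
    df
    .assign(Year='3')                           # same value for every row
    .assign(Is_adult=lambda d: d['Age'] >= 18)  # callable, computed on the dataframe
)

print(result)
print(df.columns.tolist())  # df itself is unchanged
```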
Method I.4: Using the .insert() function
An advantage of the .insert() method is that it gives the freedom to add a column at any position we like and not just at the end of the dataframe. It also provides different options for inserting column values.
Parameters for .insert():
loc: an integer giving the position at which to insert the new column. The column currently at that position, and every column after it, shifts to the right.
column: a string which is the name of the column to be inserted.
value: the value to be inserted. It can be an integer, a string, a float or even a series / list of values. Providing only one value will set the same value for all rows.
allow_duplicates: a boolean value which controls whether a column whose name already exists may be inserted. Unless it is set to True, .insert() raises a ValueError when the dataframe already contains a column with that name.
With the .insert() function you can set an entire column of a dataframe to a specific value with df.insert(2, 'New_Column', 'value', True)
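As a minimal sketch of the parameters described above, the snippet below inserts a constant-valued column at position 2 and then shows what happens when the same name is inserted again without allow_duplicates. The three-row dataframe and the duplicate_error variable are assumptions made for illustration.

```python
import pandas as pd

# Illustrative dataframe (assumed for this sketch)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'Tom'],
     'Age': [34, 30, 31],
     'City': ['Sydney', 'Delhi', 'Mumbai']},
    index=['a', 'b', 'c'],
)

# .insert() works in-place and returns None, so do not assign its result
df.insert(2, 'Course', 'Computer science')
print(df.columns.tolist())  # Course now sits between Age and City

# Inserting an existing name again is rejected unless allow_duplicates=True
duplicate_error = None
try:
    df.insert(0, 'Course', 'Maths')
except ValueError as err:
    duplicate_error = err
    print(f'ValueError: {err}')
```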
II. Add a new column with different values
All the methods that are covered above can also be used to assign a new column with different values to a dataframe.
Method II.1: By declaring a new list as a column
You can append a new column with different values to a dataframe using method I.1 but with a list that contains multiple values. So instead of df['New_Column']='value' use
df['New_Column']=['value1','value2','value3']
When using this method you will need to keep the following in mind:
If the number of values in the list does not match the number of rows in the dataframe, Pandas raises a ValueError.
If a column with that name already exists, then all of its values will be replaced.
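Both points can be checked with a short sketch. The three-row dataframe and the column values are made up for illustration.

```python
import pandas as pd

# Illustrative dataframe with three rows (assumed for this sketch)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'Tom'], 'Age': [34, 30, 31]},
    index=['a', 'b', 'c'],
)

# One value per row: the list is assigned in row order
df['Course'] = ['Maths', 'Physics', 'History']

# Assigning to an existing column name replaces all of its values
df['Course'] = ['Biology', 'Physics', 'History']

# A list whose length differs from the number of rows is rejected
try:
    df['Grade'] = ['A', 'B']
except ValueError as err:
    print(f'ValueError: {err}')

print(df)
```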
Method II.2: Using .loc[]
In this case you will need to change method I.2 from df.loc[:,'New_Column'] = 'value' to
df.loc[:, 'New_Column'] = ['value1','value2','value3']
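A minimal sketch of this form, assuming a three-row dataframe so that the list supplies exactly one value per row:

```python
import pandas as pd

# Illustrative dataframe with three rows (assumed for this sketch)
df = pd.DataFrame({'Name': ['Jack', 'Riti', 'Tom']}, index=['a', 'b', 'c'])

# ':' selects every row; the list provides one value per row, in order
df.loc[:, 'Grade'] = ['A', 'B', 'C']

print(df)
```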
Method II.3: Using the .assign() function
When you want to add a new column with different values to a dataframe using the .assign() function you will need to change df = df.assign(New_Column='value') to
df = df.assign(New_Column=['value1', 'value2', 'value3'])
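The sketch below shows this form on an assumed three-row dataframe. It also illustrates one detail worth knowing: a list is matched to rows by position, whereas a Pandas Series is aligned on its index labels. The variable names by_position and by_label are illustrative.

```python
import pandas as pd

# Illustrative dataframe with three rows (assumed for this sketch)
df = pd.DataFrame({'Name': ['Jack', 'Riti', 'Tom']}, index=['a', 'b', 'c'])

# A list is assigned in row order
by_position = df.assign(Grade=['A', 'B', 'C'])

# A Series is aligned on index labels, regardless of the order it was built in
grades = pd.Series({'c': 'C', 'a': 'A', 'b': 'B'})
by_label = df.assign(Grade=grades)

print(by_position)
print(by_label)
```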
Method II.4: Using the .insert() function
You can use the .insert() function to insert a column at a specific location. To add a new column with different values to a dataframe use:
df.insert(loc=1, column='New_Column', value=['value1', 'value2', 'value3'])
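A minimal sketch of this call on an assumed three-row dataframe, showing where the new column ends up:

```python
import pandas as pd

# Illustrative dataframe with three rows (assumed for this sketch)
df = pd.DataFrame(
    {'Name': ['Jack', 'Riti', 'Tom'], 'Age': [34, 30, 31]},
    index=['a', 'b', 'c'],
)

# Insert per-row values as the second column (position 1); works in-place
df.insert(loc=1, column='Course', value=['Maths', 'Physics', 'History'])

print(df)
```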
Please note that there are many more ways of adding a column to a Pandas dataframe. However, knowing these four should be more than sufficient.
Conclusion:
Now you should understand the basics of adding columns to a dataframe in Pandas. I hope you've found this post helpful. If you want to go deeper into the subject, there are some great answers on StackOverflow.