How to add new columns to a Pandas dataframe?
In this article, I will use examples to show you how to add columns to a dataframe in Pandas. There is more than one way of adding columns to a Pandas dataframe, so let's review the main approaches.
Create a Dataframe
As usual let's start by creating a dataframe.
Create a simple dataframe from a list of tuples, with the column names Name, Age, City and Country.
# Creating a simple dataframe
import pandas as pd

# List of tuples, one tuple per row
students = [('Jack', 34, 'Sydney', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Tom', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]

# Create a DataFrame object with named columns and a custom index
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country'], index=['a', 'b', 'c', 'd', 'e', 'f'])
I. Add a column to Pandas Dataframe with a default value
When trying to set the entire column of a dataframe to a specific value, use one of the four methods shown below.
- By declaring a new list as a column
- .loc[]
- .assign()
- .insert()
Method I.1: By declaring a new list as a column
df['New_Column'] = 'value'
will add the new column and set all rows to that value.
In this example, we will use the dataframe df created above and add a new column with the name Course to it.
# Method 1: By declaring a new list as a column
df['Course'] = 'Computer science'
df
Some of you may get the following warning:
"A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"
This warning is usually the result of creating a slice of the original dataframe before declaring your new column. To avoid it, add your new column to the original dataframe first and then create the slice.
Python can do unexpected things when new objects are defined from existing ones. A slice of a dataframe may be only a view of the rows stored in the original dataframe object rather than a new object in memory, so Pandas cannot always tell whether an assignment will reach the original data.
To avoid these issues altogether, call the dataframe's .copy() method (or use copy.deepcopy from Python's copy module), which explicitly forces the object to be copied in memory so that changes made to the new object are not applied to the source object.
Or you can use the .loc[] indexer, as suggested by the Pandas warning message.
For more information, see the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html.
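Here is a minimal sketch of the copy-first approach described above. The small dataframe and the name adults are made up for illustration; the point is only that the new column lands on the explicit copy and not on the original.

```python
import pandas as pd

# Hypothetical small dataframe, used only for illustration
df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "John"], "Age": [34, 30, 16]},
    index=["a", "b", "c"],
)

# Take an explicit copy of the filtered rows so the result is an independent object
adults = df[df["Age"] >= 18].copy()

# Adding a column to the copy does not touch the original dataframe
adults["Course"] = "Computer science"

print(adults)
print(df.columns.tolist())
```

If you need the new column on the original dataframe as well, add it there first and slice afterwards.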
Method I.2: Using .loc[]
pandas.DataFrame.loc lets you access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
- A list or an array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f'.
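To make the allowed inputs above concrete, here is a short sketch on a made-up three-row dataframe. The variable names are only for illustration.

```python
import pandas as pd

# Hypothetical dataframe with string labels as the index
df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "Tom"], "Age": [34, 30, 31]},
    index=["a", "b", "c"],
)

single = df.loc["a"]               # single label: the row labelled 'a', as a Series
several = df.loc[["a", "c"]]       # list of labels: a dataframe with those rows
span = df.loc["a":"b"]             # label slice: both end labels are included
over_30 = df.loc[df["Age"] > 30]   # boolean array: rows where the condition is True
```

Note that, unlike ordinary Python slices, a label slice with .loc[] includes its end label.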
You can use .loc with : to add a specified value for all rows:
df.loc[:, 'New_Column'] = 'value'
# Method 2: Using .loc[]
df.loc[:,'Grade'] = 'A'
df
The .loc[] indexer has two limitations: it mutates the dataframe in place, and it can't be used with method chaining. If either of these is a problem, use the .assign() method.
Method I.3: Using the .assign() function
The .assign() function returns a new object with all original columns as well as the new ones. Existing columns that are re-assigned will be overwritten. The new column names are passed as keyword arguments. If the values are callable, they are computed on the dataframe and assigned to the new columns.
df = df.assign(New_Column='value')
# Method 3: Using .assign() function
df = df.assign(Year='3')
df
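Since .assign() returns a new object and accepts callables, it fits naturally into a method chain. The sketch below assumes a made-up two-row dataframe and a hypothetical Is_Adult column; it is only meant to show the callable form.

```python
import pandas as pd

# Hypothetical dataframe, used only for illustration
df = pd.DataFrame({"Name": ["Jack", "John"], "Age": [34, 16]}, index=["a", "e"])

result = (
    df.assign(Year="3")                              # constant value for every row
      .assign(Is_Adult=lambda d: d["Age"] >= 18)     # callable receives the dataframe
)

print(result)
```

The original df is left unchanged here, because the chained result is stored under a different name.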
Method I.4: Using the .insert() function
An advantage of the .insert() method is that it gives us the freedom to add a column at any position we like, not just at the end of the dataframe. It also provides different options for inserting column values.
Parameters for .insert():
- loc: an integer giving the position at which to insert the new column. The column currently at that position, and every column after it, shifts one place to the right.
- column: a string giving the name of the column to be inserted.
- value: the value to be inserted. It can be an integer, a string, a float or even a series / list of values. Providing only one value will set the same value for all rows.
- allow_duplicates: a boolean that controls what happens when a column with the same name already exists. When it is False, which is the default behaviour, .insert() raises a ValueError; when it is True, a second column with the same name is added.
With the .insert() function you can set an entire column of a dataframe to a specific value:
df.insert(2, 'New_Column', 'value', True)
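As a rough sketch of how the parameters above behave, the example below inserts a constant column at position 2 and then tries to insert the same name again. The dataframe and the column name Marks are invented for illustration.

```python
import pandas as pd

# Hypothetical dataframe, used only for illustration
df = pd.DataFrame(
    {"Name": ["Jack", "Riti"], "Age": [34, 30], "City": ["Sydney", "Delhi"]},
    index=["a", "b"],
)

# Insert a constant-valued column at position 2 (between Age and City)
df.insert(2, "Marks", 50)

# Inserting the same column name again is rejected unless allow_duplicates=True
try:
    df.insert(0, "Marks", 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df.columns.tolist())
```

Unlike .assign(), .insert() changes the dataframe in place and returns None, so there is no need to reassign the result.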
II. Add a new column with different values
All the methods covered above can also be used to add a new column with different values to a dataframe.
Method II.1: By declaring a new list as a column
You can append a new column with different values to a dataframe using method I.1 but with a list that contains multiple values. So instead of
df['New_Column'] = 'value'
use
df['New_Column'] = ['value1', 'value2', 'value3']
When using this method you will need to keep the following in mind:
- If the number of values in the list does not match the number of rows in the dataframe, Pandas raises a ValueError.
- If a column already exists, then all of its values will be replaced.
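The two points above can be checked with a small sketch. The dataframe and the column names Marks and Total are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row dataframe, used only for illustration
df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]}, index=["a", "b", "c"])

# One value per row, assigned in row order
df["Marks"] = [10, 20, 45]

# A list whose length differs from the number of rows is rejected
try:
    df["Total"] = [1, 2]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```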
Method II.2: Using .loc[]
In this case you will need to change method I.2
df.loc[:, 'New_Column'] = 'value'
to
df.loc[:, 'New_Column'] = ['value1', 'value2', 'value3']
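A minimal sketch of the same idea on a made-up three-row dataframe, with one list entry per row:

```python
import pandas as pd

# Hypothetical three-row dataframe, used only for illustration
df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]}, index=["a", "b", "c"])

# ':' selects every row; the list supplies one value per row, in order
df.loc[:, "Grade"] = ["A", "B", "A"]

print(df)
```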
Method II.3: Using the .assign() function
When you want to add a new column with different values to a dataframe using the .assign() function you will need to change
df = df.assign(New_Column='value')
to
df = df.assign(New_Column=['value1', 'value2', 'value3'])
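For illustration, here is a short sketch that adds two list-valued columns in a single .assign() call. The dataframe and the column names Marks and Grade are assumptions made for the example.

```python
import pandas as pd

# Hypothetical three-row dataframe, used only for illustration
df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]}, index=["a", "b", "c"])

# Each keyword argument becomes a column; each list has one value per row
with_marks = df.assign(Marks=[10, 20, 45], Grade=["C", "B", "A"])

print(with_marks)
```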
Method II.4: Using the .insert() function
You can use the .insert() function to insert a column at a specific location. To add a new column with different values to a dataframe use:
df.insert(loc=1, column="New Column", value=['value1', 'value2', 'value3'])
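A small sketch of this call on a made-up dataframe, placing a hypothetical Marks column right after the first column:

```python
import pandas as pd

# Hypothetical three-row dataframe, used only for illustration
df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "Tom"], "Age": [34, 30, 31]},
    index=["a", "b", "c"],
)

# loc=1 puts the new column in the second position; Age shifts to the right
df.insert(loc=1, column="Marks", value=[10, 20, 45])

print(df.columns.tolist())
```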
Please note that there are many more ways of adding a column to a Pandas dataframe. However, knowing these four should be more than sufficient.
Conclusion:
Now you should understand the basics of adding columns to a dataframe in Pandas. I hope you've found this post helpful. If you want to go deeper into the subject, there are some great answers on StackOverflow.