How to add new columns to Pandas dataframe?

March 22, 2020

In this article, I will use examples to show you how to add columns to a dataframe in Pandas. There is more than one way of adding columns to a Pandas dataframe, so let's review the main approaches.

Create a Dataframe

As usual let's start by creating a dataframe.

Create a simple dataframe from a list of tuples, with the column names Name, Age, City and Country.

import pandas as pd

# Creating a simple dataframe from a list of tuples
students = [
    ('Jack', 34, 'Sydney', 'Australia'),
    ('Riti', 30, 'Delhi', 'India'),
    ('Tom', 31, 'Mumbai', 'India'),
    ('Neelu', 32, 'Bangalore', 'India'),
    ('John', 16, 'New York', 'US'),
    ('Mike', 17, 'las vegas', 'US'),
]

# Create a DataFrame object with custom column names and row labels
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

I. Add a column to Pandas Dataframe with a default value

When trying to set the entire column of a dataframe to a specific value, use one of the four methods shown below.

  • By declaring a new list as a column

  • .loc[]

  • .assign()

  • .insert()

Method I.1: By declaring a new list as a column

df['New_Column']='value' will add the new column and set all rows to that value.

In this example, we take the dataframe df created above and add a new column named Course to it.

# Method 1: By declaring a new list as a column
df['Course'] = 'Computer science'
df

Some of you may get the following warning:

"A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead".

This warning (SettingWithCopyWarning) usually appears when you create a slice of the original dataframe and then declare your new column on that slice. To avoid it, add your new column to the original dataframe first and then create the slice.

Python can do unexpected things when new objects are defined from existing ones. A slice of a dataframe may be just a stand-in for the rows stored in the original dataframe object, in which case no new object is created in memory, and Pandas cannot always tell which of the two you intend to modify.

To avoid these issues altogether, make an explicit copy, for example with the dataframe's .copy() method or with copy / deepcopy from Python's copy module. This forces the object to be copied in memory so that changes made to the new object are not applied to the source object.

Or you can use .loc[], as suggested by the Pandas warning message.

For more information, see the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html.
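The explicit-copy approach described above can be sketched as follows. This is a minimal illustration rather than the only fix; the small dataframe, the adults variable and the age filter are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "John"], "Age": [34, 30, 16]},
    index=["a", "b", "e"],
)

# Take an explicit copy of the filtered rows, so the new column is
# added to an independent dataframe and not to a slice of df.
adults = df[df["Age"] >= 18].copy()
adults["Course"] = "Computer science"

# The original dataframe is left untouched.
print(adults)
print(list(df.columns))
```

Because adults is a separate object, adding Course to it does not change df.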

Method I.2: Using .loc[]

pandas.DataFrame.loc gives access to a group of rows and columns by label(s). It is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or an array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.
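The allowed inputs listed above, plus the boolean array case, might look like this in practice. The three-row dataframe and the variable names are assumptions made for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "Tom"], "Age": [34, 30, 31]},
    index=["a", "b", "c"],
)

one_row = df.loc["a"]              # single label: returns that row as a Series
some_rows = df.loc[["a", "c"]]     # list of labels: returns a DataFrame
label_slice = df.loc["a":"b"]      # label slice: both endpoints are included
over_30 = df.loc[df["Age"] > 30]   # boolean array: keeps the matching rows
```

Note that a label slice, unlike a regular Python slice, includes its end label.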

df.loc[:, 'New_Column'] = 'value' uses .loc[] with ':' to set the specified value for all rows.

# Method 2: Using .loc[]
df.loc[:, 'Grade'] = 'A'
df

.loc[] has two limitations: it mutates the dataframe in place, and it can't be used with method chaining. If either of these is a problem, use the .assign() method.

Method I.3: Using the .assign() function

The .assign() function returns a new object with all original columns as well as the new ones. Existing columns that are re-assigned will be overwritten. The column names are keywords. If the values are callable, they are computed on the dataframe and assigned to the new columns.

df = df.assign(New_Column='value')

# Method 3: Using .assign() function
df = df.assign(Year='3')
df
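Since .assign() returns a new object and accepts callables, it fits into a method chain. The sketch below shows one way this could look; the Is_Adult column and the age threshold are hypothetical and only serve the example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "John"], "Age": [34, 16]})

# assign() returns a new dataframe, so calls can be chained.
# The lambda receives the dataframe and computes the new column from it.
result = (
    df.assign(Year="3")
      .assign(Is_Adult=lambda d: d["Age"] >= 18)
)

print(result)
```

The original df keeps its two columns; only result carries Year and Is_Adult.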

Method I.4: Using the .insert() function

An advantage of the .insert() method is that it gives the freedom to add a column at any position we like and not just at the end of the Dataframe. It also provides different options for inserting column values.

Parameters for .insert() :

  • loc: loc is an integer which is the location of a column where we want to insert a new column. This will shift the existing column at that position to the right.

  • column: column is a string which is the name of a column to be inserted.

  • value: value is simply the value to be inserted. It can be an integer, a string, a float or even a series / list of values. Providing only one value will set the same value for all rows.

  • allow_duplicates: allow_duplicates is a boolean value which controls whether a column with an already existing name may be inserted. With the default, False, inserting a duplicate column name raises a ValueError.

With the .insert() function you can set an entire column of a Dataframe to a specific value with df.insert(2, 'New_Column', 'value', True). Note that .insert() modifies the dataframe in place and returns None.
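A short sketch of how the loc and allow_duplicates parameters behave, assuming a small three-column dataframe and a made-up Marks column:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Riti"], "Age": [34, 30], "City": ["Sydney", "Delhi"]}
)

# Insert 'Marks' at position 2; 'City' shifts one place to the right.
df.insert(2, "Marks", 0)

# A second column with the same name is rejected by default ...
duplicate_error = None
try:
    df.insert(0, "Marks", 1)
except ValueError as err:
    duplicate_error = str(err)

# ... unless allow_duplicates=True is passed.
df.insert(0, "Marks", 1, allow_duplicates=True)

print(list(df.columns))
```

Duplicate column names make later selection by name ambiguous, so allow_duplicates=True is probably best kept for special cases.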

II. Add a new column with different values

All the methods covered above can also be used to add a new column with different values to a dataframe.

Method II.1: By declaring a new list as a column

You can append a new column with different values to a dataframe using method I.1, but with a list that contains multiple values. So instead of df['New_Column']='value' use

df['New_Column']=['value1','value2','value3']

When using this method you will need to keep the following in mind:

  • If the number of values in the list does not match the number of rows in the index, Pandas raises a ValueError.

  • If a column already exists, then all of its values will be replaced.
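Both points can be checked with a few lines. The Marks and Total columns below are invented for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]}, index=["a", "b", "c"])

# One value per row: the column is added.
df["Marks"] = [70, 85, 90]

# Assigning to an existing column name replaces all of its values.
df["Marks"] = [71, 86, 91]

# A list whose length differs from the number of rows raises ValueError.
length_error = None
try:
    df["Total"] = [1, 2]
except ValueError as err:
    length_error = str(err)

print(df)
print(length_error)
```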

Method II.2: Using .loc[]

In this case you will need to change the call from method I.2

df.loc[:,'New_Column'] = 'value'

to

df.loc[:, 'New_Column'] = ['value1','value2','value3']
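As a small runnable version of the call above, assuming a three-row dataframe and a made-up Grade column:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]}, index=["a", "b", "c"])

# The list is applied in row order: 'A' goes to row a, 'B' to b, 'C' to c.
df.loc[:, "Grade"] = ["A", "B", "C"]

print(df)
```

As with method II.1, the list needs exactly one value per row.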

Method II.3: Using the .assign() function

When you want to add a new column with different values to a dataframe using the .assign() function, you will need to change

df = df.assign(New_Column='value')

to

df = df.assign(New_Column=['value1', 'value2', 'value3'])
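Because the column names are keywords, several list-valued columns can be added in a single call. A minimal sketch, with Year and Grade as illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "Riti", "Tom"]})

# Each keyword becomes a column; every list needs one entry per row.
with_new = df.assign(Year=[1, 2, 3], Grade=["A", "B", "C"])

print(with_new)
```

Here the result is stored under a new name, so df itself stays unchanged.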

Method II.4: Using the .insert() function

You can use the .insert() function to insert a column at a specific location. To add a new column with different values to a dataframe use:

df.insert(loc=1, column="New Column", value=['value1', 'value2','value3'])
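A runnable variant of the call above, under the assumption of a three-row dataframe and a hypothetical Marks column:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jack", "Riti", "Tom"], "City": ["Sydney", "Delhi", "Mumbai"]}
)

# Place the new column at position 1, between 'Name' and 'City'.
df.insert(loc=1, column="Marks", value=[70, 85, 90])

print(df)
```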

Please note that there are many more ways of adding a column to a Pandas dataframe. However, knowing these four should be more than sufficient.

Conclusion:

Now you should understand the basics of adding columns to a dataframe in Pandas. I hope you've found this post helpful. If you want to go deeper into the subject, there are some great answers on StackOverflow.
