Pandas DataFrame Append

Python is a great language for data analysis thanks to its many data-centric packages. One of these packages is Pandas, which makes importing and analyzing data much easier. This lesson shows you how to add new rows to a Pandas dataframe or other object using the Pandas append method. We’ll walk through step-by-step examples and explain what the append method does and how its syntax works.

Append Rows to a Pandas DataFrame

The append() function adds rows from another dataframe to the end of the current dataframe and returns a new dataframe object. Columns of the other dataframe that are not present in the calling dataframe are added as new columns, and the cells that have no value are filled with NaN.

The Pandas append method adds new rows to an existing Pandas object. It is a widespread technique for data cleansing and data wrangling in Python.

How well this works depends on how we use the syntax, so let’s look at the syntax and the optional parameters.

Here we’ll go over the syntax of the Pandas append method for both Pandas dataframes and Pandas Series objects, each of which has its own syntax.

Before we get into the syntax, bear the following in mind:

  • First and foremost, these syntax explanations assume you have already installed Pandas and imported it. You can import it with the following code:
import pandas as pd
  • Second, these syntax explanations assume you already have two Pandas dataframes or other objects to join.

The actual syntax is as follows:

DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

Using the append method on a dataframe is easy. You type the name of the first dataframe, followed by .append() to invoke the method.

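Inside the parentheses, you then type the name of the second dataframe, which you want to append to the end of the first. You can also use additional optional arguments, which I’ll go over in the parameters section. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on pandas 2.0 and later, use pd.concat() instead.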
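Because DataFrame.append() no longer exists in pandas 2.0 and later, here is a minimal sketch of the same call written with pd.concat(), which takes a list of objects and stacks them along the rows by default. The dataframes first and second are hypothetical examples.

```python
import pandas as pd

# Two small hypothetical dataframes with the same columns
first = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
second = pd.DataFrame({"x": [5], "y": [6]})

# On pandas < 2.0 this could be written as: first.append(second)
# pd.concat stacks the objects in the list along axis=0 (rows) by default
combined = pd.concat([first, second])

print(combined)
```

As with append(), the original index labels are kept, so the result's index is 0, 1, 0.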

The syntax for appending to a series

The syntax for appending to a Series is similar to that of a dataframe. The first Series’ name is typed first, followed by .append() to invoke the method.

series_1.append(series_2)

Then, inside the parentheses, you write the name of the second series, which you want to append to the end of the first. Once again, some optional arguments can significantly alter the method’s behavior. Series.append() was also removed in pandas 2.0, where pd.concat() replaces it.
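Here is a minimal sketch of the Series case using pd.concat(), which replaces Series.append() in pandas 2.0 and later. The names series_1 and series_2 match the syntax line above; their values are made up for illustration.

```python
import pandas as pd

# Hypothetical values for the two series
series_1 = pd.Series([10, 20])
series_2 = pd.Series([30])

# Equivalent of series_1.append(series_2) on older pandas versions
combined = pd.concat([series_1, series_2])

print(combined)
```

Because both inputs are Series, the result is also a Series.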

Parameters

The parameters are as follows:

other

The rows to append: a DataFrame, a Series, a dict-like object, or a list of these.

ignore_index

If True, the index labels of the inputs are not used. The ignore_index option lets you control the index of the new output Pandas object.

By default, ignore_index = False. In this case, Pandas preserves the original index values from the two input dataframes. Keep in mind that this can result in duplicate index values, which can cause problems later.

If you set ignore_index = True, Pandas disregards the index values of the inputs and creates a new index for the output, labeled 0, 1, …, n - 1.
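The following sketch shows the difference between the two settings. It uses pd.concat(), which accepts the same ignore_index argument as append(); the dataframes left and right are hypothetical.

```python
import pandas as pd

# Hypothetical inputs, each with the default index starting at 0
left = pd.DataFrame({"value": [1, 2, 3]})
right = pd.DataFrame({"value": [4, 5]})

# ignore_index=False (default): original labels are kept, so 0 and 1 repeat
kept = pd.concat([left, right])

# ignore_index=True: a fresh index 0, 1, ..., n - 1 is built for the output
renumbered = pd.concat([left, right], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```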

verify_integrity (optional)

If True, raise a ValueError when the resulting index contains duplicates.

The verify_integrity argument checks the “integrity” of the new index. If you set verify_integrity = True and the index has duplicates, Pandas raises an error.

The default value of this argument is verify_integrity = False, in which case Pandas accepts duplicates.
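A small sketch of the duplicate check, written with pd.concat(), which also supports verify_integrity. Both hypothetical inputs use the default index 0, 1, so the combined index contains duplicates and the call raises a ValueError.

```python
import pandas as pd

# Hypothetical inputs: both use the default index 0, 1
left = pd.DataFrame({"value": [1, 2]})
right = pd.DataFrame({"value": [3, 4]})

try:
    # verify_integrity=True asks pandas to reject a result with duplicate index labels
    pd.concat([left, right], verify_integrity=True)
    error_message = None
except ValueError as exc:
    # Keep the message so it can be inspected
    error_message = str(exc)

print(error_message)
```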

sort (optional)

If the columns of the calling dataframe and other are not aligned, sort them. In other words, when the two input dataframes have different columns, the sort argument determines whether the columns of the output are sorted.

This parameter is set to sort = False by default, in which case the columns are not sorted when they are combined. If you specify sort = True, Pandas sorts the columns of the output.
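To see the effect of sort, here is a sketch using pd.concat(), which has the same sort argument. The two hypothetical dataframes have columns that are not aligned ("b" and "a" versus "c" and "a").

```python
import pandas as pd

# Hypothetical inputs whose columns are not aligned
first = pd.DataFrame({"b": [1], "a": [2]})
second = pd.DataFrame({"c": [3], "a": [4]})

# sort=False (the default): the combined columns are left unsorted
unsorted_result = pd.concat([first, second], sort=False)

# sort=True: the combined columns are sorted alphabetically
sorted_result = pd.concat([first, second], sort=True)

print(list(unsorted_result.columns))
print(list(sorted_result.columns))
```

In both cases the output contains all three columns; only their order differs.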

Returns

A new DataFrame (or Series) containing the rows of the calling object followed by the rows of other.

This technique is flexible because you can use it on different Pandas objects. It can be applied to:

  • dataframes
  • Series

When we use append on dataframes, the columns in the dataframes are frequently the same. However, if the input dataframes have distinct columns, the output dataframe will include both inputs’ columns.

Pandas’ append output

The input determines the append’s output. In most cases, the result will be a new Pandas object with the second object’s rows attached to the bottom of the original object.

If the inputs are dataframes, the output will also be a dataframe. If the inputs are all series, the outcome will be a series.

It’s also worth noting that the append() method creates a new object and leaves the two original input objects unchanged. This can confuse beginners, so keep in mind that if you want to keep the result, you need to assign it to a variable.
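A minimal sketch of this behavior, using pd.concat() (which, like append(), returns a new object). The dataframes original and extra are hypothetical.

```python
import pandas as pd

original = pd.DataFrame({"x": [1, 2]})
extra = pd.DataFrame({"x": [3]})

# The combined object is returned, not written back into original
pd.concat([original, extra])

# Assign the result to a variable to keep it
result = pd.concat([original, extra], ignore_index=True)

print(len(original), len(result))
```

The first call's result is simply discarded; original still has two rows.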

Example 1: Appending new rows to a Pandas object

Let’s look at a few examples of how to add new rows to a Pandas object using append.

Before running any of the examples, you need to do two things:

  • import Pandas
  • create the dataframes we’ll be using

Import Pandas

Let’s start by importing Pandas. You can do this with the following code:

import pandas as pd

This lets us call Pandas functions with the pd prefix, which is the standard convention.

Create the dataframes

Now let’s create two dataframes filled with mock sales data. You can create them with the code below, which uses NumPy’s np.nan to mark missing values:

import numpy as np  # needed for np.nan (missing values)

sales_data_one = pd.DataFrame({"name": ["William", "Emma", "Sofia", "Markus", "Edward"],
                               "region": ["East", np.nan, "East", "South", "West"],
                               "sales": [50000, 52000, 90000, np.nan, 42000],
                               "expenses": [42000, 43000, np.nan, 44000, 38000]})

sales_data_two = pd.DataFrame({"name": ["Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
                               "region": ["West", "South", "West", "West", "East", "South"],
                               "sales": [72000, 49000, np.nan, 67000, 65000, 67000],
                               "expenses": [39000, 42000, np.nan, 39000, 44000, 45000]})

Let’s print these out so you can have a rough idea of what’s inside:

print(sales_data_one)
print(sales_data_two)

As you can see, these dataframes contain sales information such as name, region, total sales, and expenses. Note also that although they have identical columns, the dataframes contain different rows. To append the rows from sales_data_two to sales_data_one, we’ll use the append() method.

Append new Rows to a DataFrame

Let’s begin with the basics. We’ll append the sales_data_two to the end (or bottom) of sales_data_one. So, let’s execute the code first, and then we’ll explain what we’ve done:

sales_data_one.append(sales_data_two)

It is a straightforward procedure. We start with the name of the first dataframe, sales_data_one, and then call .append() to invoke the method. Inside the parentheses, we pass the name of the second dataframe, sales_data_two. In the resulting dataframe, the rows of sales_data_two are stacked below the rows of sales_data_one.

One thing to note, however, is that the numeric index on the left contains duplicate values. This happens because the indexes of the two input dataframes overlap (both start at 0 and increase by 1 for each row).

These duplicate index labels can cause problems, so we’ll fix them in the next example.

Ignore the index and reset it when you append new rows

Here we’ll again join the rows of the two dataframes, but this time we’ll reset the index of the resulting dataframe so that it is a new numeric index starting at 0. To do this, set ignore_index = True. This causes Pandas to “ignore” the indexes of the input dataframes and build a new index for the output dataframe:

sales_data_one.append(sales_data_two, ignore_index = True)

The index in the output starts at 0 and increases by 1 for each row until it reaches 10. It is a new index for the output dataframe, and it eliminates the duplicate index labels that came from the input dataframes.

Verify the integrity of the index when you append new rows

Instead of resetting the index, let’s check it. We’ll do this by setting verify_integrity to True, which makes Pandas check for duplicate index labels. If it finds any, it raises an error.

Here’s the code:

sales_data_one.append(sales_data_two, verify_integrity = True)

Here, verify_integrity is set to True, so Pandas checks the combined index for duplicate labels. Running this code raises a ValueError because the two input dataframes have overlapping index labels: both have rows labeled 0, 1, 2, 3, and 4.

If you get an error like this, you may need to clean your input data to remove the duplicate index labels. Alternatively, you can ignore the index, as in the previous example. The right approach depends heavily on the situation.
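Here is a sketch of two possible fixes, assuming a continuous numbering or a per-source label is acceptable for your data. Option 1 shifts the second dataframe’s index so it continues where the first one ends; option 2 uses the keys argument of pd.concat() to tag each row with its source, producing a MultiIndex with unique labels. The labels "one" and "two" are hypothetical, and pd.concat() is used instead of append() so the sketch also runs on pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

sales_data_one = pd.DataFrame({"name": ["William", "Emma", "Sofia", "Markus", "Edward"],
                               "region": ["East", np.nan, "East", "South", "West"],
                               "sales": [50000, 52000, 90000, np.nan, 42000],
                               "expenses": [42000, 43000, np.nan, 44000, 38000]})

sales_data_two = pd.DataFrame({"name": ["Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
                               "region": ["West", "South", "West", "West", "East", "South"],
                               "sales": [72000, 49000, np.nan, 67000, 65000, 67000],
                               "expenses": [39000, 42000, np.nan, 39000, 44000, 45000]})

# Option 1: shift the second index so it starts right after the last row of the first
shifted_two = sales_data_two.copy()
shifted_two.index = sales_data_two.index + len(sales_data_one)
combined = pd.concat([sales_data_one, shifted_two], verify_integrity=True)

# Option 2: keep the original labels, but prefix each row with its source
labeled = pd.concat([sales_data_one, sales_data_two], keys=["one", "two"],
                    verify_integrity=True)

print(combined.index.tolist())
print(labeled.head())
```

Both calls pass the verify_integrity check because neither resulting index contains duplicates.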

Example 2: Create two dataframes and append one to the other

# Importing pandas as pd
import pandas as pd

# Creating the first DataFrame from a dictionary
dFrameOne = pd.DataFrame({"a": [11, 12, 13, 14], "b": [15, 16, 17, 18]})

# Creating the second DataFrame from a dictionary
dFrameTwo = pd.DataFrame({"a": [11, 12, 13], "b": [15, 16, 17]})

# Print DataFrame 1
print(dFrameOne, "\n")

# Print DataFrame 2
print(dFrameTwo)


Now let’s append dFrameTwo to the end of dFrameOne.

# to append dFrameTwo at the end of dFrameOne dataframe
dFrameOne.append(dFrameTwo)

Note that the appended dataframe keeps the original index values of both input dataframes, so the labels 0, 1, and 2 appear twice. If we don’t want that, we can set ignore_index=True.

# A continuous index value will be maintained
# across the rows in the new appended data frame.
dFrameOne.append(dFrameTwo, ignore_index = True)

Example 3: Append a dataframe with a different shape

If the dataframes do not have the same columns, the cells for columns that are missing from one of the dataframes are filled with NaN values.

# Importing pandas as pd
import pandas as pd

# Creating the first Dataframe using dictionary
dFrameOne = pd.DataFrame({"a":[1, 2, 3, 4], "b":[5, 6, 7, 8]})

# Creating the Second Dataframe using dictionary
dFrameTwo = pd.DataFrame({"a":[1, 2, 3], "b":[5, 6, 7], "c":[1, 5, 4]})

# Append dFrameTwo to the end of dFrameOne
dFrameOne.append(dFrameTwo, ignore_index = True)

The new cells are filled with NaN values, as you can see.
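To confirm where the NaN values come from, here is a sketch that rebuilds the same result with pd.concat() and counts the missing cells per column using isna().sum(). Only the rows from dFrameOne lack column "c", so that is where the NaNs should appear.

```python
import pandas as pd

dFrameOne = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
dFrameTwo = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7], "c": [1, 5, 4]})

# Same operation as dFrameOne.append(dFrameTwo, ignore_index=True)
result = pd.concat([dFrameOne, dFrameTwo], ignore_index=True)

# Count missing values in each column
missing_per_column = result.isna().sum()

print(result)
print(missing_per_column)
```

Because column "c" now holds NaN, its dtype becomes float.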

Example 4: Appending Housing Data

In a real estate investing project, for example, we might want to merge 50 dataframes of housing data into a single dataframe. There are several reasons to do this: combining them is easier and simply makes sense, and it also uses less RAM. Every dataframe has a date column and a value column. The date column is repeated across all of the dataframes, but they could all share a single one, which roughly halves our total column count.

You may have many objectives in mind when integrating dataframes. For example, you might want to “append” to them, which means you’ll be adding rows to the end. Here are some dataframes to get you started:

import pandas as pd

df1 = pd.DataFrame({'HPI':[80,85,88,85],
                    'Int_rate':[2, 3, 2, 2],
                    'US_GDP_Thousands':[50, 55, 65, 55]},
                   index = [2001, 2002, 2003, 2004])

df2 = pd.DataFrame({'HPI':[80,85,88,85],
                    'Int_rate':[2, 3, 2, 2],
                    'US_GDP_Thousands':[50, 55, 65, 55]},
                   index = [2005, 2006, 2007, 2008])

df3 = pd.DataFrame({'HPI':[80,85,88,85],
                    'Int_rate':[2, 3, 2, 2],
                    'Low_tier_HPI':[50, 52, 50, 53]},
                   index = [2001, 2002, 2003, 2004])

Note how these differ: df1 and df2 have the same columns but different indexes, df1 and df3 have the same index but different columns, and df2 and df3 differ in both. Concatenation offers several ways to bring these together. Appending is similar to concatenation, but it is a little more forceful: the other dataframe is simply added to the end, extending the rows. Let’s look at how it typically works, as well as where it can go wrong:

df4 = df1.append(df2)
print(df4)

That’s what an append is supposed to do: the rows of df2 are added below the rows of df1. Most of the time, you’ll use append much as you would insert a new record into a database. Dataframes were not designed to be appended to efficiently; they are meant to be manipulated based on their starting data, but you can append if necessary. What happens when you append data with the same index?

df4 = df1.append(df3)
print(df4)

That’s unfortunate, to say the least. The result has eight rows in which the index labels 2001 to 2004 appear twice, and each row has NaN in whichever of US_GDP_Thousands and Low_tier_HPI it lacks. Some people wonder why both concatenation and append exist; this is the reason. Because the shared columns contain the same data and the same index, it is far more efficient to combine these dataframes side by side. Another example is appending a series. Given the nature of append, you’re more likely to be appending a series than an entire dataframe.

We haven’t discussed series until now. A series is a one-dimensional labeled array, similar to a single column of a dataframe. A series has an index, but when you convert it to a list, only the data remains. Whenever we write something like df['column'], the return value is a series.
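Here is a sketch contrasting the two approaches for df1 and df3. Stacking the rows (what append does, written here with pd.concat()) gives repeated index labels and NaNs; joining df3’s extra column onto df1 keeps four rows. The join assumes that the shared columns HPI and Int_rate really hold identical values, which the sketch checks with equals(); DataFrame.join() is one way to do this, not the only one.

```python
import pandas as pd

df1 = pd.DataFrame({'HPI': [80, 85, 88, 85],
                    'Int_rate': [2, 3, 2, 2],
                    'US_GDP_Thousands': [50, 55, 65, 55]},
                   index=[2001, 2002, 2003, 2004])

df3 = pd.DataFrame({'HPI': [80, 85, 88, 85],
                    'Int_rate': [2, 3, 2, 2],
                    'Low_tier_HPI': [50, 52, 50, 53]},
                   index=[2001, 2002, 2003, 2004])

# Stacking rows (like df1.append(df3)): 8 rows, repeated index labels, NaNs
stacked = pd.concat([df1, df3])

# Check the assumption that the shared columns hold identical data
shared_match = df1[['HPI', 'Int_rate']].equals(df3[['HPI', 'Int_rate']])

# Combine side by side: add df3's extra column to df1, aligned on the index
combined = df1.join(df3['Low_tier_HPI'])

print(stacked)
print(combined)
```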

s = pd.Series([80,2,50], index=['HPI','Int_rate','US_GDP_Thousands'])
df4 = df1.append(s, ignore_index=True)
print(df4)

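When appending a series, you must set ignore_index=True unless the series has a name; otherwise Pandas raises a TypeError. If the series does have a name, that name becomes the index label of the new row.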
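On pandas 2.0 and later, where append() is gone, one way to add a series as a row is to turn it into a one-row dataframe with to_frame().T and concatenate. This is a sketch, not the only approach; the named series and its values (name=2005, 82, 3, 60) are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'HPI': [80, 85, 88, 85],
                    'Int_rate': [2, 3, 2, 2],
                    'US_GDP_Thousands': [50, 55, 65, 55]},
                   index=[2001, 2002, 2003, 2004])

s = pd.Series([80, 2, 50], index=['HPI', 'Int_rate', 'US_GDP_Thousands'])

# to_frame().T turns the series into a one-row dataframe whose columns are the series' index labels
df4 = pd.concat([df1, s.to_frame().T], ignore_index=True)

# With a name, the series' name becomes the new row's index label
named = pd.Series([82, 3, 60], index=['HPI', 'Int_rate', 'US_GDP_Thousands'], name=2005)
df5 = pd.concat([df1, named.to_frame().T])

print(df4)
print(df5)
```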

Conclusion

The merging of two or more data sets into a single data set is known as data merging. This approach is usually required when you have raw data stored in multiple files, workbooks, or data tables that you want to analyze all at once.

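Pandas comes with several built-in techniques for merging DataFrames. append() is a special case of concat() that adds rows to the end of the current DataFrame (axis=0 and join='outer'). Although the method appears simple to apply, a few strategies are worth knowing: reset the index with ignore_index=True when the input indexes overlap, check for duplicates with verify_integrity=True, and avoid appending repeatedly in a loop, since a single concatenation of all the pieces is more efficient.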
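The pandas documentation notes that iteratively appending rows can be more expensive than a single concatenation. Here is a minimal sketch of the recommended pattern: collect the pieces in a plain Python list, then call pd.concat() once. The batches of rows (rep_0 to rep_4) are hypothetical stand-ins for data arriving from files or an API.

```python
import pandas as pd

base = pd.DataFrame({"name": ["William"], "sales": [50000]})

# Collect dataframes in a Python list (list.append is cheap; it is not DataFrame.append)
pieces = [base]
for i in range(5):
    pieces.append(pd.DataFrame({"name": [f"rep_{i}"], "sales": [1000 * i]}))

# One concatenation at the end instead of one append per batch
result = pd.concat(pieces, ignore_index=True)

print(result)
```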
