Converting Column to Integer values in Pandas

To change a column's data type to an integer (for example, from float or string to int64 or int32), use the pandas DataFrame.astype() method with int as the target type, or the DataFrame.apply() method. When you convert a float column, keep in mind that an integer type cannot hold a fractional part, so everything after the decimal point is dropped.

Be aware that when a float is converted to an int, the fractional part (anything after the decimal point) is simply truncated; no rounding is applied. This post describes several methods for converting columns with float or string values to integer values.
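As a quick illustration of that truncation, here is a minimal sketch. The values are made up for this example (they are not from the sample data used later), and a negative number is included to show that the cast moves toward zero rather than rounding down.

```python
import pandas as pd

# Hypothetical values chosen for illustration, including a negative number
values = pd.Series([2.9, -2.9, 7.5, 0.1])

# astype() drops the fractional part; it does not round
truncated = values.astype("int64")

print(truncated.tolist())  # [2, -2, 7, 0]
```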

Converting Column with float values to Integer values in Pandas

This article aims to help you learn how to use the DataFrame.astype() and DataFrame.apply() functions to convert a column from string to int and from float to int. Additionally, we will explore how to convert strings and floats to integers when a column contains NaN or null values.

Preparing pandas

We first need to set up Pandas in our Python environment before delving into how to carry out the conversion. Pandas is usually already installed if you're using the base environment of an Anaconda distribution. On a native Python installation, you must install it manually. Run the following command to do that:

$ pip install pandas

On Linux systems where pip is exposed as pip3, run the following command instead (sudo installs the package system-wide):

$ sudo pip3 install pandas

In Anaconda or Miniconda environments, install pandas with conda:

$ conda install pandas

Sample DataFrame by Pandas

To serve as an example in this lesson, let's set up a sample DataFrame. You can use your own DataFrame or copy the code below.

import pandas as pd
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Mike Tyson', 'Smith Raw', 'David Silva', 'White Brown', 'Steve Wright'],
                   'points': ['70000', '90899', '90000', '101000', '310000']})

We can examine the data once the DataFrame has been formed.

Pandas Display Column Type

Before changing a column from one type to an int, it is wise to ascertain whether the current type can be cast to an int. For instance, converting a name-containing column to an int is impossible. Using the dtypes property, we can see a DataFrame’s type.

Utilize the syntax:

DataFrame.dtypes

The column types in our sample DataFrame are as follows:

df.dtypes

None of the columns has an int type at this point: every column was created from strings, so pandas reports a string-like dtype (typically object) for each of them.
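Beyond reading dtypes, we can also test whether a column's values would survive a cast to int. The sketch below is one possible approach, not a built-in pandas check: can_cast_to_int is a hypothetical helper name, and it relies on pd.to_numeric with errors='coerce' to mark unparseable values as NaN.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'name': ['Mike Tyson', 'Smith Raw', 'David Silva']})

def can_cast_to_int(column):
    """Hypothetical helper: True if every value parses as a whole number."""
    numeric = pd.to_numeric(column, errors='coerce')
    # NaN marks a value that could not be parsed as a number
    if numeric.isna().any():
        return False
    # Reject values with a fractional part, such as '1.5'
    return bool((numeric % 1 == 0).all())

print(can_cast_to_int(df['id']))    # True
print(can_cast_to_int(df['name']))  # False
```

Note that this helper also returns False for a column that already contains missing values, which is the cautious answer for a plain int cast.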

Pandas Convert Column to Int from String

We may change a single column to an int using the astype() function and the target data type as the input.

The syntax for the function:

DataFrame.astype(dtype, copy=True, errors='raise')
  • dtype – the Python type or NumPy dtype to which the data is converted.
  • copy – when True, returns a copy of the object rather than operating on the original data.
  • errors – controls what happens when the conversion fails. By default ('raise'), the function raises an exception.
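To see the default errors='raise' behavior described above, here is a small sketch. The raw Series and its 'n/a' entry are invented for this illustration.

```python
import pandas as pd

# Hypothetical column in which one value is not a number
raw = pd.Series(['10', '20', 'n/a'])

try:
    raw.astype('int64')
    failed = False
except ValueError as exc:
    # With the default errors='raise', the bad value stops the conversion
    failed = True
    print(f"astype failed: {exc}")

print(failed)  # True
```

Because astype() returns a new object, the original raw Series is left untouched when the conversion fails.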

As seen in the following code, we can use the astype() function to change the id column in our sample DataFrame to an int type:

df['id'] = df['id'].astype(int)

The ‘id’ column is designated as the target object in the code above. The astype() function is called with an int as the type argument. For each column in the DataFrame, we may examine the new data type:

df.dtypes

While the other columns are unchanged, the id column has been changed to an int.

Pandas Convert Multiple Columns to Int

We can convert many columns to a particular type using the astype() function. The id and points columns can be changed to an int type, for instance, by running the following code.

df[['id', 'points']] = df[['id', 'points']].astype(int)

Using the square bracket syntax, we select several columns in this case. The astype() method then changes the data type of those columns to the one requested. We should receive the following output if we verify the column types:

df.dtypes

The output shows that the id and points columns have been changed to an integer type. On most platforms this is int64; some Windows setups report int32.

Pandas Convert Multiple Columns to Multiple Types

We can specify columns and target types as a dictionary passed to the astype() function. Let's say we want to change the points column to a float and the id column to an int.

The following code can be executed:

convert_to = {"id": int, "points": float}
df = df.astype(convert_to)

In the code above, we begin by constructing a dictionary with the target column as the key and the target type as the value. The astype() function then changes each listed column to its assigned type. The results of checking the column types should be:

df.dtypes

Note that the points column is now a float64 type, whereas the id column is an integer type (int64 on most platforms).

Examples of Converting Column to Int in Pandas

Here are a few examples of changing a column in a DataFrame to an integer dtype if you’re in a hurry. Let’s start by constructing a DataFrame with a few rows and columns, run a few examples, and check the output. The columns in our DataFrame have names like purchases, cost, duration, and bonus.

import pandas as pd
import numpy as np
employees= {
    'purchases':["Laptop","Phone","Tv Set","Cooker","Projector"],
    'cost' :["42000","45000","43000","44000","46000"],
    'duration':['19days','39days','24days', '29days','44days'],
    'bonus':[500.10,1800.15,500.5,700.22,2000.20]
          }
df = pd.DataFrame(employees)
print(df)
print(df.dtypes)

Using the employee DataFrame above, we can create the following easy-to-digest example.

# Convert "cost" from string to int
cost_df = df.astype({'cost': 'int'})

# Convert all columns to int dtype.
# This raises an error on our DataFrame because of its non-numeric columns.
# cost_df = cost_df.astype('int')

# Convert a single column to int dtype
cost_df['cost'] = cost_df['cost'].astype('int')

# Convert "bonus" from float to int
bonus_df = df.astype({'bonus': 'int'})

# Convert multiple columns to int
df = pd.DataFrame(employees)
df = df.astype({"cost": "int", "bonus": "int"})

# Replace any NaN values with 0, then convert "cost" to int
df['cost'] = df['cost'].fillna(0).astype(int)
print(df)
print(df.dtypes)

Conversion of a column to an Integer

Using the pandas DataFrame.astype() function, you can convert a single column or the entire DataFrame to an int (integer) type. Pass int64 or numpy.int64 to get a 64-bit signed integer, and int32 or numpy.int32 to get a 32-bit signed integer. Passing int or numpy.int_ uses the platform's default integer, which is 64-bit on most systems.

The example below changes the cost column’s string dtype to an int64. This method also accepts numpy.int64 as a parameter.

# conversion of "cost" from String to int
df = df.astype({'cost':'int'})
print(df.dtypes)
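If the bit width matters, it can be requested explicitly instead of relying on the platform default. The following sketch uses a shortened copy of the cost column; the as_int64 and as_int32 names are just for this illustration.

```python
import pandas as pd

# A shortened copy of the "cost" column from the sample data
df = pd.DataFrame({'cost': ['42000', '45000', '43000']})

# Request an explicit width rather than the platform default
as_int64 = df['cost'].astype('int64')
as_int32 = df['cost'].astype('int32')

print(as_int64.dtype)  # int64
print(as_int32.dtype)  # int32
```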

If every column of a DataFrame holds strings that represent integers, you can convert the whole DataFrame to an int dtype in a single call, as shown below. The call raises an error if any column contains alphanumeric values, so it fails on our sample DataFrame, whose purchases and duration columns are not numeric.

# Converting all the provided columns to int dtype.
df = df.astype('int')
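A column such as duration, with values like '19days', cannot be cast directly. One possible workaround, shown here as a sketch rather than as part of the original walkthrough, is to extract the digits first with Series.str.extract(). The duration_days column name is an assumption made for this example.

```python
import pandas as pd

# A shortened copy of the "duration" column from the sample data
df = pd.DataFrame({'duration': ['19days', '39days', '24days']})

# Pull out the leading digits, then cast the result to an integer dtype.
# "duration_days" is a hypothetical name for the new column.
df['duration_days'] = (
    df['duration'].str.extract(r'(\d+)', expand=False).astype('int64')
)

print(df['duration_days'].tolist())  # [19, 39, 24]
```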

You can also convert a particular column using Series.astype(). Since every column in a DataFrame is a pandas Series, we can select the column from the DataFrame as a Series and call astype() on it. In the example below, df.cost or df['cost'] returns the Series object.

# Convert single column to int dtype.
df['cost'] = df['cost'].astype('int')

Float to Int dtype conversion

Let's now change the float column in the pandas DataFrame to an int (integer) type using the same astype() method. Be aware that the fractional part (anything after the decimal point) is truncated when converting a float to an int; no rounding is applied. The example below uses the DataFrame.astype() function to convert the bonus column, which holds float values, to ints.

# conversion of "bonus" from Float to int
df = df.astype({'bonus':'int'})
print(df.dtypes)

In a similar vein, you can likewise cast all columns or just one. For details, see the examples in the preceding section.
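If rounding to the nearest integer is preferred over truncation, one option is to call round() before the cast. This is a sketch with made-up bonus values, chosen so that the two results differ.

```python
import pandas as pd

# Hypothetical bonus values, chosen so truncation and rounding disagree
bonus = pd.Series([500.10, 1800.95, 700.62])

truncated = bonus.astype('int64')        # drops the fractional part
rounded = bonus.round().astype('int64')  # rounds first, then casts

print(truncated.tolist())  # [500, 1800, 700]
print(rounded.tolist())    # [500, 1801, 701]
```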

Converting several Columns to an Integer

By passing a dict of column name -> data type to the astype() method, you can also convert several columns to integers at once. The sample below changes the cost column from string to int and the bonus column from float to int.

# Conversion of Multiple columns to int
df = pd.DataFrame(employees)
df = df.astype({"cost":"int","bonus":"int"})
print(df.dtypes)

Casting to Integer with apply() and np.int64

You can also change the cost column from a string to an integer using the apply() method. In this case, we pass numpy.int64 as the function to apply to each value.

import numpy as np
# Convert "cost" to int by applying np.int64 to every value in the column
df["cost"] = df["cost"].apply(np.int64)
print(df.dtypes)
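For a column of numeric strings, apply(np.int64) and astype('int64') should give the same values. The short comparison below is a sketch on a shortened copy of the cost column; via_apply and via_astype are names used only here.

```python
import numpy as np
import pandas as pd

# A shortened copy of the "cost" column from the sample data
cost = pd.Series(['42000', '45000', '43000'])

via_apply = cost.apply(np.int64)     # converts one value at a time
via_astype = cost.astype('int64')    # converts the whole column at once

print((via_apply == via_astype).all())  # True
```

On large columns, astype() is usually the better fit because it converts the column in one step instead of calling a function per value.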

Converting a Column with NaNs Using astype(int)

Let's build a DataFrame that contains NaN values. NaN cannot be represented in a NumPy integer dtype, so astype(int) raises an error on a column that mixes floats and NaN. Replace the NaN values first (here with zero), and then use astype() to convert the column to int.

# import Pandas library
import pandas as pd
# subsequently, import the NumPy library
import numpy as np

employees= {
    'cost' :[42000.30,45000.40,np.nan,44000.50,46000.10,np.nan]
          }
df = pd.DataFrame(employees)
print(df)
print(df.dtypes)

To replace NaN values with the integer value zero, use DataFrame.fillna().

# conversion of "cost" from float to int and replace NaN values
df['cost'] = df['cost'].fillna(0).astype(int)
print(df)
print(df.dtypes)
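Filling with zero changes the data, which is not always wanted. As an alternative not covered above, pandas also provides a nullable integer dtype, written "Int64" with a capital I, that keeps missing values as <NA>. The sketch below assumes the fractional part should be dropped with np.trunc before the cast; cost_nullable is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# A shortened copy of the "cost" column with a missing value
df = pd.DataFrame({'cost': [42000.30, 45000.40, np.nan, 44000.50]})

# Drop the fractional part first, then cast to the nullable integer dtype.
# "cost_nullable" is a hypothetical name for the new column.
df['cost_nullable'] = np.trunc(df['cost']).astype('Int64')

print(df['cost_nullable'].dtype)         # Int64
print(df['cost_nullable'].isna().sum())  # 1
```

The truncation step is needed because the cast to "Int64" refuses floats that still carry a fractional part.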

Conclusion

Pandas is a free, open-source Python library that offers fast, flexible, and expressive data structures, and it is one of the most useful Python packages for manipulating and analyzing data.

This article covered how to change a column in a Pandas DataFrame from a string or float type to an int type. It provided instructions and examples for single columns, multiple columns, and columns that contain NaN values.
