To change a column's data type to int (for example, from a float or string column to an int64 or int32 dtype), use the pandas DataFrame.astype(int) method or the apply() method. When converting a float, keep in mind that an integer cannot hold a fractional part, so everything after the decimal point is discarded.
Be aware that the fractional values are truncated (dropped toward zero) when converting a float to an int; they are not rounded or floored. This post describes several methods for converting string and float columns to integer values.
Converting Columns with Float Values to Integer Values in Pandas
This article shows how to use the DataFrame.astype() and apply() functions to convert string columns to int and float columns to int. We will also explore how to convert strings and floats to integers when a column contains NaN or null values.
Preparing pandas
We first need to set up Pandas in our Python environment before delving into how to carry out the conversion operation. Pandas is probably installed if you’re using the Anaconda interpreter’s base environment. On a native Python installation, you must manually install it. Run the following command to do that:
$ pip install pandas
On Linux, run the following command:
$ sudo pip3 install pandas
Install pandas with conda in Anaconda or Miniconda environments.
$ conda install pandas
$ sudo conda install pandas
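A quick way to confirm that the installation worked is to import pandas and print its version. This is a minimal check; the version string you see will depend on your environment.

```python
# Confirm that pandas can be imported after installation
import pandas as pd

# __version__ is a standard attribute of the pandas package
print(pd.__version__)
```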
Creating a Sample Pandas DataFrame
To serve as an example in this lesson, let's set up a sample DataFrame. You can use your own DataFrame or copy the code below.
import pandas as pd
df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'name': ['Mike Tyson', 'Smith Raw', 'David Silva', 'White Brown', 'Steve Wright'],
    'points': ['70000', '90899', '90000', '101000', '310000'],
})
Once the DataFrame has been created, we can examine the data, for example with print(df).
Pandas Display Column Type
Before changing a column from one type to an int, it is wise to check whether the current type can be cast to an int. For instance, a column containing names cannot be converted to an int. The dtypes property shows the data type of each column in a DataFrame.
The syntax is:
DataFrame.dtypes
The column types in our sample DataFrame are as follows:
df.dtypes
As the output shows, all three columns hold strings (typically reported as the object dtype), so none of them is an int type yet.
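One way to check in advance which columns can be cast to int is to try parsing them with pd.to_numeric(errors='coerce'), which turns unparseable values into NaN. The sketch below uses a hypothetical helper, can_cast_to_int (not a pandas function), and assumes "castable" means every value parses as a number with no fractional part.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'name': ['Mike Tyson', 'Smith Raw', 'David Silva', 'White Brown', 'Steve Wright'],
    'points': ['70000', '90899', '90000', '101000', '310000'],
})

def can_cast_to_int(column: pd.Series) -> bool:
    """Hypothetical helper: True if every value parses as a whole number."""
    # errors='coerce' turns values that cannot be parsed into NaN
    numeric = pd.to_numeric(column, errors='coerce')
    # All values must be parseable, and none may have a fractional part
    return bool(numeric.notna().all() and (numeric % 1 == 0).all())

castable = {col: can_cast_to_int(df[col]) for col in df.columns}
print(castable)
```

Here the id and points columns pass the check, while the name column does not.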
Pandas Convert Column to Int from String
We can change a single column to an int by calling the astype() function with the target data type as the argument.
The syntax for the function:
DataFrame.astype(dtype, copy=True, errors='raise')
- dtype: the Python type or NumPy dtype to convert to, or a dict that maps column names to types.
- copy: whether to return a copy of the data (True by default). astype() never modifies the object in place; it always returns a new object.
- errors: what to do when a value cannot be converted. The default, 'raise', raises an exception; 'ignore' returns the original object unchanged.
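The two points above that most often cause surprises can be seen in a short sketch: astype() returns a new object rather than changing the original, and with the default errors='raise' a value that cannot be converted raises a ValueError. The small DataFrame and the failed flag are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'], 'name': ['Mike Tyson', 'Smith Raw', 'David Silva']})

# astype() returns a new Series; df['id'] keeps its original string dtype
converted = df['id'].astype(int)
print(df['id'].dtype, converted.dtype)

# With the default errors='raise', a non-numeric string raises ValueError
failed = False
try:
    df['name'].astype(int)
except ValueError as exc:
    failed = True
    print('Conversion failed:', exc)
```

This is why the examples in this article assign the result back, as in df['id'] = df['id'].astype(int).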
As seen in the following code, we can use the astype() function to change the id column in our sample DataFrame to an int type:
df['id'] = df['id'].astype(int)
In the code above, the 'id' column is selected and astype() is called with int as the type argument. The result is assigned back to the same column. We can then check the data type of each column in the DataFrame:
df.dtypes
While the other columns are unchanged, the id column has been changed to an int.
Pandas Convert Multiple Columns to Int
We can convert many columns to a particular type using the astype() function. The id and points columns can be changed to an int type, for instance, by running the following code.
df[['id', 'points']] = df[['id', 'points']].astype(int)
Here, the square bracket syntax with a list selects several columns at once, and astype() converts all of them to the requested data type. Checking the column types should give the following output:
df.dtypes
The output shows that the id and points columns now have an integer dtype: int64 on most platforms, or int32 on Windows with NumPy versions before 2.0.
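Because the width that int maps to can differ between platforms, a portable check is pandas' is_integer_dtype rather than comparing against a specific name such as int32. If you need the same width everywhere, pass a sized type such as 'int64'. This is a minimal sketch using the same two columns.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'points': ['70000', '90899', '90000', '101000', '310000'],
})

# `int` maps to the platform's default integer width
converted = df[['id', 'points']].astype(int)
print({col: is_integer_dtype(converted[col]) for col in converted.columns})

# A sized type name gives the same dtype on every platform
explicit = df[['id', 'points']].astype('int64')
print(explicit.dtypes)
```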
Pandas Convert Multiple Columns to Multiple Types
The astype() function also accepts a dictionary that maps each column to its target type. Let's say we want to change the id column to an integer type and the points column to float64.
The following code can be executed:
convert_to = {"id": int, "points": float}
df = df.astype(convert_to)
In the code above, we first build a dictionary with the target column as the key and the target type as the value. The astype() function then converts each column in the dictionary to its type. Checking the column types should give:
df.dtypes
Note that the points column is now a float64 type, while the id column is an integer type (int64 on most platforms).
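If you want exact widths, for example an int32 id column, you can put sized type names in the dictionary instead of int and float. A minimal sketch, assuming the same two columns as above:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'points': ['70000', '90899', '90000', '101000', '310000'],
})

# Sized dtype names fix the width: id -> int32, points -> float64
convert_to = {'id': 'int32', 'points': 'float64'}
df = df.astype(convert_to)
print(df.dtypes)
```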
Examples of Converting Column to Int in Pandas
If you're in a hurry, here are a few quick examples of changing a column in a DataFrame to an integer dtype. Let's start by constructing a DataFrame with a few rows and columns, then run a few examples and check the output. The columns in our DataFrame are purchases, cost, duration, and bonus.
import pandas as pd
import numpy as np
employees = {
    'purchases': ["Laptop", "Phone", "Tv Set", "Cooker", "Projector"],
    'cost': ["42000", "45000", "43000", "44000", "46000"],
    'duration': ['19days', '39days', '24days', '29days', '44days'],
    'bonus': [500.10, 1800.15, 500.5, 700.22, 2000.20],
}
df = pd.DataFrame(employees)
print(df)
print(df.dtypes)
Using the employees DataFrame above, the following snippet summarizes the conversions covered in this section.
# Convert "cost" from string to int
cost_df = df.astype({'cost': 'int'})
# Convert all columns to int dtype.
# This raises a ValueError on our DataFrame
# cost_df = cost_df.astype('int')
# Convert a single column to int dtype
cost_df['cost'] = cost_df['cost'].astype('int')
# Convert "bonus" from float to int
bonus_df = df.astype({'bonus': 'int'})
# Convert multiple columns to int
df = pd.DataFrame(employees)
df = df.astype({"cost": "int", "bonus": "int"})
# Replace any NaN values with 0, then convert "cost" to int
df['cost'] = df['cost'].fillna(0).astype(int)
print(df)
print(df.dtypes)
Conversion of a column to an Integer
The pandas DataFrame.astype() function converts a column to an int (integer), and you can apply it to a single column or to the entire DataFrame. Pass 'int64' or numpy.int64 to get a 64-bit signed integer; int and numpy.int_ map to the platform's default integer, which is 64-bit on most systems. Pass 'int32' or numpy.int32 to cast to a 32-bit signed integer.
The example below converts the cost column from strings to an integer dtype (int64 on most platforms). You can also pass numpy.int64 instead of 'int'.
# conversion of "cost" from String to int
df = df.astype({'cost': 'int'})
print(df.dtypes)
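The NumPy type objects mentioned above can be passed directly as the dtype, and the dtype's itemsize (bytes per value) shows which width you actually got. A small sketch using only the cost column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cost': ["42000", "45000", "43000", "44000", "46000"]})

# NumPy scalar types are accepted as the target dtype
as_int64 = df.astype({'cost': np.int64})
as_int32 = df.astype({'cost': np.int32})

# itemsize is the number of bytes per value: 8 for int64, 4 for int32
print(as_int64['cost'].dtype.itemsize, as_int32['cost'].dtype.itemsize)
```

A 32-bit column uses half the memory, but it can only hold values between -2,147,483,648 and 2,147,483,647.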
If every column in a DataFrame holds integer-like strings, you can convert the whole DataFrame to an int dtype in one step, as shown below. If any column contains non-numeric values, astype() raises a ValueError. This is what happens with our DataFrame, because the purchases and duration columns contain text.
# Convert all columns to int dtype.
# On our DataFrame, this raises a ValueError
df = df.astype('int')
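A workable alternative is to catch the error and convert only the columns whose values all parse as numbers. This sketch selects those columns with pd.to_numeric(errors='coerce'); the selection rule and the failed flag are illustrative assumptions, not part of the original example.

```python
import pandas as pd

employees = {
    'purchases': ["Laptop", "Phone", "Tv Set", "Cooker", "Projector"],
    'cost': ["42000", "45000", "43000", "44000", "46000"],
    'duration': ['19days', '39days', '24days', '29days', '44days'],
    'bonus': [500.10, 1800.15, 500.5, 700.22, 2000.20],
}
df = pd.DataFrame(employees)

failed = False
try:
    df = df.astype('int')  # fails: purchases and duration are not numeric
except ValueError as exc:
    failed = True
    print('Whole-DataFrame conversion failed:', exc)

# Keep only columns in which every value parses as a number
numeric_cols = [c for c in df.columns
                if pd.to_numeric(df[c], errors='coerce').notna().all()]
df = df.astype({c: 'int' for c in numeric_cols})
print(df.dtypes)
```

Because astype() raised, df is left unchanged inside the try block, so the second conversion starts from the original data.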
You can also convert a single column with Series.astype(). Every column in a DataFrame is a pandas Series: in the example below, df.cost or df['cost'] returns the cost column as a Series, and astype() is then called on it.
# Convert a single column to int dtype.
df['cost'] = df['cost'].astype('int')
Float to Int dtype conversion
Let's now convert the float column in the pandas DataFrame to an int (integer) type with astype(). As noted earlier, the fractional part (anything after the decimal point) is truncated, not rounded or floored. The example below uses the DataFrame.astype() function to convert the bonus column, which holds float values, to int.
# conversion of "bonus" from Float to int
df = df.astype({'bonus': 'int'})
print(df.dtypes)
As with string columns, you can cast all columns or just one. For details, see the examples in the preceding section.
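Because astype(int) truncates toward zero, round or floor the values first if you want a different result. The sketch below compares the three behaviors on a few illustrative values. Note that Series.round() rounds halves to the nearest even number, so 2.5 becomes 2.

```python
import numpy as np
import pandas as pd

bonus = pd.Series([1.7, -1.7, 2.5])

truncated = bonus.astype(int)          # drops the fraction, toward zero
rounded = bonus.round().astype(int)    # round first; .5 goes to the nearest even number
floored = np.floor(bonus).astype(int)  # always toward negative infinity

print(truncated.tolist(), rounded.tolist(), floored.tolist())
```

The difference only shows up for negative numbers (truncation versus flooring) and for values with a fractional part of .5 or more (truncation versus rounding).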
Converting several Columns to an Integer
You can also convert several columns to integers by passing astype() a dict that maps column names to data types. The example below converts the cost column from string to int and the bonus column from float to int.
# Conversion of Multiple columns to int
df = pd.DataFrame(employees)
df = df.astype({"cost": "int", "bonus": "int"})
print(df.dtypes)
Cast to Integer Using apply() with np.int64
You can also convert the cost column from a string to an integer with the Series.apply() method. In this case, numpy.int64 is the function applied to each value.
import numpy as np
# conversion of "cost" from string to int using Series.apply(np.int64)
df["cost"] = df["cost"].apply(np.int64)
print(df.dtypes)
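For string values like these, apply(np.int64) and astype('int64') should produce the same column. The difference is that apply() calls np.int64 once per value in Python, while astype() converts the whole column at once, so astype() is usually the better choice for large columns. A minimal comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cost': ["42000", "45000", "43000", "44000", "46000"]})

via_apply = df['cost'].apply(np.int64)   # calls np.int64 on each value
via_astype = df['cost'].astype('int64')  # converts the whole column at once

print(via_apply.dtype, via_astype.dtype)
print(via_apply.equals(via_astype))
```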
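Convert a Column with NaN Values Using astype(int)
Let's create a DataFrame that contains some NaN (null) values. A column that mixes floats and NaN cannot be cast directly with astype(int), because NumPy integer dtypes cannot represent NaN and pandas raises an error instead. Replace the NaN values, for example with zero, before using astype() to convert the column to int.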
# import the pandas library
import pandas as pd
# import the NumPy library
import numpy as np
employees = {
    'cost': [42000.30, 45000.40, np.nan, 44000.50, 46000.10, np.nan]
}
df = pd.DataFrame(employees)
print(df)
print(df.dtypes)
To replace NaN values with the integer value zero, use DataFrame.fillna().
# Replace NaN values with 0, then convert "cost" from float to int
df['cost'] = df['cost'].fillna(0).astype(int)
print(df)
print(df.dtypes)
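Replacing NaN with 0 changes the meaning of the data, since a missing cost becomes a cost of zero. If you would rather keep the gaps, one alternative (not used in the examples above) is pandas' nullable 'Int64' dtype, which stores missing entries as <NA>. This sketch truncates the fractions with np.trunc first, because pandas refuses to cast floats with non-zero fractional parts to 'Int64'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cost': [42000.30, 45000.40, np.nan, 44000.50, 46000.10, np.nan]})

# Plain astype(int) cannot represent NaN, so it raises
failed = False
try:
    df['cost'].astype(int)
except ValueError as exc:
    failed = True
    print('astype(int) failed:', exc)

# Truncate the fractions (NaN stays NaN), then use the nullable Int64 dtype
df['cost_nullable'] = np.trunc(df['cost']).astype('Int64')
print(df)
print(df.dtypes)
```

Note the capital "I": 'Int64' is the pandas nullable extension type, while 'int64' is the NumPy type that cannot hold missing values.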
Conclusion
Pandas is a free, open-source Python library that provides fast, flexible, and expressive data structures, which makes working with scientific data simple. It is one of Python's most useful packages for manipulating and analyzing data.
This article covered how to change a column in a pandas DataFrame from another data type to an int type with astype() and apply(), including how to convert string and float columns and how to handle columns that contain NaN values.