There are several ways to count the rows and columns of a Pandas DataFrame. These include “len(df),” “df.shape[0],” “df[df.columns[0]].count(),” “df.count(),” and “df.size.” Note that “size” is an attribute rather than a method and returns the total number of cells (rows multiplied by columns), while “count()” counts only non-null values. Calling len() on a DataFrame is generally among the fastest of these options, so this article focuses on len(): how it works, how to use it, and why you might choose it. The other approaches remain useful, and it is worth understanding how they work as well, especially if you work with DataFrames regularly.
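As a rough illustration of how the approaches listed above differ, here is a minimal sketch. The DataFrame “df” and its columns are hypothetical and exist only for this comparison; one value is deliberately missing so that the difference between row counts and non-null counts is visible.

```python
import pandas as pd

# Hypothetical DataFrame used only for illustration; one name is missing.
df = pd.DataFrame({
    "name": ["a", "b", "c", None],
    "value": [1, 2, 3, 4],
})

row_count_len = len(df)                     # number of rows
row_count_shape = df.shape[0]               # number of rows
col_count_shape = df.shape[1]               # number of columns
non_null_first = df[df.columns[0]].count()  # non-null values in the first column
non_null_per_column = df.count()            # Series of non-null counts per column
total_cells = df.size                       # rows * columns (attribute, no parentheses)

print(row_count_len, row_count_shape, col_count_shape)
print(non_null_first)
print(non_null_per_column)
print(total_cells)
```

Because “count()” skips missing values, it can report fewer entries than len() or “shape[0]” for the same DataFrame.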
Counting Rows with Condition in Pandas
Let’s learn how this works by running some example code.
Using the len() Function in Pandas
The “len()” function is our strategy in this example. Let’s investigate its functioning.
You can run the example with a variety of tools, including “Spyder,” “PyCharm,” “Jupyter Notebook,” or the Python REPL. Use whichever tool you prefer. We assume you know how to install your tool of choice and are ready to run it on your desktop or laptop computer.
After the installation is complete, we launch the tool and select a new file with the “.py” extension.
Here, “py” stands for Python. If you are new to the Python programming language, we advise you to take an introductory Python course before proceeding with this article. We assume you have a basic grasp of Python so that you can follow along easily.
Before we can write our code, we need to import the required library. Every technique in this article relies on the “pandas” package. Pandas is a Python package that is heavily used in data science, machine learning, and general Python programming.
We therefore write the statement “import pandas as pd” to import the Pandas library. With this alias in place, we can refer to Pandas throughout the program as “pd” rather than the full name “pandas.” Next, we create a Pandas DataFrame on which to practice. Pandas gives us a straightforward way of building DataFrames, the “pd.DataFrame()” constructor, where “pd” is the alias for Pandas and “DataFrame” is the class that creates the DataFrame.
We used this constructor in our script and passed it a dictionary that defines three columns. The first column, “group,” contains the eight string values “Q,” “Q,” “Q,” “Q,” “W,” “W,” “W,” and “W.” The second column, “pos,” similarly contains eight string values: “Xu,” “Yo,” “Yo,” “Yo,” “Xu,” “Xu,” “Yo,” and “Yo.” The final column, “scores,” contains the eight integer values 22, 26, 21, 18, 18, 15, 24, and 31. To store the generated DataFrame, we must assign it to a variable.
Here, “res” is the variable we created for that purpose. The DataFrame returned by “pd.DataFrame()” is assigned to it. The “print()” function is then used to display the DataFrame on the terminal. Let’s run this Python program:
import pandas as pd

# create DataFrame
res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# view DataFrame
print(res)
Using the len() Function with One Condition
We must now determine how many rows in the DataFrame satisfy a given criterion. We will first apply a condition to a single column and count the matching rows. Later, we will apply conditions to several columns at once. Both approaches use Python’s built-in “len()” function on a filtered DataFrame. The syntax for applying a condition to a single column is:
len(df[df['col1']=='value1'])
In this syntax, the expression inside the square brackets is a condition on one column of the DataFrame. It keeps only the rows where the condition holds, and “len()” then counts those rows. From our DataFrame, we chose the “group” column and gave it a condition.
The condition checks, row by row, whether the value in the “group” column equals “Q.” Every row that meets the criterion is kept, and “len()” returns how many such rows there are.
We store the resulting number in a variable called “count.” We first call “print()” to display a descriptive message on the terminal, and then call “print()” again with the “count” variable as its argument to display the number of matching rows.
import pandas as pd

# create DataFrame
res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# view DataFrame
print(res)
print()

# count the number of rows where the value in the group column is equal to 'Q'
count = len(res[res['group'] == 'Q'])
print('Number of Rows in given dataframe in which "group" is "Q":')
print(count)
We can see both our DataFrame and the counted rows that match the condition on the terminal.
As the output shows, the DataFrame has 4 rows that match the criterion. You can confirm this by comparing it with the DataFrame printed earlier: the “group” column contains 4 “Q” values, so “len()” returns 4.
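To see why “len()” returns 4 here, it can help to look at the intermediate steps. The following is a minimal sketch using the same DataFrame; the names “mask,” “filtered,” “count_len,” and “count_sum” are illustrative and do not appear in the original script. Summing the boolean mask is shown as an alternative that gives the same number, since each True counts as 1.

```python
import pandas as pd

res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# Step 1: the comparison produces a boolean Series, one value per row
mask = res['group'] == 'Q'

# Step 2: indexing with the mask keeps only the rows where it is True
filtered = res[mask]

# Step 3: len() counts the rows that remain
count_len = len(filtered)

# Alternative: sum the True values in the mask directly
count_sum = int(mask.sum())

print(mask.tolist())
print(count_len, count_sum)
```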
Using the len() Function with Multiple Conditions
In the preceding example, we counted the number of rows that satisfied a condition on a single column. We will now learn how to count the rows that satisfy conditions on two columns at once. The syntax is as follows:
len(df[(df['col1']=='value1') & (df['col2']=='value2')])
As before, the “len()” function counts the number of rows that satisfy the conditions. Inside the square brackets of the DataFrame, we write a condition on the first column, followed by a similar condition on the second column, each wrapped in its own parentheses. The “&” operator between the two conditions is the element-wise “and” operator. A row is counted only if both conditions are satisfied for that row.
We choose the “group” and “pos” columns for our example and apply a condition to each. The “group” condition checks whether the value in that column equals “Q,” while the “pos” condition checks whether the value equals “Yo.” The “&” operator combines the results of both checks row by row. We therefore get the count of rows with “group” equal to “Q” and “pos” equal to “Yo.”
We create another variable called “cal.” The “len()” function counts the rows that satisfy both conditions, and the result is stored in “cal.” Finally, we call “print()” twice: once to display a descriptive message and once to print the count stored in the “cal” variable.
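The way “&” combines two conditions can be sketched on a tiny example. The DataFrame “df” and the names “first,” “second,” and “both” below are hypothetical and serve only to show the row-by-row behavior.

```python
import pandas as pd

# Hypothetical three-row DataFrame used only for illustration
df = pd.DataFrame({'col1': ['a', 'a', 'b'],
                   'col2': ['x', 'y', 'x']})

first = df['col1'] == 'a'    # True, True, False
second = df['col2'] == 'x'   # True, False, True
both = first & second        # True only where both conditions hold

count = len(df[both])

# The same count written inline, with each condition in its own parentheses
count_inline = len(df[(df['col1'] == 'a') & (df['col2'] == 'x')])

print(both.tolist())
print(count, count_inline)
```

The parentheses around each condition matter: in Python, “&” binds more tightly than “==,” so leaving them out does not produce the intended mask.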
import pandas as pd

# create DataFrame
res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# view DataFrame
print(res)
print()

# count the rows where group is equal to 'Q' and pos is equal to 'Yo'
cal = len(res[(res['group'] == 'Q') & (res['pos'] == 'Yo')])
print('Number of Rows in given dataframe in which "group" is "Q" and "pos" is "Yo" :')
print(cal)
As the result shows, only three rows in the DataFrame match the required criteria: the rows in which “group” is “Q” and “pos” is “Yo.” Take a few seconds to check the DataFrame printed above and confirm for yourself that the output is accurate.
You have now learned how to apply conditions to two columns. Applying them to more columns works the same way. Next, we place a condition on all three columns of the DataFrame, so that only rows satisfying all three criteria are counted.
The first condition checks whether the value in the “group” column equals “W.” The second checks whether “pos” equals “Yo,” and the third checks whether “scores” is greater than 15. Only the rows that satisfy all three conditions are selected from the DataFrame. The “len()” function counts those rows and the result is stored in the variable “outcome,” which we display with the “print()” function. For this DataFrame the count is 2, because only the last two rows have “group” equal to “W,” “pos” equal to “Yo,” and a score above 15.
import pandas as pd

# create DataFrame
res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# view DataFrame
print(res)
print()

# count rows where group is 'W' and pos is 'Yo' and scores > 15
outcome = len(res[(res['group'] == 'W') & (res['pos'] == 'Yo') & (res['scores'] > 15)])
print('Number of Rows in given dataframe in which "group" is "W," "pos" is "Yo" and "scores" is greater than "15" is:')
print(outcome)
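One way to check a multi-condition count by hand is to add the conditions one at a time and watch the count narrow. The sketch below does this for the same DataFrame; the variable names “is_w,” “is_yo,” “above_15,” and the three step counts are illustrative, not part of the original script.

```python
import pandas as pd

res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# Build each condition separately so it can be inspected on its own
is_w = res['group'] == 'W'
is_yo = res['pos'] == 'Yo'
above_15 = res['scores'] > 15

# Count after each additional condition
step1 = len(res[is_w])                     # group is 'W'
step2 = len(res[is_w & is_yo])             # ... and pos is 'Yo'
step3 = len(res[is_w & is_yo & above_15])  # ... and scores > 15

print(step1, step2, step3)
```

Naming the masks this way also keeps long conditions readable, since the parentheses are only needed when the comparisons are written inline.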
Conclusion
Pandas is among the most popular tools for data cleaning and processing in data science and machine learning. When you use a Pandas DataFrame to store and analyze your data, you will often need to count the rows it contains.
You might need to quickly count the occurrences of particular entries in your entire dataset, or only in the rows that satisfy a given criterion. Pandas also lets us determine the shape of a DataFrame, that is, its number of rows and columns.
There are many ways to count rows and columns in Pandas, including “len(),” “df.shape[0],” “df[df.columns[0]].count(),” “df.count(),” and “df.size.” In this article, we focused on “len()” combined with one or more conditions.
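As a brief sketch of these two ideas, the example below reads the shape of the DataFrame used in this article and counts how often each value occurs in one column. “value_counts()” is a standard Pandas Series method, but it was not used elsewhere in this article, so treat it as an optional extra; the names “n_rows,” “n_cols,” and “group_counts” are illustrative.

```python
import pandas as pd

res = pd.DataFrame({'group': ['Q', 'Q', 'Q', 'Q', 'W', 'W', 'W', 'W'],
                    'pos': ['Xu', 'Yo', 'Yo', 'Yo', 'Xu', 'Xu', 'Yo', 'Yo'],
                    'scores': [22, 26, 21, 18, 18, 15, 24, 31]})

# shape is a (rows, columns) tuple
n_rows, n_cols = res.shape

# occurrences of each distinct value in the "group" column
group_counts = res['group'].value_counts()

print(n_rows, n_cols)
print(group_counts)
```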