Python Pandas Basics in Hindi | Data Handling Part 2 for Data Analysts
Data Handling Part 2: Pandas 📊
If you want to become a data analyst, Pandas is one of the most important tools you will use.
In data science, analytics, and machine learning, Pandas is usually the first practical tool people learn. With it you can bring raw data into a readable format, clean it, filter it, and get it ready for analysis. Work that takes a long time in Excel often takes only a few lines of code in Pandas.
Pandas is widely used in:
- Data analysis
- Data cleaning
- Machine learning pipelines
- Business analytics
- Financial analysis
If you plan to work on reporting, dashboarding, or AI/ML projects in the future, Pandas is a skill that will be useful everywhere. That is why Pandas is essential if you want to build a career as a data analyst.
1️⃣ Install Pandas
First, install Pandas:
pip install pandas
Run this command in Command Prompt, Terminal, or Anaconda Prompt. Once Pandas is installed in your Python environment, you can import and use it in any project that runs in that environment.
2️⃣ Import Pandas
The standard way to import Pandas in your code:
import pandas as pd
Here pd is the conventional alias for Pandas that nearly everyone uses.
Whenever you use Pandas, you will see syntax like pd.function_name() or df.method().
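To make the two call styles concrete, here is a small sketch. The column name Marks and its values are illustrative only.

```python
import pandas as pd

# pd.<name>(...): something reached through the pd alias,
# here the DataFrame constructor
df = pd.DataFrame({"Marks": [78, 85, 90]})

# df.<method>(...): a method called on an existing DataFrame
first_two = df.head(2)

print(pd.__version__)  # installed pandas version
print(first_two)
```

Printing pd.__version__ is also a quick way to confirm that the install step worked.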
3️⃣ What is a Series?
A Series is a one-dimensional labeled array. You can think of it as a single Excel column, or as an advanced version of a Python list in which every value has an index attached.
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
Output:
0    10
1    20
2    30
3    40
dtype: int64
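The index does not have to be 0, 1, 2, 3. As a minimal sketch, the same values can carry explicit labels; the labels "a" to "d" are made up for illustration.

```python
import pandas as pd

# Same values as above, but with explicit labels instead of the default index.
# The labels "a".."d" are illustrative only.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s["b"])         # look up by label -> 20
print(s.iloc[1])      # look up by position -> 20
print(list(s.index))  # ['a', 'b', 'c', 'd']
```

This is the main difference from a plain Python list: each value can be reached by its label as well as by its position.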
4️⃣ What is a DataFrame?
A DataFrame is a table-like structure, much like an Excel sheet. It has rows and columns, and each column behaves like a Series.
import pandas as pd
data = {
    "Name": ["Amit", "Deepak", "Ravi"],
    "Marks": [78, 85, 90]
}
df = pd.DataFrame(data)
print(df)
Output:
     Name  Marks
0    Amit     78
1  Deepak     85
2    Ravi     90
Series → data shaped like a single column.
DataFrame → a complete table with multiple columns.
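The relationship between the two can be checked directly. This short sketch reuses the Name/Marks table from above and shows that pulling one column out of a DataFrame gives a Series that shares the table's index.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Deepak", "Ravi"],
    "Marks": [78, 85, 90],
})

marks = df["Marks"]  # one column taken out of the table

print(type(df).__name__)             # DataFrame
print(type(marks).__name__)          # Series
print(marks.index.equals(df.index))  # True: same row labels
```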
Short comparison: Pandas vs Excel
Excel is easy for beginners, but Pandas becomes more useful when the data gets large (hundreds of thousands of rows), the formulas get complex, and you have to repeat the same work again and again.
- In Excel the work is mostly manual; in Pandas it is automated with code.
- Excel has a hard limit of 1,048,576 rows per worksheet; Pandas can handle millions of rows, depending on your system's RAM.
- Version control, reproducible reports, and ML integration are much easier with Pandas.
5️⃣ Reading CSV Files
CSV files are very common in data analysis. Reading a CSV with Pandas is straightforward:
import pandas as pd
df = pd.read_csv("students.csv")
print(df)
This loads an external file into a DataFrame, which you can then filter, sort, group, or visualize. In real projects you will usually load data from CSV, Excel, a database, or an API.
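If you do not have a students.csv file at hand, you can still try read_csv. The sketch below feeds it an in-memory string instead of a file path; the column names and rows are assumed, not taken from a real file.

```python
import io
import pandas as pd

# Stand-in for students.csv: these columns and rows are assumed for illustration.
csv_text = """Name,Marks
Amit,78
Deepak,85
Ravi,90
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # Marks is parsed as an integer column
```

With a real file you would pass its path instead, as in the snippet above.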
6️⃣ Basic Data Exploration
After loading the data, the first step is to explore it:
print(df.head())     # first 5 rows
print(df.tail())     # last 5 rows
print(df.shape)      # (rows, columns)
print(df.columns)    # column names
If df.shape prints (100, 5), it means 100 rows and 5 columns.
This tells you right away how large the dataset is and which columns it contains.
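Here is a self-contained version of the same exploration steps on a small hand-made table (the data is illustrative). It also calls df.info(), which is not covered above but is a common companion that lists each column's dtype and non-null count.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Deepak", "Ravi", "Neha"],
    "Marks": [78, 85, 90, 82],
})

rows, cols = df.shape    # shape is a (rows, columns) tuple
print(rows, cols)        # 4 2
print(list(df.columns))  # ['Name', 'Marks']
print(df.head(2))        # head/tail take an optional row count (default 5)
df.info()                # dtypes and non-null counts per column
```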
7️⃣ Selecting Columns
To select a single column:
print(df["Name"])
To select multiple columns, pass a list of column names:
print(df[["Name", "Marks"]])
This step matters because in analysis you often work with a few specific columns, not all of them.
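One detail worth seeing in code is what each form returns. In this sketch (sample data is illustrative), single brackets give a Series, while double brackets give a DataFrame even when only one column is listed.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Deepak", "Ravi"],
    "Marks": [78, 85, 90],
})

single = df["Name"]           # single brackets -> Series
one_col = df[["Name"]]        # double brackets -> DataFrame with one column
two_cols = df[["Name", "Marks"]]

print(type(single).__name__)   # Series
print(type(one_col).__name__)  # DataFrame
print(one_col.shape)           # (3, 1)
print(two_cols.shape)          # (3, 2)
```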
8️⃣ Basic Statistics
Basic statistics on the Marks column:
print(df["Marks"].mean())
print(df["Marks"].max())
print(df["Marks"].min())
These three functions give you quick insights such as the class average, the highest marks, and the lowest marks. In the same way you can compute the sum, median, standard deviation, and so on.
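The other statistics mentioned here follow the same pattern. A short sketch on illustrative marks:

```python
import pandas as pd

df = pd.DataFrame({"Marks": [78, 85, 90, 82]})

total = df["Marks"].sum()      # 335
median = df["Marks"].median()  # 83.5
spread = df["Marks"].std()     # sample standard deviation (ddof=1 by default)

print(total, median, round(spread, 2))

# describe() bundles count, mean, std, min, quartiles, and max in one call
print(df["Marks"].describe())
```

Note that std() computes the sample standard deviation by default, which can differ slightly from a population standard deviation computed elsewhere.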
Real Example — Sales Data
import pandas as pd
data = {
    "Product": ["A", "B", "C"],
    "Sales": [100, 150, 200]
}
df = pd.DataFrame(data)
print("Total Sales:", df["Sales"].sum())
In this example we calculated total sales. You can apply the same logic to a monthly sales report, product analysis, or revenue tracking.
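As one possible next step toward product analysis, the sketch below uses the same three products to compute each product's share of the total and to find the best seller. The Share column name is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Sales": [100, 150, 200],
})

total = df["Sales"].sum()          # 450
df["Share"] = df["Sales"] / total  # fraction of total sales per product

# idxmax() returns the row label of the largest Sales value
best = df.loc[df["Sales"].idxmax(), "Product"]

print(df)
print("Best seller:", best)        # Best seller: C
```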
Real‑world mini use case: Student result analysis
Suppose you have a CSV with the names, marks, and subject details of 500 students. Finding the topper, computing the average, and counting pass/fail manually in Excel would take a lot of time.
With Pandas you can:
- Compute average marks per subject.
- Filter the top 10 students.
- Build a separate list of students who failed.
All of this takes only a few lines of code, which makes a data analyst's daily workflow much faster.
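The three tasks in this use case can be sketched as follows. This is only an illustration: the tiny table stands in for the 500-student CSV, and the column names (Name, Subject, Marks) and the pass mark of 40 are assumptions, not details from a real dataset.

```python
import pandas as pd

# Made-up sample standing in for the 500-student CSV.
# Column names and the pass mark are assumptions for illustration.
df = pd.DataFrame({
    "Name":    ["Amit", "Amit", "Deepak", "Deepak", "Ravi", "Ravi"],
    "Subject": ["Math", "Science", "Math", "Science", "Math", "Science"],
    "Marks":   [78, 65, 35, 88, 90, 30],
})
PASS_MARK = 40

# 1. Average marks per subject
avg_per_subject = df.groupby("Subject")["Marks"].mean()

# 2. Top students by total marks (use 10 on the real dataset)
totals = df.groupby("Name")["Marks"].sum()
top_students = totals.nlargest(2)

# 3. Rows where a student scored below the pass mark
failed = df[df["Marks"] < PASS_MARK]

print(avg_per_subject)
print(top_students)
print(failed)
```

groupby, nlargest, and boolean filtering go beyond the basics covered in this part, so treat them as a preview of what the same DataFrame can do.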
Practice Tasks
1. Create DataFrame of students with name and marks.
2. Print first 3 rows using head().
3. Find average marks.
4. Find highest marks.
5. Select only Name column.
✅ Practice Task Solutions
Task 1. Create DataFrame of students with name and marks
import pandas as pd
data = {
    "Name": ["Amit", "Deepak", "Ravi", "Neha"],
    "Marks": [78, 85, 90, 82]
}
df = pd.DataFrame(data)
print(df)
Output example:
     Name  Marks
0    Amit     78
1  Deepak     85
2    Ravi     90
3    Neha     82
Task 2. Print first 3 rows using head()
print(df.head(3))
Output:
     Name  Marks
0    Amit     78
1  Deepak     85
2    Ravi     90
Task 3. Find average marks
print("Average Marks:", df["Marks"].mean())
Example output: Average Marks: 83.75
Task 4. Find highest marks
print("Highest Marks:", df["Marks"].max())
Output: Highest Marks: 90
Task 5. Select only Name column
print(df["Name"])
Output:
0      Amit
1    Deepak
2      Ravi
3      Neha
Name: Name, dtype: object
✅ Key Learning
- pd.DataFrame() → creates a table
- df.head(n) → previews the first n rows
- df["column"].mean() → average value
- df["column"].max() → highest value
- pd.read_csv() → loads a dataset
- Series → single column
- DataFrame → table / dataset
Pandas makes data analysis simple and fast. The more you practice, the sooner you will be able to handle real-world datasets, whether that is student data, sales, or website analytics.
