Python for Data Science

Ankit Rathi
9 min read, Jun 25, 2021

This blog post is about learning Python for Data Science. These are the topics that are covered in this post:

  1. Introduction
  2. Syntax & Variable Types
  3. Data Types and Conversion
  4. Basic Operators and Loops
  5. Functions, Exceptions and Modules
  6. Data Science Specific Modules

Python is a vast topic, so this post doesn’t cover every detail of the language. It introduces the capabilities of the language that matter for data science. Links are given after each topic so you can explore it further.

Introduction

Python is an interpreted, interactive, object-oriented and beginner-friendly language. It is easy to learn, read and maintain. It offers the features expected of a modern programming language, such as portability, extensibility and scalability.

Refer to the following article to get an overview of Python:

Installation

Python supports all the major Operating Systems (OS) like Windows, Linux, macOS etc. Installation mainly has two simple steps:

  1. Download the appropriate Python installer.
  2. Run the installer on the OS.

You can refer to the following link for installing Python step-by-step.

Syntax & Variable Types

Interactive mode

Starting the interpreter from the command line without a script argument brings up the prompt, which is interactive mode.

# simply print
print("Hello world")

Script mode

Invoking the interpreter with a script parameter begins the execution of the script and continues until the script is finished.

# run a script
python test.py

Identifiers

A Python identifier is a name used to identify a variable, function, class, module or other objects. Python is a case-sensitive programming language.

# different identifiers (case-sensitive)
DataScience = 'AI'
datascience = 'not AI'

Reserved words

Reserved words are names that you cannot use as a constant, variable or any other identifier, e.g. and, or, if, for, else, while, etc.
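
The full set of reserved words can be checked from Python itself. The sketch below uses the standard-library keyword module; the name city is only a hypothetical identifier chosen for illustration.

```python
import keyword

# keyword.kwlist holds every reserved word of the running Python version
reserved_words = keyword.kwlist
print(reserved_words)

# keyword.iskeyword() reports whether a given name is reserved
for_is_reserved = keyword.iskeyword("for")
city_is_reserved = keyword.iskeyword("city")  # hypothetical identifier
print(for_is_reserved, city_is_reserved)
```

The exact list depends on the Python version, so it is safer to query it than to memorize it.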

Lines & Indentation

In Python, line indentation defines the block structure.

# this is a block
if True:
    print("True")
else:
    print("False")

Quotations

Quotations are used to denote string literals. Python accepts single ('), double (") and triple (''' or """) quotes.

# using single, double & triple quotes
word = 'word'
sentence = "This is a sentence."
paragraph = """This is a paragraph. It is
made up of multiple lines and sentences."""

Comments

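Comments explain the code and make it more readable. A single-line comment starts with #. Python has no dedicated multi-line comment syntax; a triple-quoted string that is not assigned to anything is commonly used instead:

# first comment
print("Hello, Python!")
# second comment
'''
This is a multi-line
comment.
'''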

You can learn more about Python syntax in the following article:

Standard Data Types

Python has five standard data types: Numbers, String, List, Tuple and Dictionary.

Number data types store numeric values.

# assign an integer
var1 = 2

Strings are contiguous sequences of characters enclosed in quotation marks.

# assign a string
str1 = "Let's Learn Python"

A list contains items separated by commas and enclosed within square brackets ([]).

# initialize the list (named my_list to avoid shadowing the built-in list)
my_list = ['abcd', 786, 2.23, 'john', 70.2]
tinylist = [123, 'john']

A tuple consists of a number of values separated by commas. Unlike lists, however, tuples are enclosed within parentheses. List elements and size can be changed, while tuples cannot be updated.

# initialize the tuple (named my_tuple to avoid shadowing the built-in tuple)
my_tuple = ('abcd', 786, 2.23, 'john', 70.2)
tinytuple = (123, 'john')

Python’s dictionaries are hash tables that map keys to values. A dictionary key can be any hashable Python object, but keys are usually numbers or strings. Values, on the other hand, can be any arbitrary Python object.

# initialize the dictionary (named my_dict to avoid shadowing the built-in dict)
my_dict = {}
my_dict['one'] = "This is one"
my_dict[2] = "This is two"
tinydict = {'name': 'john', 'code': 6734, 'dept': 'sales'}

Data Type Conversion

Many times, you need to perform conversions between the built-in types. To convert between types, you simply use the type name as a function. There are several built-in functions to perform conversion from one data type to another.
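
As a small illustration of using type names as functions, the sketch below converts between a few built-in types. The values are made up for the example.

```python
# string to number
count = int("42")
price = float("3.5")

# number to string
label = str(786)

# float to int drops the fractional part
whole = int(2.23)

# between sequence types
as_tuple = tuple([1, 2, 3])
as_list = list("abc")

# list of (key, value) pairs to dictionary
as_dict = dict([("one", 1), ("two", 2)])

print(count, price, label, whole, as_tuple, as_list, as_dict)
```

A conversion that makes no sense, such as int("abc"), raises a ValueError instead of returning a value.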

The following article provides more information about variable types in Python:

Basic Operators

Operators are the constructs that can manipulate the value of operands. Python supports the following types of operators: Arithmetic, Comparison, Assignment, Logical, Bitwise, Membership and Identity operators.

Arithmetic Operators

Addition (+), subtraction (-), multiplication (*), division (/) and exponent (**) are the main arithmetic operators.

# arithmetic operators
a = 20
b = 10
a+b # returns 30
a-b # returns 10
a*b # returns 200
a/b # returns 2.0
a**2 # returns 400

Comparison Operators

Equal to (==), not equal to (!=), greater than (>) and less than (<) are the main comparison operators. The older <> form of "not equal to" was removed in Python 3.

# comparison operators
a = 20
b = 10
a == b # returns False
a != b # returns True

Assignment Operators

Assign (=), assign & add (+=), assign & subtract (-=), assign & multiply (*=) are main assignment operators.

# assignment operators
c = a+b
c += a # equivalent to c = c+a
c *= a # equivalent to c = c*a

Bitwise Operators

Binary AND (&), OR (|), XOR (^) are the main bit-wise operators.

# bitwise operators (binary literals use the 0b prefix)
a = 0b00111100 # 60
b = 0b00001101 # 13
a & b # returns 12 (0b00001100)
a | b # returns 61 (0b00111101)
a ^ b # returns 49 (0b00110001)

Logical Operators

Logical AND (and), OR (or), NOT (not) are the main logical operators.

# logical operators
a = True
b = False
a and b # returns False
a or b # returns True
not a # returns False
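
Membership and identity operators appear in the list of operator types but have no example of their own. A minimal sketch, using an illustrative list of cities:

```python
cities = ['Agra', 'Delhi', 'Lucknow']

# membership operators test whether a value is contained in a sequence
has_delhi = 'Delhi' in cities
lacks_pune = 'Pune' not in cities

# identity operators test whether two names refer to the same object
same_list = cities
copied_list = list(cities)
is_same = same_list is cities          # both names point to one object
is_copy_same = copied_list is cities   # equal contents, but a different object
is_equal = copied_list == cities       # == compares values, not identity

print(has_delhi, lacks_pune, is_same, is_copy_same, is_equal)
```

The last two lines show why is should not be used as a substitute for == when comparing values.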

You can learn more about Python basic operators in the following article:

Decision Making & Loops

Decision Making

Decision-making means anticipating the conditions that can occur while a program runs and specifying the actions to take for each condition. Conditions can be nested for complex decision-making.

if statement

# if statement
var1 = 100
if var1 == 100:
    print("Amount is 100")

if-else statement

# if else statement
var1 = 100
if var1 == 100:
    print("Amount is 100")
else:
    print("Amount is not 100")

nested if-else statement

# nested if-else statement
var = 100
if var < 200:
    print("Amount is less than 200")
    if var > 100:
        print("Amount is more than 100")
    elif var <= 100:
        print("Amount is less than or equal to 100")
else:
    print("Amount is more than or equal to 200")

The following article provides more information about decision making in Python:

Loops

A loop statement allows us to execute a statement or group of statements multiple times. The while and for loops are the ones mainly used, and loops can be nested for complex iterations.

while loop statement

# while loop statement
count = 0
while count < 9:
    print('The count is:', count)
    count = count + 1

for loop statement

# for loop statement
cities = ['Agra', 'Delhi', 'Lucknow']
for city in cities:
    print('Current city :', city)

nested loop statement

# nested loop statement
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f = 1
print(A)
for i in range(0, 3):
    f *= 10
    for j in range(0, 3):
        A[i][j] *= f
print(A)

You can learn more about Python loops in the following article:

Functions, Exceptions & Modules

Functions

A function is a block of organized, reusable code that performs a single, related action. Functions provide better modularity for your application and a high degree of code reuse.

  1. Defining a function
  2. Calling a function

# define a function
def my_info(name, age):
    print("My name is", name)
    print("I am %d years old" % age)

# call the function
my_info(name="Tom", age=25)
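
Functions become reusable mainly by returning values and accepting optional parameters. The sketch below is an illustration only; the function name average and its digits parameter are hypothetical and not part of any library.

```python
def average(values, digits=2):
    """Return the mean of values, rounded to the given number of digits."""
    return round(sum(values) / len(values), digits)

# call with the default number of digits
default_result = average([10, 20, 40])

# override the default with a keyword argument
coarse_result = average([10, 20, 40], digits=0)

print(default_result, coarse_result)
```

Because the function returns its result instead of printing it, the same code can feed other calculations.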

The following article provides more information about using functions in Python:

Exceptions

Python provides two very important features to handle any unexpected error in your Python programs and to add debugging capabilities in them:

Exceptions: An exception is an event, which occurs during the execution of a program that disrupts the normal flow of the program’s instructions.

# handle an I/O error raised while opening or writing a file
try:
    fh = open("file", "r")  # read-only mode: a missing file or the write below raises IOError
    fh.write("This is the file for exception handling!!")
except IOError:
    print("Error: can't find file or read/write data")
else:
    print("Content written in the file successfully")

Assertions: An assertion is a sanity check that you can turn on or turn off when you are done with your testing of the program.

# applying assertion
def k2f(temp):
    assert (temp >= 0), "It's colder than absolute zero!"
    return ((temp-273)*1.8)+32

print(int(k2f(500)))
print(k2f(-5)) # raises AssertionError

You can learn more about Python exceptions in the following article:

Modules

A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use. A module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

# import built-in module math
import math
content = dir(math)
print(content)
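
To illustrate that a module is just a file of Python code, the sketch below writes a small module to a temporary directory and imports it. The module name stats_helpers and its contents are hypothetical; in practice you would simply save the file next to your script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# source code of a hypothetical module named "stats_helpers"
module_source = '''
PI_APPROX = 3.14

def mean(values):
    return sum(values) / len(values)
'''

# write the module to a temporary directory and make that directory importable
module_dir = tempfile.mkdtemp()
Path(module_dir, "stats_helpers.py").write_text(module_source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# import the module by name and use the function and variable it defines
stats_helpers = importlib.import_module("stats_helpers")
module_mean = stats_helpers.mean([1, 2, 3, 4])
print(module_mean)
print(stats_helpers.PI_APPROX)
```

With the file saved in your working directory, a plain import stats_helpers would have the same effect.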

The following article provides more information about modules in Python:

Data Science Specific Modules

NumPy

NumPy is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed.

Using NumPy, a developer can perform the following operations −

  • Mathematical and logical operations on arrays.
  • Fourier transforms and routines for shape manipulation.
  • Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.

# create & transpose array using NumPy
import numpy as np
a = np.arange(12).reshape(3, 4)
print(a)
b = np.transpose(a)
print(b)
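
The operations listed above can be sketched in a few lines. The arrays are made-up values, and np.random.default_rng assumes NumPy 1.17 or newer.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# element-wise mathematical and logical operations
elementwise_sum = a + b
greater_than_two = a > 2

# shape manipulation
flat = a.reshape(4)

# linear algebra: matrix product and inverse
product = a @ b
inverse = np.linalg.inv(a)

# Fourier transform of a short signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# random number generation with a fixed seed for reproducibility
rng = np.random.default_rng(0)
samples = rng.random(3)

print(elementwise_sum, greater_than_two, flat, product, inverse, spectrum, samples, sep="\n")
```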

You can learn more about the NumPy module in the following article:

SciPy

SciPy is an open-source, BSD-licensed Python library for mathematics, science and engineering. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation, and it is designed to work with NumPy arrays.

# minimize the Rosenbrock function with SciPy's Nelder-Mead method
import numpy as np
from scipy.optimize import minimize

# standard Rosenbrock test function (assumed here, as the name rosen suggests)
def rosen(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2.0) ** 2.0 + (1 - x[:-1]) ** 2.0)

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='nelder-mead')
print(res.x)

The following article provides more information about SciPy module:

Pandas

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data.

# read csv file into a dataframe
import pandas as pd
# PATH is the CSV file location and COLUMNS the list of column names; define both beforehand
df_train = pd.read_csv(PATH, names=COLUMNS, index_col=False)
df_train.head(5)
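
For a version that runs without a file on disk, the sketch below reads CSV text from memory. The column names and rows are made up for illustration.

```python
import io
import pandas as pd

# hypothetical CSV content standing in for a real file
csv_text = "age,city,label\n25,Agra,0\n32,Delhi,1\n41,Lucknow,1\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first two rows
print(df.shape)     # (rows, columns)

# column statistics and row filtering
mean_age = df["age"].mean()
delhi_rows = df[df["city"] == "Delhi"]
print(mean_age)
print(delhi_rows)
```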

You can learn more about the Pandas module in the following article:

MatPlotLib

Matplotlib is a plotting library for Python. It is used along with NumPy to provide an environment that is an effective open-source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.

# display sine wave using MatPlotLib
import numpy as np
import matplotlib.pyplot as plt
# compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
plt.title("sine wave form")
# plot the points using matplotlib
plt.plot(x, y)
plt.show()

The following article provides more information about MatPlotLib module:

Scikit-Learn

Scikit-Learn is an open-source Python library for machine learning. The library provides widely used algorithms such as kNN, random forest, gradient boosting and SVM, among others. It is built on top of NumPy and SciPy.

# split dataframe into train and test using sklearn
from sklearn.model_selection import train_test_split
# features is the list of feature column names and label the target column; define both beforehand
X_train, X_test, y_train, y_test = train_test_split(df_train[features], df_train.label, test_size=0.3, random_state=0)
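
A self-contained variant of the split, using small made-up NumPy arrays instead of a dataframe, might look like this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical data: 10 samples with 2 features each, and a binary label
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# hold out 30% of the samples for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Fixing random_state is a convenience for reproducible experiments; leave it out to get a different split on each run.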

You can learn more about the Scikit-Learn module in the following article:

Concluding Thoughts

After going through these topics, I believe data science beginners can feel confident about using Python on various Data Science problems and projects.

One topic that I have not covered here is object-oriented programming (OOP) in Python. It is a big topic in itself, so I will cover it in a separate blog post.

Ankit Rathi is a Principal Data Scientist, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.
