3. Python as a Data-Analysis Tool
Why Python?
- Python is an object-oriented programming language, so creating classes and objects is easy.
- Simple syntax.
- Runs on an interpreter, so code can be executed as soon as it is written, which makes prototyping easier.
- Huge collection of standard libraries.
- Integrates easily with third-party tools, which saves costs for companies.
- Python is open source and has huge community support.
Introduction to Jupyter Notebook
- An IDE (Integrated Development Environment): An IDE is a software suite that consolidates the basic tools that developers need to write and test software.
- Jupyter Notebook: Jupyter Notebook was previously called IPython Notebook.
- Versatility and Shareability: A notebook combines code, text, and output in one document, and it can be shared with others as a notebook file (.ipynb).
- Data Visualization: It has the ability to display data visualizations in the same window.
- Open-Source Web Application: Jupyter Notebook is an open-source web application.
- Notebook: A notebook is a collection of code, documentation, and visualizations, all in one place.
- Independent Cells: Jupyter Notebook has independent cells for different parts of the code, allowing execution of specific code sections without running the entire code.
Features of Jupyter Notebook
- Python 3 Notebook Dropdown Menu: In a Jupyter Python 3 notebook, the cell-type dropdown menu has four options:
  - Code: A cell where you can write and run Python code.
  - Markdown (text): A cell where you can write notes as formatted text.
  - Raw NBConvert: A cell whose content is not executed and is passed through unmodified when the notebook is converted to other formats, such as HTML.
  - Heading: A cell where you can write a header for the notebook. Note that the line must start with pound signs (for example, a double pound sign, ##) so that the text after them is treated as a header.
Data Types in Python
Python Data Types
- int: Integer, a whole number without decimals, e.g., 5, -10, 42.
- float: Floating-point number, a number with decimals, e.g., 5.6, -3.14, 2.0.
- str: String, a sequence of characters, e.g., "Hello", "12345".
- bool: Boolean, representing True or False values.
- list: Ordered collection of items, e.g., [1, 2, 3], ["apple", "banana"].
- tuple: Immutable ordered collection of items, e.g., (1, 2, 3), ("apple", "banana").
- set: Unordered collection of unique items, e.g., {1, 2, 3}, {"apple", "banana"}.
- dict: Dictionary, a collection of key-value pairs, e.g., {"name": "John", "age": 25}.
- NoneType: Represents the absence of a value or a null value, e.g., None.
Python Data Types - Key Points
To check the data type of any variable: Use the built-in type() function. type(var) returns the type of the variable passed as an argument.
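As a minimal sketch of the type() check described above, the snippet below applies it to one literal of each data type from the list. The variable names (samples, type_names) are illustrative only.

```python
# Inspect the type of one sample value per built-in data type.
samples = [5, 5.6, "Hello", True, [1, 2, 3], (1, 2, 3), {1, 2, 3}, {"name": "John"}, None]

for value in samples:
    # type(value).__name__ is the short class name, e.g. "int"
    print(repr(value), "->", type(value).__name__)

# Collect the class names so they can be compared or reused later.
type_names = [type(value).__name__ for value in samples]
```

Note that type(None) reports NoneType, which matches the last entry in the list of data types.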
Integers: In Python, there is no fixed limit on how long an int value can be; its size is bounded only by available memory. Integers are represented by the int class.
Floats: Floats can be written using e or E notation, which is called scientific notation. Floating-point numbers are represented by the float class.
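The two points above can be illustrated with a short sketch. The values chosen here (2 ** 100, 6.022e23, 1.5E-3) are arbitrary examples.

```python
# Integers grow as needed: 2 ** 100 does not fit in 64 bits but is still an exact int.
big = 2 ** 100
print(big)             # 1267650600228229401496703205376
print(type(big))       # <class 'int'>

# Scientific notation: e or E separates the number from the power of ten.
avogadro = 6.022e23    # 6.022 * 10**23
tiny = 1.5E-3          # upper-case E works the same way
print(tiny)            # 0.0015
print(type(avogadro))  # <class 'float'>
```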
Sequence Data Types: In Python, sequences are ordered collections of similar or different data types. Sequences allow storing multiple values in an organized and efficient manner.
Strings: Strings in Python are sequences of Unicode characters. A string is a sequence of characters enclosed in single, double, or triple quotes. Strings are represented by the str class. Individual characters of a string can be accessed by indexing.
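A small sketch of string indexing follows, assuming an example string "Python". It also shows slicing and a triple-quoted string, which are closely related to the points above.

```python
word = "Python"

first = word[0]     # 'P'   (indexing starts at 0)
last = word[-1]     # 'n'   (negative indices count from the end)
middle = word[1:4]  # 'yth' (slice: start included, stop excluded)

# Triple quotes allow a string to span several lines.
multi = """spans
two lines"""

print(first, last, middle)
print(len(multi.splitlines()))  # 2
```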
Booleans: Booleans are Python data types with two built-in values: True and False. Note that only these capitalized spellings are valid in Python; lowercase true and false are not Boolean values.
Basic Operations
Python Basic Operations - Key Points
Arithmetic Operations: Python supports basic arithmetic operations like addition, subtraction, multiplication, division, and modulus.
- Addition: a + b
- Subtraction: a - b
- Multiplication: a * b
- Division: a / b
- Modulus (Remainder): a % b
- Exponentiation (Power): a ** b
- Floor Division: a // b
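The arithmetic operators listed above can be tried out with a short sketch. The operands 17 and 5 are arbitrary example values.

```python
a, b = 17, 5

total = a + b        # 22
difference = a - b   # 12
product = a * b      # 85
quotient = a / b     # 3.4 (division always returns a float)
remainder = a % b    # 2
power = a ** 2       # 289
floored = a // b     # 3   (quotient rounded down to a whole number)

# Floor division rounds toward negative infinity, not toward zero.
negative_floored = -17 // 5  # -4

print(total, difference, product, quotient, remainder, power, floored, negative_floored)
```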
Comparison Operations: Used to compare two values and return a boolean result.
- Equal to: a == b
- Not equal to: a != b
- Greater than: a > b
- Less than: a < b
- Greater than or equal to: a >= b
- Less than or equal to: a <= b
Logical Operations: Used to perform logical operations and return boolean results.
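- AND: a and b
- OR: a or b
- NOT: not a
Note that Python uses the keywords and, or, and not; the symbols && and || used in some other languages are not valid Python.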
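A minimal sketch of logical operations, written with Python's keywords and, or, and not. The variables (age, has_ticket) and the thresholds are made up for illustration.

```python
age = 25
has_ticket = True

can_enter = age >= 18 and has_ticket   # True only if both sides are True
is_discounted = age < 12 or age >= 65  # True if at least one side is True
is_blocked = not has_ticket            # flips True to False and vice versa

print(can_enter, is_discounted, is_blocked)  # True False False
```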
Assignment Operations: Used to assign values to variables.
- Simple assignment: a = b
- Add and assign: a += b
- Subtract and assign: a -= b
- Multiply and assign: a *= b
- Divide and assign: a /= b
- Modulo and assign: a %= b
- Exponentiate and assign: a **= b
- Floor divide and assign: a //= b
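The combined assignment operators can be traced step by step, as in the sketch below. The starting value and operands are arbitrary, and each comment shows the value of total after that line.

```python
total = 10
total += 5   # 15
total -= 3   # 12
total *= 2   # 24
total /= 4   # 6.0  (division turns the value into a float)
total **= 2  # 36.0
total //= 5  # 7.0
total %= 4   # 3.0

print(total)  # 3.0
```

Each line is shorthand for the longer form, e.g. total += 5 behaves like total = total + 5 for numbers.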
Membership Operations: Used to check if a value is present in a sequence (e.g., a list, tuple, or string).
- In: a in b
- Not in: a not in b
Identity Operations: Used to compare the memory locations of two objects.
- Is: a is b
- Is not: a is not b
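The sketch below illustrates membership and identity operations together. It also contrasts is with ==, since two different objects can hold equal contents. All names are illustrative.

```python
fruits = ["apple", "banana"]
has_apple = "apple" in fruits        # True
no_cherry = "cherry" not in fruits   # True
has_substring = "an" in "banana"     # True (for strings, in checks substrings)

first = [1, 2, 3]
alias = first        # a second name for the same list object
clone = list(first)  # a new list object with equal contents

same_object = alias is first         # True
different_object = clone is not first  # True
equal_contents = clone == first      # True, even though clone is a separate object

print(has_apple, no_cherry, has_substring)
print(same_object, different_object, equal_contents)
```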
Condition Statements & Loops
Branching in Python (if-else)
Basic Syntax:
if condition:
    # Execute if condition is True
elif another_condition:
    # Execute if another_condition is True
else:
    # Execute if no condition is True
Example:
def check_number(num):
    if num > 0:
        print("Positive number")
    elif num < 0:
        print("Negative number")
    else:
        print("Zero")

check_number(10)  # Output: Positive number
Key Points:
- if - checks the condition; executes if True.
- elif - checks an alternative condition if the first is False.
- else - executes if all the previous conditions are False.
- Conditions are evaluated in order. Once a condition is found to be True, no other conditions are checked.
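The last key point, that checking stops at the first True condition, can be seen in a small sketch. The function grade and its thresholds are hypothetical and serve only as an illustration.

```python
def grade(score):
    # Conditions are tested top to bottom; the first True branch wins.
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"

# 95 also satisfies score >= 75 and score >= 50,
# but those branches are never reached.
print(grade(95))  # A
print(grade(80))  # B
print(grade(10))  # F
```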
For Loop in Python
Basic Syntax:
for item in iterable:
    # Execute for each item in iterable
Example:
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)  # Prints apple, banana, cherry (one per line)
Key Points:
- Used to iterate over sequences (lists, tuples, strings, etc.).
- Can loop over a range using range(start, stop, step).
- Loops through each element one by one.
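A short sketch of range(start, stop, step) and of looping over a string follows. The numbers and the word are arbitrary examples.

```python
# range(2, 11, 2) starts at 2, stops before 11, and moves in steps of 2.
evens = []
for n in range(2, 11, 2):
    evens.append(n)
print(evens)  # [2, 4, 6, 8, 10]

# A string is also a sequence, so a for loop visits it character by character.
letters = []
for ch in "abc":
    letters.append(ch.upper())
print(letters)  # ['A', 'B', 'C']
```

The stop value is never included, which is why 11 is used here to reach 10.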
While Loop in Python
Basic Syntax:
while condition:
    # Execute as long as condition is True
Example:
count = 1
while count <= 5:
    print(count)
    count += 1  # Increment count to avoid an infinite loop
Key Points:
- The loop continues as long as the condition is True.
- Ensure that the condition will eventually become False to avoid infinite loops.
- Useful when the number of iterations is not known beforehand.
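As a sketch of the last key point, the loop below keeps halving a number until it is no longer greater than 1. How many passes that takes is not stated up front; the loop works it out. The starting value 100 is arbitrary.

```python
value = 100
steps = 0

# The condition eventually becomes False because value shrinks on every pass.
while value > 1:
    value = value / 2
    steps += 1

print(steps)  # 7
print(value)  # 0.78125
```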
Functions in Python
Basic Syntax:
def function_name(parameters):
    # Function body
    # Code to execute
    return result  # Optional, to return a value
Example:
def greet(name):
    return "Hello, " + name + "!"

message = greet("Alice")
print(message)  # Output: Hello, Alice!
Key Points:
- def is used to define a function in Python.
- The function can take parameters (input) and return a value (output).
- Functions can be called multiple times with different arguments.
- If no return statement is used, the function returns None by default.
- Functions allow you to organize code, reuse logic, and make the code more modular and readable.
- You can define default values for function parameters (e.g., def greet(name="Guest")).
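Two of the key points above, default parameter values and the implicit None return, are sketched below. The helper log_greeting is a hypothetical function added only for this illustration.

```python
def greet(name="Guest"):
    # name falls back to "Guest" when no argument is passed
    return "Hello, " + name + "!"

def log_greeting(name):
    # Prints the greeting but has no return statement
    print("Hello, " + name + "!")

default_message = greet()          # uses the default value
custom_message = greet("Alice")    # overrides the default
nothing = log_greeting("Bob")      # prints; the call itself evaluates to None

print(default_message)  # Hello, Guest!
print(custom_message)   # Hello, Alice!
print(nothing)          # None
```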
Basic Libraries
Python Libraries
Definition:
A library is a collection of functions and methods which allows the user to perform actions without writing lengthy code to achieve a task.
Common Libraries used in Data Analysis:
- Pandas: Helps in data manipulation and analysis.
- NumPy: Helps in performing complicated mathematical calculations for large datasets.
- Matplotlib: Helps in plotting different types of graphs in Python.
Installing Libraries:
To install a library, run one of the following commands in a terminal, depending on the environment:
(for an Anaconda/Conda environment): conda install <library name>
(for a standard Python environment): pip install <library name>