
Python

Getting Started and Basics

I shall assume that you are familiar with some programming languages such as C/C++/Java. This article is not meant to be an introduction to programming.

Python By Examples

This section is for experienced programmers who want a quick look at Python's syntax, and for those who need to refresh their memory.

Example 1: wc.py (Word Count)

Create the following source file and save as wc.py.

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
wc - word count
~~~~~~~~~~~~~~~
Read a file given in the command-line argument and print the number of
lines, words and characters - similar to UNIX's wc utility.

Usage: ./wc.py <filename>
"""
    # The above is the module's doc-string for documentation
    # Multi-line strings are delimited by triple single/double quotes

# Check if a filename is given in command-line arguments
import sys              # Using 'sys.argv' and 'sys.exit()'
if len(sys.argv) != 2:  # Command-line arguments are kept in a list 'sys.argv'
    print('Usage: ./wc.py <filename>')
    sys.exit(1)         # Return a non-zero value to indicate abnormal termination
    # Note: Python uses indentation instead of {} for body block
 
# Python's variable has no type. No prior declaration needed.
# Variables are created via the initial assignments.
num_words = num_lines = num_chars = 0  # chain assignment
 
# Get input file name from 'sys.argv'
# sys.argv[0] is the script name, sys.argv[1] is the filename.
with open(sys.argv[1]) as infile: # 'with-as' closes the file automatically
    for line in infile:           # Process each line (including newline) in a for-loop
        num_lines += 1            # No ++ operator in Python?!
        num_chars += len(line)
        line = line.strip()       # Remove leading and trailing whitespaces
        words = line.split()      # Split into a list using whitespace as delimiter
        num_words += len(words)
 
# Print results
print('Number of Lines is %d' % num_lines)  # C-like printf()
print('Number of Words is %d' % num_words)
print('Number of Characters is %d' % num_chars)

To run the script:

$ python wc.py <filename>  # Invoke Python Interpreter to run the script

# OR
$ chmod u+x wc.py      # Set the script executable
$ ./wc.py <filename>   # Execute the script
Dissecting the Program
  1. Line 1 is applicable to the Unix environment only. It is known as the Hash-Bang (or She-Bang) for specifying the Python Interpreter, so that the script can be executed directly as a standalone program.
  2. The optional Line 2 specifies the source encoding scheme. We choose and recommend UTF-8 for internationalization. This special format is recognized by many popular editors for saving the source code in the specified encoding.
  3. The script begins with a so-called doc-string that provides the documentation. Python strings can be delimited by single quotes or double quotes. Python also supports multi-line strings, delimited by triple single/double quotes.
  4. Python's comment begins with a '#' and lasts until the end-of-line. Python does not support multi-line comments.
  5. In this script, we use the sys module from the Python's standard library to retrieve the command-line arguments (sys.argv) and to terminate the program (sys.exit()). In Python, you need to import the module before using it.
  6. The command-line arguments are stored in a variable sys.argv, which is a list (similar to array). The first item of the list (sys.argv[0]) is the script name, followed by the other command-line arguments.
  7. We use the built-in function len() to verify that the length of the list is 2. Take note that Python uses indentation to indicate body-block instead of { } (as in C/C++/C#/Java).
  8. Python variables have no fixed type and do not need to be declared. A variable is created via the initial assignment.
  9. We open the file via with-as statement, which closes the file automatically upon exit.
  10. We use a for-in loop to process each line of the file (count the lines, words, and characters).
  11. We format the output via the '%' operator, in the form of 'formatting-str' % args. The formatting-string could contain C's printf-like format-specifiers such as %d (for integer), %6.2f (for float), %s (for string).
  12. [TODO] more
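
As a rough illustration of items 10 and 11, the counting loop and the '%' formatting can be tried on an in-memory string instead of a file. The function name count_text and the sample text are assumptions made for this sketch; they are not part of wc.py.

```python
def count_text(text):
    """Return (lines, words, chars) for a string, mirroring the loop in wc.py."""
    num_lines = num_words = num_chars = 0
    # keepends=True keeps the trailing newline, like iterating over a file
    for line in text.splitlines(keepends=True):
        num_lines += 1
        num_chars += len(line)
        num_words += len(line.split())   # split() already ignores surrounding whitespace
    return num_lines, num_words, num_chars

lines, words, chars = count_text('hello world\nfoo\n')

# A tuple on the right of '%' supplies one value per format specifier
print('%d lines, %d words, %d chars' % (lines, words, chars))  # 2 lines, 3 words, 16 chars

# %-8s left-aligns a string in 8 columns; %6.2f prints a float in 6 columns with 2 decimals
print('%-8s|%6.2f|' % ('ratio', words / lines))
```

Note that splitlines() also breaks on a few separators other than '\n', so the line count may differ from file iteration for unusual input.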

[TODO] More examples

Introduction

Python was created by the Dutch programmer Guido van Rossum around 1991. Python is an open-source project. The mother site is www.python.org.

The main features of Python are:

  • Python is an easy and intuitive language. Python scripts are easy to read and understand.
  • Python (like Perl) is expressive. A single line of Python code can do many lines of code in other general-purpose languages (such as C/C++/Java).
  • Python is free and open-source. It is cross-platform and runs on Windows, Linux/UNIX, and Mac OS.
  • Python is well suited for rapid application development (RAD). You can code an application in Python in much shorter time than in other general-purpose languages (such as C/C++/Java). Python can be used to write small applications and rapid prototypes, but it also scales well for developing large-scale projects.
  • Python is a scripting language. Like most of the scripting languages (e.g., Perl, JavaScript), Python associates types with objects, instead of variables. That is, a variable can be assigned a value of any type, a list (array) can contain objects of different types.
  • Python provides high-level data types such as dynamic array and dictionary (or associative array).
  • Python is object-oriented.
  • Python is not a fully compiled language. It is compiled into internal byte-codes, which are then interpreted. Hence, Python is not as fast as fully-compiled languages such as C/C++.
  • Python comes with a huge set of libraries, including a graphical user interface (GUI) toolkit, web programming libraries, networking, etc.

Python has 3 versions:

  • Python 1: the initial version.
  • Python 2: released in 2000, with many new features such as garbage collector and support for Unicode.
  • Python 3 (Python 3000 or py3k): A major upgrade released in 2008. Python 3 is NOT backward compatible with Python 2.
Python 2 or Python 3?

Currently, two versions of Python are supported in parallel, version 2.7 and version 3.5. They are unfortunately incompatible. This situation arose because when Guido van Rossum (the creator of Python) decided to bring significant changes to Python 2, he found that the new changes would be incompatible with the existing code. He decided to start a new version called Python 3, but to continue maintaining Python 2 without introducing new features. Python 3.0 was released in 2008, and Python 2.7 in 2010.

AGAIN, TAKE NOTE THAT PYTHON 2 AND PYTHON 3 ARE NOT COMPATIBLE!!! You need to decide whether to use Python 2 or Python 3.

Installation and Getting Started

Installation

Ubuntu 16.04LTS

Both Python 2.7 and Python 3.5 should already be installed by default. You can verify via these commands:

$ python3 --version
Python 3.5.2
$ python --version
Python 2.7.12

Otherwise, you can install Python via:

# Installing Python 2
$ sudo apt-get install python
# Installing Python 3
$ sudo apt-get install python3

To verify the Python installation:

# List packages beginning with python
$ dpkg --get-selections | grep python
python						install
python2.7					install
python3						install
python3.5					install
......

# Show status of specific package
$ dpkg --status python2.7
Version: 2.7.12-1ubuntu0~16.04.1
......
$ dpkg --status python3.5
Version: 3.5.2-2ubuntu0~16.04.1
......

# Locate the Python Interpreters
$ which python
/usr/bin/python
$ which python3
/usr/bin/python3
$ ll /usr/bin/python*
lrwxrwxrwx 1 root root       9 xxx xx  xxxx python -> python2.7*
lrwxrwxrwx 1 root root       9 xxx xx  xxxx python2 -> python2.7*
-rwxr-xr-x 1 root root 3345416 xxx xx  xxxx python2.7*
lrwxrwxrwx 1 root root       9 xxx xx  xxxx python3 -> python3.5*
-rwxr-xr-x 2 root root 3709944 xxx xx  xxxx python3.5*
-rwxr-xr-x 2 root root 3709944 xxx xx  xxxx python3.5m*
lrwxrwxrwx 1 root root      10 xxx xx  xxxx python3m -> python3.5m*
      # Clearly,
      # "python" and "python2" are symlinks to "python2.7".
      # "python3" is a symlink to "python3.5".
      # "python3m" is a symlink to "python3.5m".
      # "python3.5" and "python3.5m" are hard-linked (having the same inode and hard-link count of 2), i.e., identical.
Windows

From http://www.python.org/download/, download the 32-bit or 64-bit MSI installer, and run the downloaded installer.

Mac OS

[TODO]

Checking Python Version

To check the version for a Python Interpreter, use --version (or -V) flag, e.g.,

$ python --version
Python 2.7.12
$ python3 --version
Python 3.5.2

Documentation

Python documentation and language reference are provided online @ http://docs.python.org.

Getting Started

Interactive Python Command-Line Shell

You can run the Python Interpreter in interactive mode under a command-line shell.

  • In Ubuntu/Mac OS:
    $ python
    Python 2.7.12 
    ......
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    $ python3
    Python 3.5.2
    ......
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  • In Windows: Click the START button ⇒ Python ⇒ Python (Command-line); or run "python.exe" from the Python installed directory.

The Python command prompt is denoted by >>>. You can enter Python statements at the command prompt, e.g.,

>>> print('hello, world')
hello, world
>>> x = 123
>>> x
123
>>>
Exiting Python Command-Line Session

To exit an interactive command-line session:

  • Type exit(), or
  • (For Ubuntu/Mac OS) Press Ctrl-D (for End-of-File (EOF)), or
  • (For Windows) Press Ctrl-Z + Enter
First Python Script - hello.py

Use a programming text editor to write the following Python script and save as "hello.py" in a directory of your choice:

print('Hello, world')         # Print a string
print(2 ** 88)                # Print 2 raised to the power of 88
                              # Python's integer is unlimited in size!
print(8.01234567890123456789) # Print a float
print((1+2j) * (3*4j))        # Python supports complex numbers!

Program Notes:

  • Statements beginning with a # until the end-of-line are comments.
  • The print() function can be used to print a value to the console.
  • Python's strings can be enclosed with single quotes '...' (Line 1) or double quotes "...".
  • Python's integer is unlimited in size (Line 2).
  • Python supports floats (Line 4).
  • Python supports complex numbers (Line 5) and other high-level data types.
  • By convention, Python script (module) filenames are in all-lowercase.

The expected outputs in Python 2 are:

Hello, world
309485009821345068724781056
8.0123456789
(-24+12j)

The expected outputs in Python 3 are:

Hello, world
309485009821345068724781056
8.012345678901234
(-24+12j)
Running Python Scripts

You can develop/run a Python script in many ways - explained in the following sections.

Running Python Scripts via System Command Shell

You can run a python script via the Python Interpreter under the System's Command Shell (e.g., Windows Command Prompt, Linux/UNIX/Mac OS Terminal/Bash Shell).

  • In Linux/Mac OS Bash Shell:
    $ cd <dirname>     # Change directory to where you stored the script
    $ python hello.py  # Run the script via the Python 2 interpreter (or "python3 hello.py")
  • In Windows Command Prompt: Start a CMD by entering "cmd" in the start menu.
    > cd <dirname>     # Change directory to where you stored the script
     
    > python hello.py  # Run the script via the Python Interpreter
    > hello.py         # if ".py" file is associated with Python Interpreter
Unix Executable Shell Script

In Linux/UNIX/Mac OS, you can turn a Python script into an executable program (called Shell Script or Executable Script) by:

  1. Start with a line beginning with #! (called "hash-bang" or "she-bang"), followed by the full-path name to the Python Interpreter, e.g.,
    #!/usr/bin/python3
    print('Hello, world')
    print(2 ** 88)
    print(8.01234567890123456789)
    print((1+2j) * (3*4j))
    To locate the Python Interpreter, use command "which python" or "which python3".
  2. Make the file executable via chmod (change file mode) command:
    $ cd /path/to/project-directory
    $ chmod u+x hello.py  # enable executable for user-owner
    $ ls -l hello.py      # list to check the executable flag
    -rwxrw-r-- 1 uuuu gggg 314 Nov  4 13:21 hello.py
  3. You can then run the Python script just like any executable program. The system will look for the Python Interpreter from the she-bang line.
    $ cd /path/to/project-directory
    $ ./hello.py

The drawback is that you have to hard code the path to the Python Interpreter, which may prevent the program from being portable across different machines.

Alternatively, you can use

#!/usr/bin/env python3
......

The env utility will locate the Python Interpreter (from the PATH entries). This approach is recommended as it does not hardcode the Python's path.
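
Because env picks whichever interpreter comes first in the PATH, it can be useful for a script to report which interpreter it is actually running under. The following is a minimal sketch using the standard sys module; the version guard at the end is a hypothetical addition, not something the she-bang line requires.

```python
import sys

# Path of the interpreter actually running this script (whatever 'env' resolved to)
print(sys.executable)

# Major, minor and micro version numbers as a tuple of ints
version_tuple = tuple(sys.version_info[:3])
print('Running Python %d.%d.%d' % version_tuple)

# Hypothetical guard: stop early if the script was started under Python 2
if version_tuple[0] < 3:
    sys.exit('This script requires Python 3')
```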

Running Python Scripts inside Python Command-Line Shell

To run a script inside Python's command-line shell:

# Python 3 and Python 2
$ python3
......
>>> exec(open('/path/to/hello.py').read())

# Python 2
$ python2
......
>>> execfile('/path/to/hello.py')
# OR
>>> exec(open('/path/to/hello.py'))
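  • You can use either an absolute or a relative path for the filename. However, '~' (for the home directory) does not work, because open() does not expand it; that expansion is normally done by the shell. Use os.path.expanduser() to expand it explicitly.
  • The open() function opens the file, in read-only mode by default.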
  • The read() function reads the entire file.
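
The exec(open(...).read()) idiom can be sketched end-to-end as follows (Python 3). To keep the example self-contained, it first writes a throwaway script to a temporary file; the script content and the variable names (script_path, namespace) are assumptions made for illustration.

```python
import os
import tempfile

# Create a throwaway script so that the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write("greeting = 'Hello, world'\nprint(greeting)\n")
    script_path = tmp.name

# expanduser() turns a leading '~' into the home directory;
# paths without a leading '~' (like this one) are returned unchanged
script_path = os.path.expanduser(script_path)

namespace = {}                  # The script's global names end up in this dict
with open(script_path) as f:    # 'with' closes the file, unlike a bare open(...).read()
    exec(f.read(), namespace)   # Runs the script; prints: Hello, world

print(namespace['greeting'])    # Names defined by the script remain accessible
os.remove(script_path)          # Clean up the temporary script
```

Passing an explicit dictionary to exec() keeps the script's names out of the interactive session's own namespace; omitting it runs the script in the current namespace, as in the one-liner above.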
Environment Variables PATH/PYTHONPATH and Python System Variable sys.path

The environment variable PATH shall include the path to Python Interpreter "python".

Python system variable sys.path is a list of directories for searching Python modules. It is initialized from the environment variable PYTHONPATH, plus an installation-dependent default. The environment variable PYTHONPATH, by default, is empty.

To show the sys.path for the Python Interpreter:

$ python
Python 2.7.12 
......
>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/local/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages', ......]
$ python3
Python 3.5.2 
......
>>> import sys
>>> sys.path
['', '/usr/lib/python3.5', '/usr/local/lib/python3.5/dist-packages',
 '/usr/lib/python3/dist-packages', ......]

To show the PATH and PYTHONPATH environment variables in Ubuntu/MacOS:

$ printenv PATH
......:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:......
      # Path entries are separated by colon (:) in Unix and semicolon (;) in Windows
      # Environment variables are case-sensitive in Unix and not case sensitive in Windows
$ printenv PYTHONPATH
      # Empty (default)
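
The same information can also be inspected from inside a script. The sketch below reads PYTHONPATH through os.environ and appends one extra directory to sys.path at run time; the directory '/path/to/my_modules' is a hypothetical placeholder.

```python
import os
import sys

# PYTHONPATH is usually unset, so fall back to an empty string
python_path = os.environ.get('PYTHONPATH', '')
print('PYTHONPATH = %r' % python_path)

# sys.path is an ordinary list of directory names, searched in order on import
for entry in sys.path:
    print(repr(entry))

# A directory appended at run time is searched by subsequent imports
extra_dir = '/path/to/my_modules'   # hypothetical directory
if extra_dir not in sys.path:
    sys.path.append(extra_dir)
```

Changes made to sys.path this way last only for the current interpreter session.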

Python IDEs and Debuggers

Picking an IDE with a powerful debugger is CRITICAL in program development!!!

IDLE

Python IDLE (Integrated Development and Learning Environment) is a simple IDE with features such as syntax highlighting, automatic code indentation and a debugger. I strongly recommend it for learning Python.

Installing/Launching IDLE
  • For Ubuntu: To install Python IDLE for Python 2 and Python 3, respectively:
    # Install IDLE for Python 2
    $ sudo apt-get install idle
    # Install IDLE for Python 3
    $ sudo apt-get install idle3
    
    # Verify the installation
    $ which idle
    /usr/bin/idle
    $ which idle3
    /usr/bin/idle3
    $ ll /usr/bin/idle*
    -rwxr-xr-x 1 root root 91 xxx xx  xxxx /usr/bin/idle*
    -rwxr-xr-x 1 root root 92 xxx xx  xxxx /usr/bin/idle3*
    -rwxr-xr-x 1 root root 94 xxx xx  xxxx /usr/bin/idle-python2.7*
    -rwxr-xr-x 1 root root 94 xxx xx  xxxx /usr/bin/idle-python3.5*
    $ dpkg --status idle
    Version: 2.7.11-1
    ......
    $ dpkg --status idle3
    Version: 3.5.1-3
    ......
    To launch IDLE for Python 2 or Python 3:
    $ idle
    $ idle3
  • For Windows: IDLE is bundled in the installation. Click the START button ⇒ Python ⇒ IDLE (Python GUI). To exit, choose "File" menu ⇒ Exit. IDLE is written in Python and is kept under "Lib\idlelib". You can also use "idle.bat", "idle.py", "idle.pyw" to start the IDLE.
  • For Mac OS X: [TODO]
Using IDLE

In Python IDLE, you can enter Python statements interactively, similar to interactive Python command-line shell. E.g.,

$ idle3
Python 3.5.2 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello, world')
Hello, world
>>> print(2 ** 88)
309485009821345068724781056
>>> print(8.01234567890123456789)
8.012345678901234
>>> print((1+2j) * (3*4j))
(-24+12j)
>>>
Writing Python Script in IDLE

To write a Python script under IDLE, choose "File" menu ⇒ "New File". Enter the above script and save as "hello.py" in a directory of your choice (the Python script must be saved with the ".py" extension). To run the script, choose "Run" Menu ⇒ "Run Module". You shall see the outputs on the IDLE console.

Notes:

  • You can use Alt-P/Alt-N to retrieve the previous/next command in the command history.
  • You can read the Python Manual via the "Help" menu.
Debugging Python Script in IDLE

In the main window of the IDLE console, choose "Debug" ⇒ "Debugger" to pop out the "Debug Control". In the text edit window, choose "Run" ⇒ "Run Module" to start debugging the script. You can then step over (Over), step into (Step), and step out (Out) through the script from the "Debug Control". You can also set a breakpoint by right-clicking on a line in the editor window.

Eclipse PyDev

There are several Eclipse plug-ins for Python - notably PyDev @ http://www.pydev.org/.

Installing PyDev

To install PyDev plug-in for Eclipse: Launch Eclipse ⇒ Help ⇒ Install New Software ⇒ In "Work With", enter "http://pydev.org/updates" ⇒ Add ⇒ Select "PyDev".

Caution: When you are prompted for PyDev certificate, you MUST manually select (check) the certificate, before pressing OK. Otherwise, the installation aborts without any warning.

Configuring PyDev

After the installation, you need to configure the Python Interpreter: Launch Eclipse ⇒ Window ⇒ Preferences ⇒ PyDev ⇒ Expand "Interpreters" node ⇒ Python Interpreter ⇒ New to configure your Python Interpreter. For example, in Ubuntu, /usr/bin/python for Python 2.7 or /usr/bin/python3 for Python 3.5; in Windows, python.exe. You can configure many interpreters and choose the desired one for each of your projects.

Writing Python Script using PyDev

To start a new Python project: Launch Eclipse ⇒ File ⇒ New ⇒ Project ⇒ PyDev ⇒ PyDev Project ⇒ Enter "Project name", e.g., "HelloPython" ⇒ Choose the "Interpreter" (configured earlier) and "Grammar Version" ⇒ "Finish".

To write a Python script: Right-click on the project ⇒ New ⇒ PyDev Module ⇒ Leave the "Package" empty ⇒ In "Name", enter "hello" ⇒ Key in the following script:

"""First Python module to say Hello"""
def sayHello(name):
    return "Hello, " + name

print(sayHello('Peter'))
Running Python Script

To run the script, right-click on the script ⇒ Run As ⇒ Python Run.

Debugging Using PyDev

To debug a script, set a breakpoint by double-clicking on the left-margin of the desired line ⇒ right-click on the script ⇒ Debug As ⇒ Python Run.

You can then trace the execution via "Step Into (function) (F5)", "Step Over (current statement) (F6)", "Step Return (out of function) (F7)", "Resume (to next breakpoint) (F8)", "Terminate", etc.

NetBeans

[TODO]

LightTable

Reference: Light Table - the next generation code editor @ http://lighttable.com/.


LightTable is billed as a next-generation source code editor, with many interesting features not available in current-generation editors.

Installing LightTable

For Ubuntu:

  1. Download the tarball, e.g., lighttable-0.8.0-alpha-linux.tar.gz.
  2. Un-tar into a directory of your choice, e.g., /usr/local:
    $ cd <target-directory>
    $ tar xzvf /path/to/lighttable-0.8.0-alpha-linux.tar.gz
Using LightTable for Python Program Development

To write a simple Python script:

  1. File ⇒ New ⇒ Enter the script ⇒ Save as module_name.py.
  2. Show the console-pane by selecting View ⇒ Console
  3. To run the script, place the cursor anywhere in the script, and press Ctrl-Shift-Enter.
  4. To evaluate a line, place the cursor on the line and press Ctrl-Enter.
  5. To evaluate selected lines, select the lines and press Ctrl-Enter.

To use a workspace, which is a collection of files and folders:

  1. View ⇒ Workspace ⇒ Right-click to add file and add folder into the workspace.

You can activate the command-pane by pressing Ctrl-Space (or View ⇒ Commands).

Debugging????

Python Basics

Python Syntax

Comments

A Python comment begins with a hash sign (#) and lasts until the end of the current line. Comments are ignored by the Python Interpreter, but they are critical in providing explanation and documentation for others (and yourself) to read your program. Use comments liberally.

There is NO multi-line comment in Python?!

Statements

A Python statement is delimited by a newline. A statement cannot cross line boundaries, except:

  1. An expression in parentheses (), square bracket [], and curly braces {} can span multiple lines.
  2. A backslash (\) at the end of the line denotes continuation to the next line. This is an old rule and is NOT recommended as it is error-prone.

Unlike C/C++/Java, you don't place a semicolon (;) at the end of a Python statement. But you can place multiple statements on a single line, separated by semicolons (;). For example,

# One Python statement in one line.
# A Python statement is terminated by a newline.
# There is no semicolon at the end of a statement.
>>> x = 1     # Assign variable x to 1
>>> print(x)  # Print the value of x
1

# You can place multiple statements in one line, separated by semicolon
>>> print(x); print(x+1); print(x+2)
1
2
3

# An expression in brackets can span multiple lines
>>> x = [1,
         22,
         333]  # Re-assign variable x to a list
>>> print(x)
[1, 22, 333]

# To break a long expression into several lines, enclose it in parentheses
>>> x =(1 +
        2
        + 3)
>>> x
6

# You can break a long string into several lines with parentheses too
>>> s = ('testing '   # No commas needed
         'hello, '
         'world!')
>>> s
'testing hello, world!'
Block and Indentation

A block is a group of statements executing as a unit. Unlike C/C++/Java, which use braces {} to group statements in a body block, Python uses indentation for body block. In other words, indentation is syntactically significant in Python - the body block must be properly indented. This is a good syntax to force you to indent the blocks correctly for ease of understanding!!!

Compound Statements

A compound statement, such as def (function definition) and while loop, begins with a header line terminated with a colon (:), followed by the indented body block. Python does not specify how much indentation to use, but all statements of the SAME body block must start at the SAME distance from the left margin. You can use either spaces or tabs for indentation, but you cannot mix them in the SAME body block. It is recommended to use 4 spaces for each indentation level (as per PEP 8). For example,

# Define the function main()
def main():
    """Main function"""
    print(sum_1_to_n(100))
   
# Define the function sum_1_to_n()
def sum_1_to_n(n):
    """Sum from 1 to the given n"""
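    total = 0       # Avoid the name 'sum', which would shadow the built-in sum()
    i = 0
    while i <= n:   # No parentheses needed around the condition
        total += i
        i += 1
    return total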
      
# Invoke function main()
main()

Notes:

  • Use IDLE to create the above script called "SumNumber.py" (File ⇒ New File). Run the script (Run ⇒ Run Module).
  • We define two functions: main() and sum_1_to_n(), via two def compound statements.
  • The trailing colon (:) signals the start of a body block. All statements belonging to the SAME block must be indented at the SAME distance from the left margin.
  • The first line of the function body block is a documentation string (or doc-string), followed by the statements of the function body.

The trailing colon (:) and body indentation are probably the strangest features of Python if you come from C/C++/Java. Python imposes strict indentation rules to force programmers to write readable code!
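
For comparison, here is a more compact sketch of the same script. It replaces the explicit while loop with the built-in sum() and range() functions, and guards the call to main() with the common if __name__ == '__main__' idiom so that the file can also be imported without running. This is one possible rewrite, not the only idiomatic form.

```python
def sum_1_to_n(n):
    """Sum from 1 to the given n, using the built-in sum() and range()"""
    # range(1, n + 1) yields 1, 2, ..., n (the upper bound is exclusive)
    return sum(range(1, n + 1))


def main():
    """Main function"""
    print(sum_1_to_n(100))   # 5050


# Run main() only when this file is executed as a script, not when it is imported
if __name__ == '__main__':
    main()
```

Note how each nested block (the function bodies and the if body) is indented by 4 spaces relative to its header line.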

Naming Conventions and Coding Styles (PEP 8 & PEP 257)

These are the recommended naming conventions in Python:

  • Variable names: use a noun in lowercase words (optionally joined with underscore if it improves readability), e.g., num_students.
  • Function names: use a verb in lowercase words (optionally joined with underscore if it improves readability), e.g., getarea() or get_area().
  • Class names: use a noun in camel-case (initial-cap all words), e.g., MyClass, IndexError.
  • Constant names: use a noun in uppercase words joined with underscore, e.g., PI, MAX_STUDENTS.
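
The conventions above can be seen together in a short sketch. All names here (MAX_STUDENTS, ClassRoster, student_names, add_student, get_area) are made up for illustration.

```python
import math

MAX_STUDENTS = 3                     # Constant: uppercase words joined with underscore


class ClassRoster:                   # Class name: noun with initial-cap words
    """A small roster limited to MAX_STUDENTS names"""

    def __init__(self):
        self.student_names = []      # Variable name: lowercase noun with underscore

    def add_student(self, name):     # Function (method) name: lowercase verb phrase
        if len(self.student_names) >= MAX_STUDENTS:
            raise ValueError('roster is full')
        self.student_names.append(name)


def get_area(radius):                # Function name: lowercase verb phrase
    """Return the area of a circle of the given radius"""
    return math.pi * radius ** 2


roster = ClassRoster()
roster.add_student('Peter')
print(roster.student_names)          # ['Peter']
print(get_area(1.0))                 # 3.141592653589793
```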
Coding Styles

Read: PEP 8 (Style Guide for Python Code) and PEP 257 (Docstring Conventions).

The recommended styles are:

  • Use 4 spaces for indentation. Don't use tab.
  • Lines shall not exceed 79 characters.
  • Use blank lines to separate functions and classes.
  • Use a space before and after an operator.
  • [TODO] more

Console Input/Output: Functions input() and print()

You can use function input() to read input from the console (as a string) and print() to print output to the console. For example,

>>> x = input('Enter a number: ')
Enter a number: 5
>>> x
'5'
>>> type(x)
<class 'str'>
>>> print(x)
5
 
# Test print()
>>> print('apple')
apple
>>> print('apple', 'orange')  # More items separated by commas
apple orange
>>> print('apple', 'orange', 'banana')
apple orange banana
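
Since input() always returns a string, numeric input has to be converted before doing arithmetic on it. A minimal sketch follows; the helper name read_number is an assumption, and a fixed string stands in for the call to input() so that the example runs without a console.

```python
def read_number(raw):
    """Convert text typed at the console into an int, or a float if needed"""
    try:
        return int(raw)
    except ValueError:
        return float(raw)    # Raises ValueError again if raw is not numeric at all


raw = '5'                    # Stands in for: raw = input('Enter a number: ')
number = read_number(raw)
print(number + 1)            # 6 (integer arithmetic, not string concatenation)

# Surrounding whitespace is tolerated by int() and float()
print(read_number(' 2.5 ') * 2)   # 5.0
```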
print() without newline

The print() function prints a newline at the end of the output by default. You can use the keyword argument "end" to specify another terminator (Python 3). For example,

>>> for i in range(5): 
       print(i)  # default a newline at the end
0
1
2
3
4
>>> for i in range(5): 
       print(i, end=',')  # print a comma at the end
0,1,2,3,4,
>>> for i in range(5): 
       print(i, end='--')
0--1--2--3--4-- 
>>> for i in range(5): 
       print(i, end='')   # nothing at the end
01234
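
Besides end, the Python 3 print() function also accepts the keyword arguments sep (the separator placed between items) and file (the output stream). The sketch below captures the output in an in-memory io.StringIO buffer to show exactly what is written, and uses str.join() as one way to avoid the trailing comma seen above.

```python
import io

# Capture the output in memory instead of writing to the console
buffer = io.StringIO()
print('apple', 'orange', 'banana', sep=', ', end='!\n', file=buffer)
captured = buffer.getvalue()
print(repr(captured))    # 'apple, orange, banana!\n'

# join() puts the delimiter BETWEEN items only, so there is no trailing comma
joined = ','.join(str(i) for i in range(5))
print(joined)            # 0,1,2,3,4
```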
print in Python 2 vs Python 3

Recall that Python 2 and Python 3 are NOT compatible. In Python 2, you can use "print item", without the parentheses (because print is a keyword in Python 2). In Python 3, parentheses are required as print() is a function. For example,

# Python 3
>>> print('hello')
hello
>>> print 'hello'
  File "<stdin>", line 1
    print 'hello'
                ^
SyntaxError: Missing parentheses in call to 'print'
>>> print('aaa', 'bbb')
aaa bbb
   # Treated as multiple arguments, printed without parentheses

# Python 2
>>> print('Hello')
Hello
>>> print 'hello'
hello
>>> print('aaa', 'bbb')
('aaa', 'bbb')
   # Treated as a tuple (of items). Print the tuple with parentheses
>>> print 'aaa', 'bbb'
aaa bbb
   # Treated as multiple arguments

IMPORTANT: Always use print() function with parentheses, for portability!

Source Code Encoding

To specify the character encoding scheme of your source code, e.g., in UTF-8:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
......

The default source encoding is 7-bit ASCII in Python 2 and UTF-8 in Python 3.

I strongly encourage you to encode in UTF-8 for internationalization (i18n).
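
To see what UTF-8 encoding does to a non-ASCII character, the following Python 3 sketch encodes a short string into bytes and decodes it back. The string is written with the escape \u00e9 (the letter e with an acute accent) so that the example does not depend on how this page is saved; in a UTF-8 source file you could type the character directly.

```python
text = 'caf\u00e9'                 # 4 characters; the last one is non-ASCII
encoded = text.encode('utf-8')     # str -> bytes

print(len(text))                   # 4 (characters)
print(len(encoded))                # 5 (bytes: the accented letter takes 2 bytes in UTF-8)
print(encoded)                     # b'caf\xc3\xa9'

decoded = encoded.decode('utf-8')  # bytes -> str
print(decoded == text)             # True
```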

Data Types and Dynamic Typing

Python has a large number of built-in data types, such as Numbers (Integer, Float, Boolean, Complex Number), String, List, Tuple, Set, Dictionary and File. More high-level data types, such as Decimal and Fraction, are provided by standard-library modules (decimal and fractions).

Number Types

Python supports these built-in number types:

  1. Integers (type int): e.g., 123, -456. Unlike C/C++/Java, integers are of unlimited size in Python. For example,
    >>> 123 + 456 - 789
    -210
    >>> 123456789012345678901234567890 + 1
    123456789012345678901234567891
    >>> 1234567890123456789012345678901234567890 + 1
    1234567890123456789012345678901234567891
    >>> 2 ** 888     # Raise 2 to the power of 888
    ......
    >>> len(str(2 ** 888))  # Convert integer to string and get its length
    268                     # 2 to the power of 888 has 268 digits
    >>> type(123)    # Get the type
    <class 'int'>
    >>> help(int)    # Show the help menu for type int
    You can also express integers in hexadecimal with prefix 0x (or 0X); in octal with prefix 0o (or 0O); and in binary with prefix 0b (or 0B). For example, 0x1abc, 0X1ABC, 0o1776, 0b11000011.
  2. Floating-point numbers (type float): e.g., 1.0, -2.3, 3e4, -3E-4, written with a decimal point, an exponent (in e or E), or both. Floats are 64-bit double precision floating-point numbers. For example,
    >>> 1.23 * -4e5
    -492000.0
    >>> type(1.2)        # Get the type
    <class 'float'>
    >>> import math      # Using the math module
    >>> math.pi
    3.141592653589793
    >>> import random    # Using the random module
    >>> random.random()  # Generate a random number in [0, 1)
    0.890839384187198
  3. Booleans (type bool): takes a value of either True or False (take note of the spelling in initial-capitalized).
    >>> 8 == 8      # Compare
    True
    >>> 8 == 9
    False
    >>> type(True)  # Get type
    <class 'bool'>
    In Python, zero (such as integer 0 and float 0.0), an empty value (such as empty string '', "", empty list [], empty tuple (), empty dictionary {}), and None are treated as False; anything else is treated as True.
    Booleans can also act as integers in arithmetic operations with 1 for True and 0 for False. For example,
    >>> bool(0)
    False
    >>> bool(1)
    True
    >>> True + 3
    4
    >>> False + 1
    1
  4. Complex Numbers (type complex): e.g., 1+2j, -3-4j. Complex numbers have a real part and an imaginary part denoted with suffix of j (or J). For example,
    >>> x = 1 + 2j  # Assign variable x to a complex number
    >>> x           # Display x
    (1+2j)
    >>> x.real      # Get the real part
    1.0
    >>> x.imag      # Get the imaginary part
    2.0
    >>> type(x)     # Get type
    <class 'complex'>
    >>> x * (3 + 4j)  # Multiply two complex numbers
    (-5+10j)
  5. Other number types are provided by standard-library modules, such as the decimal module for decimal fixed-point numbers and the fractions module for rational numbers.
    # floats are imprecise
    >>> 0.1 * 3
    0.30000000000000004
    
    # Decimal are precise
    >>> import decimal  # Using the decimal module
    >>> x = decimal.Decimal('0.1')  # Construct a Decimal object
    >>> x * 3    # Multiply  with overloaded * operator
    Decimal('0.3')
    >>> type(x)  # Get type
    <class 'decimal.Decimal'>
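
A few of the points above can be checked in a short script: the integer literal prefixes, booleans acting as integers, and the exact arithmetic offered by the decimal and fractions modules. This is only a sketch; the variable names are arbitrary.

```python
from decimal import Decimal
from fractions import Fraction

# Hexadecimal, octal and binary literals are ordinary ints
literals = (0x1abc, 0o1776, 0b11000011)
print(literals)                          # (6844, 1022, 195)

# hex(), oct() and bin() convert an int back to a prefixed string
print(hex(255), oct(8), bin(5))          # 0xff 0o10 0b101

# True counts as 1 and False as 0, so sum() counts the True values
true_count = sum([True, False, True])
print(true_count)                        # 2

# Floats are binary approximations; Decimal and Fraction are exact
print(0.1 * 3 == 0.3)                            # False
print(Decimal('0.1') * 3 == Decimal('0.3'))      # True
half = Fraction(1, 3) + Fraction(1, 6)
print(half)                                      # 1/2
```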

The None Value

Python provides a special value called None (take note of the spelling in initial-capitalized), which can be used to initialize an object (to be discussed in OOP later). For example,

>>> x = None
>>> type(x)   # Get type
<class 'NoneType'>
>>> print(x)
None

# Use 'is' and 'is not' to check for 'None' value.
>>> print(x is None)
True
>>> print(x is not None)
False
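
A common use of None is as a "no result" return value. The sketch below (the function name find_first_even is made up for illustration) also shows why the check should be written with 'is None' rather than relying on truthiness: a legitimate result such as 0 is treated as False, but it is not None.

```python
def find_first_even(numbers):
    """Return the first even number in the list, or None if there is none"""
    for n in numbers:
        if n % 2 == 0:
            return n
    return None          # Explicit here; a function without a return also yields None


result = find_first_even([1, 3, 5])
print(result is None)    # True: nothing was found

result = find_first_even([0, 1, 2])
print(result is None)    # False: 0 was found ...
print(bool(result))      # False: ... although 0 itself is treated as False
```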

Dynamic Typing and Assignment Operator

Like most scripting languages (such as Perl, JavaScript, PHP) and unlike statically typed languages (such as C/C++/Java/C#), Python is dynamically typed. It associates types with objects, instead of variables. That is, a variable does not have a fixed type and can be assigned an object of any type. A variable simply provides a reference to an object.

You do not need to declare a variable. A variable is created automatically when a value is first assigned, which links the object to the variable. You can use built-in function type(var_name) to get the object type referenced by a variable.

>>> x = 1         # Assign an int value to create variable x
>>> x             # Display x
1
>>> type(x)       # Get the type of x
<class 'int'>
>>> x = 1.0       # Re-assign x to a float
>>> x
1.0
>>> type(x)       # Show the type
<class 'float'>
>>> x = 'hello'   # Re-assign x to a string
>>> x             
'hello'
>>> type(x)       # Show the type
<class 'str'>
>>> x = '123'     # Re-assign x to a string (of digits)
>>> x
'123'
>>> type(x)       # Show the type
<class 'str'>
Type Conversion

You can perform type conversion via built-in functions int(), float(), str(), bool(), etc. For example,

>>> x = '123'
>>> type(x)
<class 'str'>
>>> x = int(x)    # Parse str to int, and assign back to x
>>> x
123
>>> type(x)
<class 'int'>
>>> x = float(x)  # Convert x from int to float, and assign back to x
>>> x
123.0
>>> type(x)
<class 'float'>
>>> x = str(x)    # Convert x from float to str, and assign back to x
>>> x
'123.0'
>>> type(x)
<class 'str'>
>>> len(x)        # Get the length of the string
5
>>> x = bool(x)   # Convert x from str to boolean, and assign back to x
>>> x             # Non-empty string is converted to True
True
>>> type(x)
<class 'bool'>
>>> x = str(x)    # Convert x from bool to str
>>> x
'True'

In summary, a variable does not associate with a type. Instead, a type is associated with an object. A variable provides a reference to an object (of a certain type).
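
The conversion functions raise ValueError when the input cannot be parsed. The sketch below wraps int() in a hypothetical helper to_int() that falls back to a default, and shows two conversions that often surprise newcomers.

```python
def to_int(text, default=0):
    """Parse text as an int; return default if text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int('123'))      # 123
print(to_int(' 42 '))     # 42 (surrounding whitespace is ignored)
print(to_int('55.66'))    # 0  (not an integer literal, so the default is used)

print(int(3.9), int(-3.9))        # 3 -3 (float to int truncates towards zero)
print(bool(''), bool('False'))    # False True (any non-empty string is True)
```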

The Assignment Operator (=)

In Python, you do not need to declare variables before they are used. The initial assignment creates a variable and links the value to the variable. For example,

>>> x = 8        # Create a variable x by assigning a value
>>> x = 'Hello'  # Re-assign a value (of a different type) to x

>>> y            # Cannot access undefined (unassigned) variable
NameError: name 'y' is not defined
del

You can use del statement to delete a variable. For example,

>>> x = 8     # Create variable x via assignment
>>> x
8
>>> del x     # Delete variable x
>>> x
NameError: name 'x' is not defined
Pair-wise Assignment and Chain Assignment

For example,

>>> a = 1  # Ordinary assignment
>>> a
1
>>> b, c, d = 123, 4.5, 'Hello'  # Pair-wise assignment of 3 variables and values
>>> b
123
>>> c
4.5
>>> d
'Hello'
>>> e = f = g = 123  # Chain assignment
>>> e
123
>>> f
123
>>> g
123

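Chain assignment evaluates the right-hand expression once, and then assigns the resulting object to each target from left to right. Unlike C/C++/Java, assignment in Python is a statement and not an expression, so a = b = 123 is not interpreted as a = (b = 123); in fact, writing a = (b = 123) is a syntax error. After a chain assignment, all the targets refer to the same object.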
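
Two practical consequences of these assignment forms are sketched below: pair-wise assignment can swap variables, and chain assignment makes all the names share one object (which matters for mutable objects such as lists).

```python
# Pair-wise assignment evaluates the whole right-hand side first,
# so two variables can be swapped without a temporary variable.
a, b = 1, 2
a, b = b, a
print(a, b)          # 2 1

# Chain assignment binds every name to the SAME object.
x = y = []
x.append(1)          # Mutating the shared list is visible through both names
print(y, x is y)     # [1] True

# Re-assigning one name does not affect the other.
p = q = 0
p += 1               # p now refers to a different int object; q still refers to 0
print(p, q)          # 1 0
```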

Number Operations

Arithmetic Operators

Python supports these arithmetic operators:

Operator  Description                        Examples
+         Addition
-         Subtraction
*         Multiplication
/         Float division                     1 / 2 ⇒ 0.5
          (returns a float)                  -1 / 2 ⇒ -0.5
//        Floor division                     1 // 2 ⇒ 0
          (returns the floor of the          -1 // 2 ⇒ -1
          quotient; the result is a float    8.9 // 2.5 ⇒ 3.0
          if an operand is a float)          -8.9 // 2.5 ⇒ -4.0
                                             -8.9 // -2.5 ⇒ 3.0
**        Exponentiation                     2 ** 5 ⇒ 32
                                             1.2 ** 3.4 ⇒ 1.858729691979481
%         Modulus (remainder)                9 % 2 ⇒ 1
                                             -9 % 2 ⇒ 1
                                             9 % -2 ⇒ -1
                                             -9 % -2 ⇒ -1
                                             9.9 % 2.1 ⇒ 1.5
                                             -9.9 % 2.1 ⇒ 0.6000000000000001

Notes:

  • Python does not support increment (++) and decrement (--) operators (as in C/C++/Java). You need to use i = i + 1 or i += 1 for increment.
  • Each of the operators has a corresponding shorthand assignment counterpart, i.e., +=, -=, *=, /=, //=, **= and %=. For example i += 1 is the same as i = i + 1.
  • For mixed-type operations, e.g., 1 + 2.3 (int + float), the value of the "smaller" type is first promoted to the "bigger" type. It then performs the operation in the "bigger" type and returns the result in the "bigger" type. In Python, int is "smaller" than float, which is "smaller" than complex.
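
The floor division and modulus results for negative operands are linked by one identity. The following sketch checks it with the built-in divmod() and also shows the type promotion described in the last note.

```python
# divmod(a, b) returns the pair (a // b, a % b)
pairs = [(9, 2), (-9, 2), (9, -2), (-9, -2)]
results = [divmod(a, b) for a, b in pairs]
print(results)       # [(4, 1), (-5, 1), (-5, -1), (4, -1)]

# The quotient and remainder always satisfy a == (a // b) * b + (a % b),
# and the remainder takes the sign of the divisor b.
identity_holds = all(a == (a // b) * b + (a % b) for a, b in pairs)
print(identity_holds)   # True

# Mixed-type operations are carried out in the "bigger" type
print(type(1 + 2.3).__name__)          # float
print(type(1 + 2.3 + 4j).__name__)     # complex
```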
Bitwise Operators

Python supports these bitwise operators:

Operator  Description                                      Example (x=0b10000001, y=0b10001111)
&         bitwise AND                                      x & y ⇒ 0b10000001
|         bitwise OR                                       x | y ⇒ 0b10001111
~         bitwise NOT (or negate)                          ~x ⇒ -0b10000010
^         bitwise XOR                                      x ^ y ⇒ 0b00001110
<<        bitwise Left-Shift (padded with zeros)           x << 2 ⇒ 0b1000000100
>>        bitwise Right-Shift (low-order bits discarded)   x >> 2 ⇒ 0b100000
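
A common use of these operators is to pack several on/off flags into one integer. The sketch below uses three hypothetical permission flags; it also shows why ~x prints as a negative number and how a mask recovers an 8-bit pattern.

```python
# Hypothetical permission flags, one bit each
READ, WRITE, EXECUTE = 0b001, 0b010, 0b100

perm = READ | WRITE                 # Set two flags: 0b011
can_write = bool(perm & WRITE)      # Test a flag: True
perm &= ~WRITE                      # Clear a flag: 0b001
perm ^= EXECUTE                     # Toggle a flag: 0b101
print(bin(perm), can_write)         # 0b101 True

# Python's int has unlimited precision, so ~x is simply -x - 1.
# Mask with 0xFF to view the result as an 8-bit pattern.
x = 0b10000001
print(~x == -x - 1)                 # True
print(format(~x & 0xFF, '08b'))     # 01111110
```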
Built-in Functions

Python provides many built-in functions for numbers, including:

  • Mathematical functions: round(), pow(), abs().
  • Type conversion functions: int(), float(), str(), bool(); and type() to get the type.
  • Base radix conversion functions: hex(), bin(), oct().

# Test built-in function round()
>>> x = 1.23456
>>> type(x)
<class 'float'>

# Python 3
>>> round(x)     # Round to the nearest integer
1
>>> type(round(x))   
<class 'int'>

# Python 2
>>> round(x)
1.0
>>> type(round(x))
<type 'float'>

# Python 2 and 3
>>> round(x, 1)  # Round to 1 decimal place
1.2
>>> round(x, 2)  # Round to 2 decimal places
1.23
>>> round(x, 8)  # No change - not for formatting
1.23456

# Test other built-in functions
>>> pow(2, 5)
32
>>> abs(-4.1)
4.1
  
# Base radix conversion
>>> hex(1234)
'0x4d2'
>>> bin(254)
'0b11111110'
>>> oct(1234)
'0o2322'
>>> 0xABCD  # Shown in decimal by default
43981

# List built-in functions
>>> dir(__builtins__)
['type', 'round', 'abs', 'int', 'float', 'str', 'bool', 'hex', 'bin', 'oct',......]

# Show the number of built-in names (the counts vary with the Python version)
>>> len(dir(__builtins__))  # Python 3
151
>>> len(dir(__builtins__))  # Python 2
144

# Show documentation of __builtins__ module
>>> help(__builtins__)
Relational (Comparison) Operators

Python supports these relational (comparison) operators that return a bool value of either True or False.

Operator Description Example
<, <=, >, >=, ==, != Comparison  
in, not in x in y check if x is contained in the sequence y  
is, is not x is y is True if x and y are referencing the same object  

Example: [TODO]
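
A minimal sketch of these operators follows. It contrasts == (equal value) with is (same object), using illustrative variable names.

```python
a = [1, 2, 3]
b = [1, 2, 3]    # Equal content, but a separate object
c = a            # Another name for the same object

print(a == b, a is b)      # True False
print(a == c, a is c)      # True True
print(2 in a, 5 not in a)  # True True
print('ell' in 'Hello')    # True (substring test for strings)
print(1 <= 2, 'a' != 'b')  # True True
```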

Logical Operators

Python supports these logical (boolean) operators, which operate on boolean values.

Operator Description Example
and Logical AND  
or Logical OR  
not Logical NOT  

Notes:

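  • Python's logical operators are spelled out as words, unlike C/C++/Java, which use the symbols &&, || and !.
  • There is no logical exclusive-or (xor) operator. For two bool values, you can use != (or the bitwise operator ^) to get the same effect.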

Example: [TODO]
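
A minimal sketch of the logical operators follows. The helper logical_xor() is hypothetical (Python has no such built-in); the last part shows that and/or short-circuit and return one of their operands.

```python
def logical_xor(p, q):
    """Hypothetical helper: True if exactly one of p and q is true."""
    return bool(p) != bool(q)


print(True and False, True or False, not True)            # False True False
print(logical_xor(True, False), logical_xor(True, True))  # True False

# 'and' and 'or' short-circuit: the right operand is evaluated only when needed,
# and the result is one of the operands rather than a converted bool.
name = '' or 'anonymous'    # '' is falsy, so the right operand is returned
safe = 0 and 1 / 0          # 0 is falsy, so 1 / 0 is never evaluated
print(name, safe)           # anonymous 0
```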

String

Strings can be delimited by a pair of single quotes ('...'), double quotes ("..."), triple single quotes ('''...'''), or triple double quotes ("""..."""). In Python, single-quoted string is the SAME as double-quoted string.

To place a single quote (') inside a single-quoted string, you need to use escape sequence \'. Similarly, to place a double quote (") inside a double-quoted string, use \". There is no need for escape sequence to place a single quote inside a double-quoted string; or a double quote inside a single-quoted string.

A triple-single-quoted or triple-double-quoted string can span multiple lines. There is no need for an escape sequence to place a single/double quote inside a triple-quoted string. Triple-quoted strings are useful for multi-line documentation, HTML and other code.

Python 3 strings use the Unicode character set.

>>> s1 = 'apple'
>>> s1
'apple'
>>> s2 = "orange"
>>> s2
'orange'
>>> s3 = "'orange'"   # Escape sequence not required
>>> s3
"'orange'"
>>> s3 = "\"orange\""  # Escape sequence needed
>>> s3
'"orange"'

# A triple-single/double-quoted string can span multiple lines
>>> s4 = """testing
testing"""
>>> s4
'testing\ntesting'
Escape Sequences

Like C/C++/Java, you need to use escape sequences (a back-slash + a code):

  • for special non-printable characters, such as tab (\t), newline (\n), carriage return (\r); and
  • to resolve ambiguity, such as \" (for "), \' (for '), \\ (for \).
Raw Strings

You can prefix a string literal with r to disable the interpretation of escape sequences, i.e., the back-slash is kept as a literal character. For example, r'\w{6,10}' is the same as '\\w{6,10}' (where the back-slash must be escaped). Raw strings are used extensively in regular expressions (regex).
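
The equivalence above can be checked directly. The sketch below also applies the pattern with the standard re module; the test strings are illustrative only.

```python
import re

pattern = r'\w{6,10}'                 # Raw string: the back-slash is kept as-is
print(pattern == '\\w{6,10}')         # True
print(len(r'\n'), len('\n'))          # 2 1 (back-slash + n versus a single newline)

# The pattern matches 6 to 10 word characters
long_enough = re.fullmatch(pattern, 'python3') is not None
too_short = re.fullmatch(pattern, 'abc') is not None
print(long_enough, too_short)         # True False
```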

Strings are Immutable

Strings are immutable, i.e., their contents cannot be modified. String functions such as upper() and replace() return a new string object instead of modifying the string under operation.

Built-in Functions and Operators for Strings

You can operate on strings using:

  • built-in functions such as len();
  • operators such as in (contains), + (concatenation), * (repetition), indexing [i], and slicing [m:n:step].

Note: These functions and operators are applicable to all sequence data types including string, list, and tuple (to be discussed later).

Function/Operator               Description                              Examples (s = 'Hello')
len()                           Length                                   len(s) ⇒ 5
in                              Contain?                                 'ell' in s ⇒ True
                                                                         'he' in s ⇒ False
+                               Concatenation                            s + '!' ⇒ 'Hello!'
*                               Repetition                               s * 2 ⇒ 'HelloHello'
[i], [-i]                       Indexing to get a character.             s[1] ⇒ 'e'
                                The front index begins at 0;             s[-4] ⇒ 'e'
                                the back index begins at -1
                                (same as len(s)-1).
[m:n], [m:], [:n], [m:n:step]   Slicing to get a substring.              s[1:3] ⇒ 'el'
                                From index m (included) to n             s[1:-2] ⇒ 'el'
                                (excluded) with an optional step size.   s[3:] ⇒ 'lo'
                                The default m=0, n=len(s), step=1.       s[:-2] ⇒ 'Hel'
                                                                         s[:] ⇒ 'Hello'
                                                                         s[0:5:2] ⇒ 'Hlo'

For example,

>>> s = "Hello, world"   # Assign a string literal to the variable s
>>> type(s)              # Get data type of s
<class 'str'>
>>> len(s)       # Length
12
>>> 'ello' in s  # The in operator
True

# Indexing
>>> s[0]       # Get character at index 0; index begins at 0
'H'
>>> s[1]
'e'
>>> s[-1]      # Get Last character, same as s[len(s) - 1]
'd'
>>> s[-2]      # 2nd last character
'l'

# Slicing
>>> s[1:3]     # Substring from index 1 (included) to 3 (excluded)
'el'
>>> s[1:-1]
'ello, worl'
>>> s[:4]      # Same as s[0:4], from the beginning
'Hell'
>>> s[4:]      # Same as s[4:len(s)], till the end
'o, world'
>>> s[:]       # Entire string; same as s[0:len(s)]
'Hello, world'

# Concatenation (+) and Repetition (*)
>>> s = s + " again"  # Concatenate two strings
>>> s
'Hello, world again'
>>> s * 3             # Repeat 3 times
'Hello, world againHello, world againHello, world again'

# String is immutable
>>> s[0] = 'a'
TypeError: 'str' object does not support item assignment
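
The step size and the handling of out-of-range indexes are not covered in the session above. A short sketch, using the same string:

```python
s = 'Hello, world'

print(s[::2])      # Hlo ol       (every second character)
print(s[::-1])     # dlrow ,olleH (a negative step reverses the string)
print(s[-5:])      # world        (the last 5 characters)

# Slicing clamps out-of-range indexes, whereas indexing raises IndexError
print(s[7:100])        # world
print(repr(s[100:]))   # ''
try:
    s[100]
except IndexError as e:
    error = str(e)
print(error)           # string index out of range
```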
Character Type?

Python does not have a dedicated character data type. A character is simply a string of length 1. You can use the indexing operator to extract an individual character from a string, as shown in the above example; or process the characters one by one using a for-in loop (to be discussed later).

The built-in functions ord() and chr() operate on characters, e.g.,

# ord(c) returns the integer ordinal (Unicode) of a one-character string
>>> ord('A')
65
>>> ord('水')
27700

# chr(i) returns a one-character string with Unicode ordinal i; 0 <= i <= 0x10ffff.
>>> chr(65)
'A'
>>> chr(27700)
'水'
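
Together, ord() and chr() allow arithmetic on characters. As an illustration, the hypothetical function shift_upper() below shifts the uppercase letters A-Z by k positions with wrap-around (a Caesar cipher) and leaves other characters unchanged.

```python
def shift_upper(text, k):
    """Shift the uppercase letters A-Z in text by k positions, wrapping around."""
    result = ''
    for ch in text:                      # Each ch is a string of length 1
        if 'A' <= ch <= 'Z':
            offset = (ord(ch) - ord('A') + k) % 26
            result += chr(ord('A') + offset)
        else:
            result += ch                 # Leave other characters as they are
    return result


print(shift_upper('HELLO', 3))           # KHOOR
print(shift_upper('XYZ, ok', 3))         # ABC, ok
print(shift_upper(shift_upper('HELLO', 3), -3))   # HELLO
```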
Unicode vs ASCII

In Python 3, strings (of the type str) are Unicode by default. Byte strings (of the type bytes) hold raw 8-bit values, such as ASCII-encoded text, and are written with the prefix b, e.g., b'ABC'.

In Python 2, strings are byte strings (ASCII) by default. Unicode strings are prefixed with u.
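
In Python 3, you convert between the two types with str.encode() and bytes.decode(). A minimal sketch, assuming the UTF-8 encoding:

```python
text = '水'
data = text.encode('utf-8')           # str -> bytes
print(data)                           # b'\xe6\xb0\xb4'
print(len(text), len(data))           # 1 3 (one character, three bytes)
print(data.decode('utf-8') == text)   # True (bytes -> str)

# Indexing a bytes object gives an int, not a one-character string
print(b'ABC'[0])                          # 65
print('ABC'.encode('ascii') == b'ABC')    # True
```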

String-Specific Member Functions

The str class provides many member functions. Since string is immutable, most of these functions return a new string. The commonly-used member functions are as follows, supposing that s is a str object:

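  • s.strip(), s.rstrip(), s.lstrip(): strip() removes the leading and trailing whitespaces; rstrip() removes the right (trailing) whitespaces; lstrip() removes the left (leading) whitespaces.
  • s.upper(), s.lower(): return a copy of s converted to uppercase or lowercase.
  • s.isupper(), s.islower(): return True if all the cased characters in s are uppercase or lowercase (and there is at least one cased character).
  • s.find(sub): return the lowest index at which the substring sub begins; or -1 if not found.
  • s.index(sub): same as find(), but raises ValueError if not found.
  • s.startswith(prefix): return True if s begins with prefix.
  • s.endswith(suffix): return True if s ends with suffix.
  • s.replace(old, new): return a copy of s with all occurrences of old replaced by new.
  • s.split(delimiter-str), delimiter-str.join(list-of-strings): split() breaks s into a list of substrings at each delimiter; join() is the reverse, concatenating the strings in the list with the delimiter in between.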
>>> dir(str)      # List all attributes of the class str
......

>>> s = 'Hello, world'
>>> type(s)
<class 'str'>

>>> dir(s)         # List all attributes of the object s
.......

>>> help(s.find)   # Show the documentation of member function find
.......
>>> s.find('ll')   # Find the beginning index of the substring
2
>>> s.find('app')  # find() returns -1 if not found
-1

>>> s.index('ll')  # index() is the same as find(), but raises ValueError if not found
2
>>> s.index('app')
......
ValueError: substring not found

>>> s.startswith('Hell')
True
>>> s.endswith('world')
True
>>> s.replace('ll', 'xxx')
'Hexxxo, world'
>>> s.isupper()
False
>>> s.upper()
'HELLO, WORLD'

>>> s.split(', ')    # Split into a list with the given delimiter
['Hello', 'world']
>>> ', '.join(['hello', 'world', '123'])  # Join all strings in the list using the delimiter
'hello, world, 123'
 
>>> s = '  testing testing   '
>>> s.strip()        # Strip leading and trailing whitespaces
'testing testing'
>>> s.rstrip()       # Strip trailing (right) whitespaces
'  testing testing'
>>> s.lstrip()       # Strip leading (left) whitespaces
'testing testing   '

# List all the whitespace characters - in module string, attribute whitespace
>>> import string
>>> string.whitespace   # All whitespace characters
' \t\n\r\x0b\x0c'
>>> string.digits       # All digit characters
'0123456789'
>>> string.hexdigits    # All hexadecimal digit characters
'0123456789abcdefABCDEF'
String Formatting 1 (Old Style): Using % operator

There are a few ways to produce a formatted string for output. The old style (inherited from Python 2, and still supported in Python 3) is to use the % operator, with C-like printf() format specifiers. For example,

# %s for str
# %ns for str with field-width of n (default right-align)
# %-ns for left-align
>>> '|%s|%8s|%-8s|more|' % ('Hello', 'world', 'again')
'|Hello|   world|again   |more|'

# %d for int
# %nd for int with field-width of n
# %f for float
# %n.mf for float with field-width of n and m decimal digits
>>> '|%d|%4d|%6.2f|' % (11, 222, 33.333)   
'|11| 222| 33.33|'
String Formatting 2 (New Style): Using str.format() function

Python 3 (and Python 2.6+) provides a newer style via the string's format() function, with {} as place-holders (called format fields). For example,

# Replace format fields {} by arguments in format() in the same order
>>> '|{}|{}|more|'.format('Hello', 'world')
'|Hello|world|more|'

# You can use positional index in the form of {0}, {1}, ...
>>> '|{0}|{1}|more|'.format('Hello', 'world')
'|Hello|world|more|'
>>> '|{1}|{0}|more|'.format('Hello', 'world')
'|world|Hello|more|'

# You can use keyword inside {}
>>> '|{greeting}|{name}|'.format(greeting='Hello', name='Peter')
'|Hello|Peter|'

# Mixing positional and keyword
>>> '|{0}|{name}|more|'.format('Hello', name='Peter')
'|Hello|Peter|more|'
>>> '|{}|{name}|more|'.format('Hello', name='Peter')
'|Hello|Peter|more|'

# You can specify field width and alignment in the form of i:n or key:n,
# where i is the positional index, key is the keyword, and n is the field width.
# Strings are left-aligned by default
>>> '|{1:8}|{0:7}|'.format('Hello', 'Peter')
'|Peter   |Hello  |'
# > (right-align), < (left-align), -< (left-align with fill character -)
>>> '|{1:8}|{0:>7}|{2:-<10}|'.format('Hello', 'Peter', 'again')
'|Peter   |  Hello|again-----|'
>>> '|{greeting:8}|{name:7}|'.format(name='Peter', greeting='Hi')
'|Hi      |Peter  |'

# Format int using 'd' or 'nd'
# Format float using 'f' or 'n.mf'
>>> '|{0:.3f}|{1:6.2f}|{2:4d}|'.format(1.2, 3.456, 78)
'|1.200|  3.46|  78|'
# With keywords
>>> '|{a:.3f}|{b:6.2f}|{c:4d}|'.format(a=1.2, b=3.456, c=78)
'|1.200|  3.46|  78|'

When you pass tuples, lists or dictionaries as arguments into the format() function, you can reference their items in the format fields with [index] or [key]. For example,

>>> tup = ('a', 11, 11.11)
>>> lst = ['b', 22, 22.22]
>>> '|{0[2]}|{0[1]}|{0[0]}|'.format(tup)  # {0} matches tup, indexed via []
'|11.11|11|a|'
>>> '|{0[2]}|{0[1]}|{0[0]}|{1[2]}|{1[1]}|{1[0]}|'.format(tup, lst)  # {0} matches tup, {1} matches lst
'|11.11|11|a|22.22|22|b|'

>>> dct = {'c': 33, 'cc': 33.33}  # Avoid the name dict, which would shadow the built-in type
>>> '|{0[cc]}|{0[c]}|'.format(dct)
'|33.33|33|'
>>> '|{cc}|{c}|'.format(**dct)  # As keywords via **
'|33.33|33|'
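
Putting the width, alignment and type codes together, the sketch below formats a few rows of hypothetical sample data into aligned columns with str.format().

```python
rows = [('apple', 3, 1.5), ('kiwi', 12, 0.25)]   # Hypothetical sample data

lines = []
for name, qty, price in rows:
    # < left-align, > right-align; d for int, .2f for float with 2 decimal digits
    lines.append('|{:<8}|{:>4d}|{:>7.2f}|'.format(name, qty, price))

for line in lines:
    print(line)
# |apple   |   3|   1.50|
# |kiwi    |  12|   0.25|

# ^ centers the value; a character placed before it is used as the fill character
centered = '|{:^9}|{:*^9}|'.format('mid', 'mid')
print(centered)   # |   mid   |***mid***|
```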
String Formatting 3: Using str.rjust(n), str.ljust(n), str.center(n), str.zfill(n)

You can also use string's functions like str.rjust(n) (where n is the field-width), str.ljust(n), str.center(n), str.zfill(n) to format a string. For example,

# Setting field width and alignment
>>> '123'.rjust(5)
'  123'
>>> '123'.ljust(5)
'123  '
>>> '123'.center(5)
' 123 '
>>> '123'.zfill(5)  # Pad with leading zeros
'00123'

# Floats
>>> '1.2'.rjust(5)
'  1.2'
>>> '-1.2'.zfill(6)
'-001.2'
Conversion between String and Number: int(), float() and str()

You can use built-in functions int() and float() to parse a "numeric" string to an integer or a float; and str() to convert a number to a string. For example,

>>> s = '12345'
>>> s
'12345'
>>> int(s)    # Convert string to int
12345
>>> s = '55.66'
>>> s
'55.66'
>>> float(s)  # Convert string to float
55.66
>>> int(s)
ValueError: invalid literal for int() with base 10: '55.66'
>>> i = 8888
>>> str(i)    # Convert number to string
'8888'
Concatenate a String and a Number?

You CANNOT concatenate a string and a number (which results in TypeError). Instead, you need to use the str() function to convert the number to a string. For example,

>>> 'Hello' + 123
TypeError: can only concatenate str (not "int") to str
>>> 'Hello' + str(123)
'Hello123'
The isinstance() Built-in Function

You can use the built-in function isinstance(instance, type) to check if the instance belongs to the type. For example,

>>> isinstance(123, int)
True
>>> isinstance('a', int)
False
>>> isinstance('a', str)
True
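
isinstance() also accepts a tuple of types and takes subclasses into account. The sketch below uses a hypothetical helper describe(); note that bool is a subclass of int, so the bool test must come first.

```python
def describe(value):
    """Hypothetical helper that classifies a value by its type."""
    if isinstance(value, bool):            # Test bool first: bool is a subclass of int
        return 'boolean'
    elif isinstance(value, (int, float)):  # A tuple matches any of the listed types
        return 'number'
    elif isinstance(value, str):
        return 'string'
    else:
        return 'other'


print(describe(True), describe(1.5), describe('a'), describe(None))
# boolean number string other
print(isinstance(True, int), type(True) is int)   # True False
```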

List [v1, v2,...]

Python has a powerful built-in list type for dynamic arrays.

  • A list is enclosed by square brackets [].
  • A list can contain items of different types.
  • A list grows and shrinks in size automatically (dynamically). You do not have to specify its size during initialization.
Built-in Functions and Operators for Lists

A list, like string, is a sequence. Hence, you can operate lists using:

  • built-in sequence functions such as len().
  • built-in sequence functions for list of numbers such as max(), min(), and sum().
  • operators such as in (contains), + (concatenation) and * (repetition), del, indexing [i], and slicing [m:n:step].

Notes:

  • You can index the items from the front with positive index, or from the back with negative index. E.g., if x is a list, x[0] and x[1] refer to its first and second items; x[-1] and x[-2] refer to the last and second-to-last items.
  • You can also refer to a sub-list (or slice) using slice notation x[m:n] (from index m (included) to index n (excluded)), x[m:] (to last item), x[:n] (from first item), x[:] (all items), and x[m:n:step] (in step size).
Operator                        Description                              Examples (lst = [8, 9, 6, 2])
in                              Contain?                                 9 in lst ⇒ True
                                                                         5 in lst ⇒ False
+                               Concatenation                            lst + [5, 2] ⇒ [8, 9, 6, 2, 5, 2]
*                               Repetition                               lst * 2 ⇒ [8, 9, 6, 2, 8, 9, 6, 2]
[i], [-i]                       Indexing to get an item.                 lst[1] ⇒ 9
                                The front index begins at 0;             lst[-2] ⇒ 6
                                the back index begins at -1              lst[1] = 99 ⇒ modify an existing item
                                (same as len(lst)-1).
[m:n], [m:], [:n], [m:n:step]   Slicing to get a sub-list.               lst[1:3] ⇒ [9, 6]
                                From index m (included) to n             lst[1:-2] ⇒ [9]
                                (excluded) with an optional step size.   lst[3:] ⇒ [2]
                                The default m is 0, n is len(lst).       lst[:-2] ⇒ [8, 9]
                                                                         lst[:] ⇒ [8, 9, 6, 2]
                                                                         lst[0:4:2] ⇒ [8, 6]
                                                                         newlst = lst[:] ⇒ copy the list
                                                                         lst[4:] = [1, 2] ⇒ modify a sub-list
del                             Delete one or more items                 del lst[1] ⇒ lst is [8, 6, 2]
                                (for mutable sequences only)             del lst[1:] ⇒ lst is [8]
                                                                         del lst[:] ⇒ lst is [] (clear all items)

Function                        Description                              Examples (lst = [8, 9, 6, 2])
len()                           Length                                   len(lst) ⇒ 4
max(), min()                    Maximum and minimum value                max(lst) ⇒ 9
                                (for list of numbers only)               min(lst) ⇒ 2
sum()                           Sum (for list of numbers only)           sum(lst) ⇒ 25

List, unlike string, is mutable. You can insert, remove and modify its items.

For example,

>>> lst = [123, 4.5, 'hello']  # A list can contain items of different types
>>> lst
[123, 4.5, 'hello']
>>> len(lst)   # Length
3
>>> type(lst)
<class 'list'>
>>> lst[0]     # Indexing to get an item
123
>>> lst[2] = 'world'  # Re-assign
>>> lst
[123, 4.5, 'world']
>>> lst[0:2]   # Slicing to get a sub-list
[123, 4.5]
>>> lst[:2]
[123, 4.5]
>>> lst[1:]
[4.5, 'world']
>>> 123 in lst
True
>>> 1234 in lst
False
>>> lst + [6, 7, 8]   # Concatenation
[123, 4.5, 'world', 6, 7, 8]
>>> lst * 3           # Repetition
[123, 4.5, 'world', 123, 4.5, 'world', 123, 4.5, 'world']
>>> del lst[1]        # Removal
>>> lst
[123, 'world']

# Lists can be nested
>>> lst = [123, 4.5, ['a', 'b', 'c']]
>>> lst
[123, 4.5, ['a', 'b', 'c']]
>>> lst[2]
['a', 'b', 'c']
Appending Items to a list
>>> lst = [123, 'world']
>>> lst[2]     # Python performs index bound check
IndexError: list index out of range
>>> lst[len(lst)] = 4.5  # Cannot append using indexing
IndexError: list assignment index out of range
>>> lst[len(lst):] = [4.5]  # Need to append a list using slicing
>>> lst
[123, 'world', 4.5]
>>> lst[len(lst):] = [6, 7, 8]  # Append a list using slicing
>>> lst
[123, 'world', 4.5, 6, 7, 8]
>>> lst.append('nine')  # Append an item via append() function
>>> lst
[123, 'world', 4.5, 6, 7, 8, 'nine']
>>> lst.extend(['a', 'b'])  # extend() takes a list
>>> lst
[123, 'world', 4.5, 6, 7, 8, 'nine', 'a', 'b']

>>> lst + ['c']  # '+' returns a new list, whereas slice assignment, append() and extend() modify the list in place
[123, 'world', 4.5, 6, 7, 8, 'nine', 'a', 'b', 'c']
>>> lst  # No change
[123, 'world', 4.5, 6, 7, 8, 'nine', 'a', 'b']
Copying a List
>>> l1 = [123, 4.5, 'hello']
>>> l2 = l1[:]   # Make a copy via slicing
>>> l2
[123, 4.5, 'hello']
>>> l2[0] = 8    # Modify new copy
>>> l2
[8, 4.5, 'hello']
>>> l1           # No change in original
[123, 4.5, 'hello']

>>> l3 = l1.copy()   # Make a copy via copy() function, same as above

# Contrast with direct assignment
>>> l4 = l1    # Direct assignment (of reference)
>>> l4
[123, 4.5, 'hello']
>>> l4[0] = 8  # Modify via l4, which refers to the same list as l1
>>> l4
[8, 4.5, 'hello']
>>> l1         # Original also changes
[8, 4.5, 'hello']
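
Slicing and copy() make a shallow copy: the new list is a separate object, but any nested lists are still shared with the original. The sketch below contrasts this with copy.deepcopy() from the standard library.

```python
import copy

original = [1, [2, 3]]
shallow = original[:]              # New outer list, but the inner list is shared
deep = copy.deepcopy(original)     # The inner list is copied as well

shallow[0] = 99                    # Affects the shallow copy only
shallow[1].append(4)               # Affects the shared inner list
deep[1].append(5)                  # Affects the deep copy only

print(original)   # [1, [2, 3, 4]]
print(shallow)    # [99, [2, 3, 4]]
print(deep)       # [1, [2, 3, 5]]
```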
List-Specific Member Functions

The list class provides many member functions. Suppose lst is a list object:

  • lst.append(item): append the given item to the end of lst and return None; same as lst[len(lst):] = [item].
  • lst.extend(lst2): append all the items of lst2 to the end of lst and return None; same as lst[len(lst):] = lst2.
  • lst.insert(index, item): insert the given item before the index and return None. Hence, lst.insert(0, item) inserts before the first item of lst; lst.insert(len(lst), item) inserts at the end of lst, which is the same as lst.append(item).
  • lst.index(item): return the index of the first occurrence of item; raises ValueError if not found.
  • lst.remove(item): remove the first occurrence of item from lst and return None; raises ValueError if not found.
  • lst.pop(): remove and return the last item of lst.
  • lst.pop(index): remove and return the indexed item of lst.
  • lst.clear(): remove all the items from lst and return None; same as del lst[:].
  • lst.count(item): return the number of occurrences of item.
  • lst.reverse(): reverse the lst in place and return None.
  • lst.sort(): sort the lst in place and return None.
  • lst.copy(): return a copy of lst; same as lst[:].

Recall that a list is mutable (unlike a string, which is immutable). Most of these functions modify the list in place. For example,

>>> lst = [123, 4.5, 'hello', [6, 7, 8]]  # list can also contain list
>>> lst
[123, 4.5, 'hello', [6, 7, 8]]
>>> type(lst)  # Show type
<class 'list'>
>>> dir(lst)   # Show all the attributes of the lst object

>>> len(lst)
4
>>> lst.append('apple')  # Append item at the back
>>> lst
[123, 4.5, 'hello', [6, 7, 8], 'apple']
>>> len(lst)
5
>>> lst.pop(1)     # Retrieve and remove item at index
4.5
>>> lst
[123, 'hello', [6, 7, 8], 'apple']
>>> len(lst)
4
>>> lst.insert(2, 55.66)  # Insert item before the index
>>> lst
[123, 'hello', 55.66, [6, 7, 8], 'apple']
>>> del lst[3:]         # Delete the slice (del is a statement, not a function)
>>> lst
[123, 'hello', 55.66]
>>> lst.append(55.66)   # A list can contain duplicate values
>>> lst
[123, 'hello', 55.66, 55.66]
>>> lst.remove(55.66)   # Remove the first item of given value
>>> lst
[123, 'hello', 55.66]
>>> lst.reverse()       # Reverse the list in place
>>> lst
[55.66, 'hello', 123]
 
# Searching and Sorting
>>> lst2 = [5, 8, 2, 4, 1]
>>> lst2.sort()     # In-place sorting
>>> lst2
[1, 2, 4, 5, 8]
>>> lst2.index(5)   # Get the index of the given item
3
>>> lst2.index(9)
......
ValueError: 9 is not in list
>>> lst2.append(1)
>>> lst2
[1, 2, 4, 5, 8, 1]
>>> lst2.count(1)   # Count the occurrences of the given item
2
>>> lst2.count(9)
0
>>> sorted(lst2)    # Built-in function that returns a sorted list
[1, 1, 2, 4, 5, 8]
>>> lst2            # Not modified
[1, 2, 4, 5, 8, 1]
Using list as a last-in-first-out Stack

To use a list as a last-in-first-out (LIFO) stack, use append(item) to add an item to the top-of-stack (TOS) and pop() to remove the item from the TOS.

Using list as a first-in-first-out Queue

To use a list as a first-in-first-out (FIFO) queue, use append(item) to add an item to the end of the queue and pop(0) to remove the first item of the queue.

However, pop(0) is slow, because all the remaining items must be shifted forward. The standard library provides the class collections.deque (double-ended queue), which supports fast appends and pops at both ends.
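
The two usages can be sketched as follows, with a plain list as the stack and a collections.deque as the queue:

```python
from collections import deque

# LIFO stack using a list
stack = []
stack.append('a')
stack.append('b')
stack.append('c')
top = stack.pop()            # 'c' (last in, first out)

# FIFO queue using a deque
queue = deque()
queue.append('a')
queue.append('b')
queue.append('c')
first = queue.popleft()      # 'a' (first in, first out)

print(top, stack)            # c ['a', 'b']
print(first, list(queue))    # a ['b', 'c']
```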

Tuple (v1, v2,...)

Tuple is similar to list except that it is immutable (just like string). A tuple consists of items separated by commas, enclosed in parentheses ().

>>> tup = (123, 4.5, 'hello')  # A tuple can contain different types
>>> tup
(123, 4.5, 'hello')
>>> tup[1]           # Indexing to get an item
4.5
>>> tup[1:3]         # Slicing to get a sub-tuple
(4.5, 'hello')
>>> tup[1] = 9       # Tuple, unlike list, is immutable
TypeError: 'tuple' object does not support item assignment
>>> type(tup)
<class 'tuple'>
>>> lst = list(tup)  # Convert to list
>>> lst
[123, 4.5, 'hello']
>>> type(lst)
<class 'list'>

A one-item tuple needs a trailing comma to differentiate it from a parenthesized expression:

>>> tup = (5,)  # A one-item tuple needs a comma
>>> tup
(5,)
>>> x = (5)     # Without the comma, it is just a parenthesized expression
>>> x
5

The parentheses are actually optional, but recommended for readability. Nevertheless, the commas are mandatory. For example,

>>> tup = 123, 4.5, 'hello'
>>> tup
(123, 4.5, 'hello')
>>> tup2 = 88,  # A one-item tuple needs a trailing comma
>>> tup2
(88,)

# However, we can use empty parentheses to create an empty tuple
# Empty tuples are quite useless, as tuples are immutable.
>>> tup3 = ()
>>> tup3
()
>>> len(tup3)
0

You can operate on tuples using (supposing that tup is a tuple):

  • built-in functions such as len(tup);
  • built-in functions for tuple of numbers such as max(tup), min(tup) and sum(tup);
  • operators such as in, + and *; and
  • tuple's member functions such as tup.count(item), tup.index(item), etc.
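
A short sketch of these operations follows, together with two typical uses of tuples: returning several values from a function and serving as a dictionary key. The function min_max() and the sample data are hypothetical.

```python
def min_max(numbers):
    """Hypothetical helper: return the smallest and largest items as a 2-item tuple."""
    return min(numbers), max(numbers)    # The comma builds the tuple


lo, hi = min_max((3, 1, 2))      # Unpack the tuple into two variables
print(lo, hi)                    # 1 3

tup = (1, 2) + (3,)              # Concatenation creates a new tuple
print(tup, tup * 2)              # (1, 2, 3) (1, 2, 3, 1, 2, 3)
print(2 in tup, tup.count(1), tup.index(3))   # True 1 2

# Being immutable, a tuple (of immutable items) can serve as a dictionary key
distance = {(0, 0): 0.0, (3, 4): 5.0}
print(distance[(3, 4)])          # 5.0
```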
Conversion between List and Tuple

You can convert a list to a tuple using the built-in function tuple(); and a tuple to a list using list(). For example,

>>> tuple([1, 2, 3, 1])  # Convert a list to a tuple
(1, 2, 3, 1)
>>> list((1, 2, 3, 1))   # Convert a tuple to a list
[1, 2, 3, 1]

Dictionary {k1:v1, k2:v2,...}

Python's built-in dictionary type supports key-value pairs (also known as name-value pairs, associative array, or mappings).

  • A dictionary is enclosed by a pair of curly braces {}. The key and value are separated by a colon (:), in the form of {k1:v1, k2:v2, ...}
  • Unlike list and tuple, which index items using an integer index (0, 1, 2, 3,...), a dictionary can be indexed using a key of any hashable (immutable) type, such as a number, a string, or a tuple of immutable items.
>>> dct = {'name':'Peter', 'gender':'male', 'age':21}
>>> dct               # Items are kept in insertion order (Python 3.7+)
{'name': 'Peter', 'gender': 'male', 'age': 21}
>>> dct['name']       # Get value via key
'Peter'
>>> dct['age'] = 22   # Re-assign a value
>>> dct
{'name': 'Peter', 'gender': 'male', 'age': 22}
>>> len(dct)
3
>>> dct['email'] = 'peter@nowhere.com'   # Add new item
>>> dct
{'name': 'Peter', 'gender': 'male', 'age': 22, 'email': 'peter@nowhere.com'}
>>> type(dct)
<class 'dict'>

# Use dict() built-in function to create a dictionary
>>> dct2 = dict([('a', 1), ('c', 3), ('b', 2)])  # Convert a list of 2-item tuples into a dictionary
>>> dct2
{'a': 1, 'c': 3, 'b': 2}
Dictionary-Specific Member Functions

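The dict class has many member functions. The commonly-used ones are as follows (suppose that dct is a dict object):

  • key in dct: return True if the key exists in dct. (The Python 2 function dct.has_key(key) was removed in Python 3.)
  • dct.items(), dct.keys(), dct.values(): return the key-value pairs (as 2-item tuples), the keys, and the values, respectively.
  • dct.clear(): remove all the items from dct.
  • dct.copy(): return a (shallow) copy of dct.
  • dct.get(key, default): return the value of the given key; or the default (None if omitted) if the key does not exist, instead of raising KeyError.
  • dct.update(dct2): merge the given dictionary dct2 into dct. Override the value if key exists, else, add new key-value.
  • dct.pop(key, default): remove the given key and return its value. If the key does not exist, return the default; or raise KeyError if no default is given.

For example,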

>>> dct = {'name':'Peter', 'age':22, 'gender':'male'}
>>> dct
{'name': 'Peter', 'age': 22, 'gender': 'male'}

>>> type(dct)  # Show type
<class 'dict'>
>>> dir(dct)   # Show all attributes of dct object
......

>>> list(dct.keys())       # Get all the keys as a list
['name', 'age', 'gender']
>>> list(dct.values())     # Get all the values as a list
['Peter', 22, 'male']
>>> list(dct.items())      # Get key-value as tuples
[('name', 'Peter'), ('age', 22), ('gender', 'male')]

# You can also use get() to retrieve the value of a given key
>>> dct.get('age', 'no such key')  # Retrieve item
22
>>> dct.get('height', 'no such key')
'no such key'
>>> dct['height']
KeyError: 'height'
    # Indexing with a missing key raises KeyError, while get() returns the default instead

>>> del dct['age']   # Delete (Remove) an item of the given key
>>> dct
{'name': 'Peter', 'gender': 'male'}

>>> 'name' in dct
True

>>> dct.update({'height':180, 'weight':75})  # Merge the given dictionary
>>> dct
{'name': 'Peter', 'gender': 'male', 'height': 180, 'weight': 75}

>>> dct.pop('gender')  # Remove and return the item with the given key 
'male'
>>> dct
{'name': 'Peter', 'height': 180, 'weight': 75}
>>> dct.pop('no_such_key')   # Raise KeyError if key not found
KeyError: 'no_such_key'
>>> dct.pop('no_such_key', 'not found')   # Provide a default if key does not exist
'not found'
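
A typical use of get() with a default is counting. The sketch below counts the words in an illustrative sentence and then iterates through the items; the printed order assumes Python 3.7+, where a dictionary keeps its insertion order.

```python
counts = {}
for word in 'the quick the lazy the end'.split():
    counts[word] = counts.get(word, 0) + 1    # Default to 0 for a new key
print(counts)   # {'the': 3, 'quick': 1, 'lazy': 1, 'end': 1}

# Iterate through the key-value pairs
lines = []
for key, value in counts.items():
    lines.append('%s=%d' % (key, value))
summary = ', '.join(lines)
print(summary)  # the=3, quick=1, lazy=1, end=1

# Pick the key with the largest value
most_common = max(counts, key=counts.get)
print(most_common)   # the
```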

Set {k1, k2,...}

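A set is an unordered collection of unique (non-duplicate) objects. A set is delimited by curly braces {}, just like a dictionary. You can think of a set as a collection of dictionary keys without associated values. Note that an empty pair of braces {} creates an empty dictionary; use set() to create an empty set.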

For example,

>>> st = {123, 4.5, 'hello', 123, 'Hello'}
>>> st         # Duplicate removed and ordering may change
{'Hello', 'hello', 123, 4.5}
>>> 123 in st  # Test membership
True
>>> 88 in st
False

# Use the built-in function set() to create a set.
>>> st2 = set([2, 1, 3, 1, 3, 2])  # Convert a list to a set. Duplicate removed and unordered.
>>> st2
{1, 2, 3}
>>> st3 = set('hellllo')  # Convert a string to a character set.
>>> st3
{'o', 'h', 'e', 'l'}
Set-Specific Operators

Python supports the set operators & (intersection), | (union), - (difference) and ^ (symmetric difference, or exclusive-or). For example,

>>> st1 = {'a', 'e', 'i', 'o', 'u'}
>>> st1
{'e', 'o', 'u', 'a', 'i'}
>>> st2 = set('hello')  # Convert a string to a character set
>>> st2
{'o', 'l', 'e', 'h'}
>>> st1 & st2   # Set intersection
{'o', 'e'}
>>> st1 | st2   # Set union
{'o', 'l', 'h', 'i', 'e', 'a', 'u'}
>>> st1 - st2   # Set difference
{'i', 'u', 'a'}
>>> st1 ^ st2   # Set exclusive-or
{'h', 'i', 'u', 'a', 'l'}
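
Besides the operators, the set class has member functions such as add(), discard() and isdisjoint(). A brief sketch follows; the results are passed through sorted() because a set has no defined order.

```python
empty = set()                    # {} would create an empty dict instead
print(type(empty).__name__, type({}).__name__)   # set dict

small = {1, 2, 3}
small.add(4)                     # Add an item
small.add(1)                     # Adding an existing item has no effect
small.discard(99)                # discard() ignores a missing item
print(sorted(small))             # [1, 2, 3, 4]

print({1, 2} <= small)           # True (subset test)
print(small.isdisjoint({7, 8}))  # True (no common items)

# Remove duplicates from a list (the original order is not preserved)
unique = sorted(set([3, 1, 3, 2]))
print(unique)                    # [1, 2, 3]
```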

Control Constructs

Syntax

A Python compound statement (such as conditional, loop and function definition) takes the following syntax:

Header-1:        # Headers are terminated by a colon
    statement-1  # Body blocks are indented (recommended to use 4 spaces)
    statement-2
    ......
Header-2:
    statement-1
    statement-2
    ......

For example, an if-elif-else statement takes the following form:

if x == 0:    # Parentheses are optional for condition
    print('x is zero')
elif x > 0:
    print(x)
    print('x is more than zero')
else:
    print(x)
    print('x is less than zero')
  • It begins with a header line, such as if, elif, else, terminated by a colon (:), followed by the indented body block of statements.
  • The parentheses () surrounding the condition are optional.
  • Python does not use curly braces {} to enclose a block (as in C/C++/Java). Instead, it uses indentation to delimit a body block. That is, the indentation is syntactically significant. Python does not care how far you indent, but ALL the statements in the SAME block must be indented by the SAME amount. You can use either spaces or tabs, but mixing them inconsistently is an error in Python 3. It is recommended to use 4 spaces for each indentation level. This indentation rule forces programmers to write readable code.

In Python, a statement is delimited by an end-of-line. That is, the end-of-line is the end of the statement. There is no need to place a semicolon (;) to end a statement (as in C/C++/Java).

Special cases:

  • You can place multiple statements in one line, separated by semicolon (;), for example,
    x = 1; y = 2; z = 3
  • A statement can span multiple lines if parentheses (), square brackets [], or braces {} are parts of the statement, for example,
    x = [123,
         55.66,   # Indentation optional
         'hello']
  • You can place the body block on the same line as the header line, e.g.,
    if x == 0:  print('x is zero')
    elif x > 0: print(x); print('x is more than zero')
    else:       print(x); print('x is less than zero')
  • You can use backslash \ to indicate continuation into next line. This is an old rule and is NOT recommended, because it is error-prone.
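
The two continuation styles in the list above can be compared side by side. This is a minimal sketch; the names total_1 and total_2 are purely illustrative.

```python
# Implicit continuation: an open parenthesis keeps the statement going
total_1 = (1 + 2 +
           3 + 4)

# Explicit continuation: the backslash must be the last character on the line
total_2 = 1 + 2 + \
          3 + 4

print(total_1, total_2)   # 10 10
```

Even a single space after the backslash causes a syntax error, which is one reason the parenthesized form is preferred.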

Conditional if-elif-else

The general syntax is as follows. The elif (else-if) and else blocks are optional.

if test-1:
    block-1
elif test-2:
    block-2
elif test-3:
    block-3
......
elif test-n:
    block-n
else:
    else-block

Example: See above example.

There is no switch-case statement in Python (as in C/C++/Java). From Python 3.10, the match-case statement offers a similar, more general construct.
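
A chain of if-elif-else is the usual substitute. For simple value lookups, a dictionary can play the same role. The following is a minimal sketch of the dictionary approach; the function describe() and its status codes are made up for illustration.

```python
def describe(code):
    """Map a status code to a message, similar to a switch-case."""
    messages = {
        200: 'OK',
        404: 'Not Found',
        500: 'Server Error',
    }
    # dict.get() returns its second argument when the key is missing,
    # which plays the role of the 'default' branch
    return messages.get(code, 'Unknown')

print(describe(404))   # Not Found
print(describe(123))   # Unknown
```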

Comparison and Logical Operators

Python supports these comparison (relational) operators, which return a bool of either True or False:

  • < (less than), <= (less than or equal to), == (equal to), != (not equal to), > (greater than), >= (greater than or equal to).
  • in, not in: Check if an item is (or is not) in a collection (string, list, tuple, set, dictionary, etc.).
  • is, is not: Check if two variables refer (or do not refer) to the same object.

Python supports these logical (boolean) operators:

  • and
  • or
  • not

Examples [TODO]
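
As a small sketch of the operators listed above (all variable names are illustrative):

```python
x, y = 5, 8
lst = [1, 2, 3]

print(x < y, x == y, x != y)       # True False True
print(2 in lst, 5 not in lst)      # True True

# 'is' compares identity (same object), '==' compares value
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c is a different object with equal contents
print(a == c, a is c, a is b)      # True False True

# Logical operators
print(x > 0 and y > 0)             # True
print(x > 10 or y > 10)            # False
print(not x > 10)                  # True
```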

Chain Comparison n1 < x < n2

Python supports chain comparison in the form of n1 < x < n2 (which is not supported in C/C++/Java), e.g.,

>>> x = 8
>>> 1 < x < 10
True
>>> 1 < x and x < 10  # Same as above
True
>>> 10 < x < 20
False
>>> 10 > x > 1
True
>>> not (10 < x < 20)
True
Comparing Sequences

The comparison operators (such as ==, <=) are overloaded to support sequences (such as string, list and tuple). For example,

>>> x = 8
>>> 'a' < 'b'
True
>>> 'ab' < 'aa'
False
>>> 'a' < 'b' < 'c'
True
>>> (1, 2, 3) < (1, 2, 4)
True
>>> [1, 2, 3] <= [1, 2, 3]
True
Shorthand if-else

The syntax is:

expr-1 if test else expr-2
    # Evaluate expr-1 if test is True; otherwise, evaluate expr-2

For example,

>>> x = 0
>>> print('zero' if x == 0 else 'not zero')
zero
 
>>> x = -8
>>> abs_x = x if x > 0 else -x
>>> abs_x
8

Note: Python does not use "? :" for shorthand if-else, as in C/C++/Java.

The while loop

The general syntax is as follows:

while test:
    true-block
 
# while loop has an optional else block
while test:
    true-block
else:           # Run only if no break encountered
    else-block

The else block is optional. It is executed if control exits the loop normally (i.e., the test becomes false) without encountering a break statement.

For example,

# Sum from 1 to the given upperbound
n = int(input('Enter the upperbound: '))
i = 1
sum = 0
while (i <= n):
    sum += i
    i += 1
print(sum)

break, continue, pass and loop-else

Like C/C++/Java, the break statement breaks out from the innermost loop; the continue statement skips the remaining statements of the loop and continues the next iteration.

The pass statement does nothing. It serves as a placeholder for an empty statement or empty block.

The loop-else block is executed if the loop is exited normally, without encountering the break statement.

Examples: [TODO]
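
A minimal sketch that exercises all four constructs (the numbers are arbitrary):

```python
# continue: skip even numbers; break: stop at the first odd number above 7
collected = []
for n in range(1, 20):
    if n % 2 == 0:
        continue          # Skip the rest of this iteration
    if n > 7:
        break             # Leave the loop; the else block is skipped
    collected.append(n)
else:
    print('loop finished without break')   # Not printed here
print(collected)          # [1, 3, 5, 7]

# pass: a do-nothing placeholder where a statement is required
for n in range(3):
    pass

# loop-else: runs because no break occurs
i = 0
while i < 3:
    i += 1
else:
    print('while loop exited normally, i =', i)   # i = 3
```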

The for-in loop

The for-in loop has the following syntax:

for item in sequence:  # sequence (or other iterable): string, list, tuple, dictionary, set
    true-block
 
# for-in loop with an else block
for item in sequence:
    true-block
else:        # Run only if no break encountered
    else-block

You shall read it as "for each item in the sequence...". Again, the else block is executed only if the loop exits normally, without encountering the break statement.

Iterating through a Sequence (String, List, Tuple, Dictionary, Set)

The for-in loop is primarily used to iterate the same process through all the items of a sequence, for example,

# String: iterating through each character
>>> for char in 'hello': print(char)
h
e
l
l
o

# List: iterating through each item
>>> for item in [123, 4.5, 'hello']: print(item)
123
4.5
hello

# Tuple: iterating through each item
>>> for item in (123, 4.5, 'hello'): print(item)
123
4.5
hello

# Dictionary: iterating through each key (in insertion order from Python 3.7)
>>> dct = {'a': 1, 2: 'b', 'c': 'cc'}
>>> for key in dct: print(key, ':', dct[key])
a : 1
2 : b
c : cc

# Set: iterating through each item (duplicates are removed; the order is arbitrary)
>>> for item in {'apple', 1, 2, 'apple'}: print(item)
1
2
apple

# File: iterating through each line
>>> f = open('test.txt', 'r')
>>> for line in f: print(line, end='')   # Each line already ends with a newline
...Each line of the file...
>>> f.close()
Iterating through a Sequence of Sequences

A sequence (such as list, tuple) can contain sequences. For example,

# A list of 2-item tuples
>>> lst = [(1,'a'), (2,'b'), (3,'c')]
# Iterate through each of the 2-item tuples, unpacking it into i1 and i2
>>> for i1, i2 in lst: print(i1, i2)
... 
1 a
2 b
3 c

# A list of 3-item lists
>>> lst = [[1, 2, 3], ['a', 'b', 'c']]
>>> for i1, i2, i3 in lst: print(i1, i2, i3)
... 
1 2 3
a b c
Iterating through a Dictionary

There are a few ways to iterate through a dictionary:

>>> dct = {'name':'Peter', 'gender':'male', 'age':21}

# Iterate through the keys (as in the above example)
>>> for key in dct: print(key, ':', dct[key])
name : Peter
gender : male
age : 21

# Iterate through the key-value pairs
>>> for key, value in dct.items(): print(key, ':', value)
name : Peter
gender : male
age : 21

>>> dct.items()  # Return a view of the key-value (2-item) tuples
dict_items([('name', 'Peter'), ('gender', 'male'), ('age', 21)])
The iter() and next() Built-in Functions

The built-in function iter(iterable) takes an iterable (such as a sequence) and returns an iterator object. You can then use next(iterator) to iterate through the items. For example,

>>> i = iter([11, 22, 33])
>>> next(i)
11
>>> next(i)
22
>>> next(i)
33
>>> next(i)  # Raise StopIteration exception if no more item
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

>>> type(i)
<class 'list_iterator'>
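
A for-in loop uses the same two functions behind the scenes. The following sketch spells out roughly what "for item in items: print(item)" does; the names items, iterator and seen are illustrative.

```python
items = [11, 22, 33]

# Roughly what 'for item in items: print(item)' does behind the scenes
iterator = iter(items)
seen = []
while True:
    try:
        item = next(iterator)
    except StopIteration:   # No more items: leave the loop
        break
    seen.append(item)
    print(item)

# next() accepts a default, which is returned instead of raising StopIteration
print(next(iterator, 'no more'))   # no more
```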
The range() Built-in Function

The range() function produces a series of running integers, which can be used as list indexes for the for-in loop.

  • range(n) produces integers from 0 to n-1;
  • range(m, n) produces integers from m to n-1;
  • range(m, n, s) produces integers starting from m in steps of s, stopping before n.

For example,

# Sum from 1 to the given upperbound
upperbound = int(input('Enter the upperbound: '))
sum = 0
for number in range(1, upperbound+1):  # 1, 2, ..., upperbound
    sum += number
print("The sum is: %d" % sum)
 
# Sum a given list
lst = [9, 8, 4, 5]
sum = 0
for index in range(len(lst)):  # indexes 0 to len-1
    sum += lst[index]
print(sum)
 
# Better alternative of the above
lst = [9, 8, 4, 5]
sum = 0
for item in lst:  # Each item of lst
    sum += item
print(sum)

# Use built-in function
del sum   # Need to remove the sum variable before using builtin function sum
print(sum(lst))
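
In Python 3, range() returns a lazy range object rather than a list, so wrap it in list() to see the values it produces. A short sketch of the three forms (variable names are illustrative):

```python
first_five = list(range(5))
print(first_five)              # [0, 1, 2, 3, 4]

two_to_five = list(range(2, 6))
print(two_to_five)             # [2, 3, 4, 5]

stepped = list(range(1, 10, 3))
print(stepped)                 # [1, 4, 7]

countdown = list(range(10, 0, -2))   # A negative step counts down
print(countdown)               # [10, 8, 6, 4, 2]

print(range(5))                # range(0, 5)  (the range object itself)
```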
The reversed() Built-in Function

To iterate over a sequence in reverse order, apply the reversed() function, which returns a reverse iterator over the items of the sequence. For example,

>>> lst = [11, 22, 33]
>>> for item in reversed(lst): print(item, end=' ')
33 22 11
>>> reversed(lst)
<list_reverseiterator object at 0x7fc4707f3828>

>>> s = "hello"   # Avoid the name 'str', which would shadow the built-in type
>>> for c in reversed(s): print(c, end='')
olleh
The enumerate() Built-in Function

You can use the built-in function enumerate() to obtain the positional indexes, when looping through a sequence. For example,

# List
>>> for i, v in enumerate(['a', 'b', 'c']): print(i, v)
0 a
1 b
2 c
>>> enumerate(['a', 'b', 'c'])
<enumerate object at 0x7ff0c6b75a50>

# Tuple
>>> for i, v in enumerate(('d', 'e', 'f')): print(i, v)
0 d
1 e
2 f
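
enumerate() also accepts an optional start argument, which is handy for 1-based numbering. A minimal sketch (the list fruits is made up):

```python
fruits = ['apple', 'banana', 'cherry']

# 'start' sets the first index (the default is 0)
numbered = list(enumerate(fruits, start=1))
print(numbered)   # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]

for position, fruit in enumerate(fruits, start=1):
    print(position, fruit)
```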
Multiple Sequences and the zip() Built-in Function

To loop over two or more sequences concurrently, you can pair the entries with the zip() built-in function. For example,

>>> lst1 = ['a', 'b', 'c']
>>> lst2 = [11, 22, 33]
>>> for i1, i2 in zip(lst1, lst2): print(i1, i2)
a 11
b 22
c 33
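>>> zip(lst1, lst2)   # Return a zip object (an iterator of tuples)
<zip object at 0x...>
>>> list(zip(lst1, lst2))   # Wrap in list() to see the tuples
[('a', 11), ('b', 22), ('c', 33)]

# zip() for more than 2 sequences; it stops at the shortest one
>>> tuple3 = (44, 55)
>>> list(zip(lst1, lst2, tuple3))
[('a', 11, 44), ('b', 22, 55)]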
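
Two common uses of zip() are building a dictionary from parallel lists and "unzipping" a list of pairs. A minimal sketch with made-up data:

```python
keys = ['a', 'b', 'c']
values = [11, 22, 33]

# Build a dictionary from two parallel lists
dct = dict(zip(keys, values))
print(dct)            # {'a': 11, 'b': 22, 'c': 33}

# "Unzip" a list of pairs back into two tuples using the * operator
pairs = [('a', 11), ('b', 22), ('c', 33)]
letters, numbers = zip(*pairs)
print(letters)        # ('a', 'b', 'c')
print(numbers)        # (11, 22, 33)
```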
List and Dictionary Comprehension

List comprehension provides a concise way to generate a list. The syntax is:

result_list = [expression_with_item for item in in_list]
result_list = [expression_with_item for item in in_list if test]   # with an optional test
 
# The second form is the same as
result_list = []
for item in in_list:
    if test:
        result_list.append(expression_with_item)

For example,

>>> sq = [item * item for item in range(1,11)]
>>> sq
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

>>> x = [3, 4, 1, 5]
>>> sq_x = [item * item for item in x]  # no test, all items
>>> sq_x
[9, 16, 1, 25]
>>> sq_odd = [item * item for item in x if item % 2 != 0]
>>> sq_odd
[9, 1, 25]

# Nested for
>>> [(x, y) for x in range(1,3) for y in range(1,4) if x != y]
[(1, 2), (1, 3), (2, 1), (2, 3)]

Similarly, you can create a dictionary or a set via comprehension. For example,

# Dictionary {k1:v1, k2:v2,...}
>>> d = {x:x**2 for x in range(1, 5)}  # Use braces for dictionary
>>> d
{1: 1, 2: 4, 3: 9, 4: 16}

# Set {v1, v2,...}
>>> s = {i for i in 'hello' if i not in 'aeiou'}  # Use braces for set too
>>> s
{'h', 'l'}

There is no tuple comprehension. Parentheses around a comprehension-style expression create a generator expression instead.
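
A tuple can still be built in the same style by passing a generator expression to tuple(). A minimal sketch (variable names are illustrative):

```python
x = [3, 4, 1, 5]

gen = (item * item for item in x)   # Parentheses give a generator, not a tuple
print(type(gen).__name__)           # generator

sq_tuple = tuple(item * item for item in x)   # Pass the generator to tuple()
print(sq_tuple)                     # (9, 16, 1, 25)
```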

Using else-clause in Loop

Recall that the else-clause runs when no break occurs inside the loop body.

# List all primes between 2 and 100
for number in range(2, 101):
    for factor in range(2, number//2+1):  # Look for factor
        if number % factor == 0:  # break if a factor found
            print('%d is NOT a prime' % number)  
            break
    else:
        print('%d is a prime' % number)  # Only if no break encountered

Functions

Syntax

In Python, you define a function via the keyword def followed by the function name, the parameter list, the doc-string and the function body. Inside the function body, you can use a return statement to return a value to the caller.

The syntax is:

def function_name(arg1, arg2, ...):
    """Function doc-string"""    # Can be retrieved via function_name.__doc__
    statements
    return return-value
Example 1
>>> def my_square(x):
        """Return the square of the given number"""
        return x * x

# Invoke the function defined earlier
>>> my_square(8)
64
>>> my_square(1.8)
3.24
>>> my_square('hello')
TypeError: can't multiply sequence by non-int of type 'str'
>>> my_square
<function my_square at 0x7fa57ec54bf8>
>>> type(my_square)
<class 'function'>
>>> my_square.__doc__  # Show function doc-string
'Return the square of the given number'
>>> help(my_square)    # Show documentation
my_square(x)
    Return the square of the given number
>>> dir(my_square)     # Show attributes
......

Take note that you need to define the function before using it.

Example 2
def fibon(n):
    """Print the first n Fibonacci numbers, where f(n)=f(n-1)+f(n-2) and f(1)=f(2)=1"""
    a, b = 1, 1
    for count in range(n):
        print(a, end=' ')  # print a space instead of a newline at the end (Python 3)
        a, b = b, a+b
    print()   # print a newline

fibon(20)
Example 3: Function doc-string
def my_cube(x):
    """
    (number) -> (number)
    Return the cube of the given number.

    Examples (can be used by doctest):
    >>> my_cube(5)
    125
    >>> my_cube(-5)
    -125
    >>> my_cube(0)
    0
    """
    return x*x*x

# Test the function
print(my_cube(8))    # 512
print(my_cube(-8))   # -512
print(my_cube(0))    # 0

This example elaborates on the function's doc-string:

  • The first line "(number) -> (number)" specifies the types of the argument and return value. Python does not check these types, and this line merely serves as documentation.
  • The second line gives a description.
  • Examples of function invocation follow. You can use the doctest module to perform unit tests for this function based on these examples (to be described in the "unit-test" section).
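
As a preview, the examples in a doc-string can be collected and run with the standard doctest module. This is only a sketch of one way to do it from within a script; the names finder, runner and results are illustrative.

```python
import doctest

def my_cube(x):
    """
    Return the cube of the given number.

    >>> my_cube(5)
    125
    >>> my_cube(-5)
    -125
    >>> my_cube(0)
    0
    """
    return x * x * x

# Collect the examples from the doc-string and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(my_cube, 'my_cube', globs={'my_cube': my_cube}):
    runner.run(test)

results = runner.summarize()   # Prints a report only if an example failed
print(results.attempted, results.failed)   # 3 0
```

For a module saved in a file, running "python3 -m doctest -v <filename>" from the shell performs the same check.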
The pass statement

The pass statement does nothing. It is sometimes needed as the statement placeholder to ensure correct syntax, e.g.,

def my_fun():
    pass      # To be defined later, but syntax error if empty

Function Parameters

Passing Arguments by Value vs. by Reference

In Python:

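  • Arguments are always passed by object reference (also called "pass by assignment"): the parameter becomes a new name for the same object that the caller passed in. No copy is made.
  • Immutable arguments (such as integers, floats, strings and tuples) cannot be changed in place, so the caller's object is never affected. This makes them behave as if they were passed by value.
  • Mutable arguments (such as lists, dictionaries, sets and instances of classes) can be modified in place inside the function, and the caller sees the change. Rebinding the parameter to a new object, however, does not affect the caller.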
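
The observable behaviour can be checked with a small sketch: rebinding a parameter inside a function does not affect the caller, whereas modifying a mutable argument in place does. The function names below are made up for illustration.

```python
def rebind(n):
    n = n + 1          # Rebinds the local name only; the caller's object is untouched
    return n

def mutate(lst):
    lst.append(99)     # Modifies the list object shared with the caller

def replace(lst):
    lst = [0]          # Rebinds the local name; the caller's list is untouched

x = 5
rebind(x)
print(x)               # 5

items = [1, 2]
mutate(items)
print(items)           # [1, 2, 99]

replace(items)
print(items)           # [1, 2, 99]
```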
Function Parameters with Default Values

You can assign a default value to the "trailing" function parameters. These trailing parameters having default values are optional during invocation. For example,

>>> def my_sum(n1, n2 = 4, n3 = 5):  # n1 is required, n2 and n3 having defaults are optional
        """Return the sum of all the arguments"""
        return n1 + n2 + n3

>>> print(my_sum(1, 2, 3))
6
>>> print(my_sum(1, 2))    # n3 defaults
8
>>> print(my_sum(1))       # n2 and n3 default
10
>>> print(my_sum())
TypeError: my_sum() missing 1 required positional argument: 'n1'
>>> print(my_sum(1, 2, 3, 4))
TypeError: my_sum() takes from 1 to 3 positional arguments but 4 were given

Another example:

def greet(name):
    return 'hello, ' + name
    
greet('Peter')  # Output: 'hello, Peter'

Instead of hardcoding the 'hello, ', it is more flexible to use a parameter with a default value, as follows:

def greet(name, prefix='hello'):  # 'name' is required, 'prefix' is optional
    return prefix + ', ' + name
    
greet('Peter')                    # Output: 'hello, Peter'
greet('Peter', 'hi')              # Output: 'hi, Peter'
greet('Peter', prefix='hi')       # Output: 'hi, Peter'
greet(name='Peter', prefix='hi')  # Output: 'hi, Peter'
Positional and Keyword Arguments

Python functions support both positional and keyword (or named) arguments.

Normally, Python passes the arguments by position from left to right, i.e., positional, just like C/C++/Java. Python also allows you to pass arguments by keyword (or name) in the form of kwarg=value. For example,

def my_sum(n1, n2 = 4, n3 = 5):
    """Return the sum of all the arguments"""
    return n1 + n2 + n3

print(my_sum(n2 = 2, n1 = 1, n3 = 3)) # Keyword arguments need not follow their positional order
print(my_sum(n2 = 2, n1 = 1))         # n3 defaults
print(my_sum(n1 = 1))                 # n2 and n3 default
print(my_sum(1, n3 = 3))              # n2 default
#print(my_sum(n2 = 2))                # TypeError, n1 missing

You can also mix the positional arguments and keyword arguments, but you need to place the positional arguments first, as shown in the above examples.

Variable Number of Positional Parameters (*args) and Keyword Parameters (**kwargs)

Python supports a variable (arbitrary) number of arguments. In the function definition, you can use * to pack all the remaining positional arguments into a tuple. For example,

def my_sum(a, *args):  # Accept one positional argument, followed by arbitrary number of arguments
    """Return the sum of all the arguments (one or more)"""
    sum = a
    for item in args:  # args is a tuple
        sum += item
    return sum

print(my_sum(1))           # args is ()
print(my_sum(1, 2))        # args is (2,)
print(my_sum(1, 2, 3))     # args is (2, 3)
print(my_sum(1, 2, 3, 4))  # args is (2, 3, 4)

Python supports placing *args in the middle of the parameter list. However, all the arguments after *args must be passed by keyword to avoid ambiguity. For example

def my_sum(a, *args, b):
    sum = a
    for item in args:
        sum += item
    sum += b
    return sum

print(my_sum(1, 2, 3, 4))  
    # TypeError: my_sum() missing 1 required keyword-only argument: 'b'
print(my_sum(1, 2, 3, 4, b=5))

In the reverse situation when the arguments are already in a list/tuple, you can also use * to unpack the list/tuple as separate positional arguments. For example,

>>> def my_sum(a, b, c): return a+b+c

>>> lst1 = [11, 22, 33]
# my_sum() expects 3 arguments, NOT a 3-item list
>>> my_sum(*lst1)   # unpack the list into separate positional arguments
66

>>> lst2 = [44, 55]
>>> my_sum(*lst2)
TypeError: my_sum() missing 1 required positional argument: 'c'

For keyword parameters, you can use ** to pack them into a dictionary. For example,

def my_print_kwargs(**kwargs):  # Accept variable number of keyword arguments
    """Print all the keyword arguments"""
    for key, value in kwargs.items():  # kwargs is a dictionary
        print('%s: %s' % (key, value))

my_print_kwargs(name='Peter', age=24)

# Similarly, you can also use ** to unpack a dictionary into individual keyword arguments
dct = {'k1':'v1', 'k2':'v2'}   # Avoid the name 'dict', which would shadow the built-in type
my_print_kwargs(**dct)  # Use ** to unpack dictionary into separate keyword arguments k1=v1, k2=v2

You can use both *args and **kwargs in your function definition. Place *args before **kwargs. For example,

def my_print_all_args(*args, **kwargs):   # Place *args before **kwargs
    """Print all positional and keyword arguments"""
    for item in args:  # args is a tuple
        print(item)
    for key, value in kwargs.items():  # kwargs is a dictionary
        print('%s: %s' % (key, value))

my_print_all_args('a', 'b', 'c', name='Peter', age=24)
    # Place the positional arguments before the keyword arguments during invocation

Function Overloading

Python does NOT support Function Overloading like Java/C++ (where the same function name can have different versions differentiated by their parameters).
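
If you define two functions with the same name, the later definition simply replaces the earlier one. A common workaround is a single function with default (or variable) arguments. A minimal sketch, using a made-up area() function:

```python
def area(width, height=None):
    """Return the area of a square (one argument) or a rectangle (two arguments)."""
    if height is None:     # Called with one argument: treat it as a square
        height = width
    return width * height

print(area(3))       # 9
print(area(3, 4))    # 12
```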

Function Return Values

You can return multiple values from a Python function, e.g.,

>>> def my_fun():
       return 1, 'a', 'hello'

>>> x, y, z = my_fun()
>>> z
'hello'
>>> my_fun()
(1, 'a', 'hello')

It seems that a Python function can return multiple values. In fact, a single tuple is returned, which the caller can unpack.

A tuple is actually formed through the commas, not the parentheses, e.g.,

>>> x = 1, 'a'  # Parentheses are optional for tuple
>>> x
(1, 'a')

Type Hints via Function Annotations

From Python 3.5, you can provide type hints via function annotations in the form of:

def say_hello(name:str) -> str:  # Type hints for parameter and return value
    return 'hello, ' + name

say_hello('Peter')

Type hints are ignored by the Python interpreter at run time and merely serve as documentation. However, external tools can use them to perform type checking.

Read: "PEP 484 -- Type Hints".
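
The following sketch shows that the hints are stored on the function object but are not enforced when the function is called. The function repeat() is made up for illustration.

```python
def repeat(text: str, times: int) -> str:
    return text * times

# The hints are stored on the function object...
print(repeat.__annotations__)
# {'text': <class 'str'>, 'times': <class 'int'>, 'return': <class 'str'>}

# ...but are not enforced at run time: a list is accepted without complaint
print(repeat([1, 2], 2))   # [1, 2, 1, 2]
```

To actually check the hints, run an external static type checker (mypy is one such tool) over the source file.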

Modules, Import-Statements and Packages

Modules

A Python module is a file containing Python code - including statements, variables, functions and classes. It shall be saved with file extension of ".py". The module name is the filename, i.e., a module shall be saved as "<module_name>.py".

By convention, module names shall be short and all-lowercase (optionally joined with underscores if it improves readability).

A module typically begins with a triple-double-quoted documentation string (doc-string) (available in <module_name>.__doc__), followed by variable, function and class definitions.

Example: The greet Module

Create a module called greet (save as "greet.py") as follows:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
greet
~~~~~
This module contains the greeting message 'msg' and greeting function 'greet()'.
"""

msg = 'Hello'      # Global Variable
 
def greet(name):   # Function
    print('{}, {}'.format(msg, name))

This greet module defines a variable msg and a function greet().

The import statement

To use an external module in your script, use the import statement:

import <module_name>                          # import one module
import <module_name_1>, <module_name_2>, ...  # import many modules, separated by commas
import <module_name> as <name>                # To reference the imported module as <name>

Once imported, you can reference the module's attributes as <module_name>.<attribute_name>. You can use the import-as to assign a new module name to avoid module name conflict.

For example, to use the greet module created earlier:

$ cd /path/to/target-module
$ python3
>>> import greet
>>> greet.greet('Peter')  # <module_name>.<function_name>
Hello, Peter
>>> print(greet.msg)      # <module_name>.<var_name>
Hello

>>> greet.__doc__         # module's doc-string
"\ngreet\n~~~~~\nThis module contains the greeting message 'msg' and greeting function 'greet()'.\n"
>>> greet.__name__        # module's name
'greet'

>>> dir(greet)            # List all attributes defined in the module
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__',
 '__package__', '__spec__', 'greet', 'msg']

>>> help(greet)           # Show module's name, functions, data, ...
Help on module greet:
NAME
    greet
DESCRIPTION
    ...doc-string...
FUNCTIONS
    greet(name)
DATA
    msg = 'Hello'
FILE
    /path/to/greet.py

>>> import greet as gr  # Reference the 'greet' module as 'gr'
>>> gr.greet('Paul')
Hello, Paul

The import statements should be grouped in this order:

  1. standard library
  2. third party libraries
  3. local application libraries

The from-import Statement

The syntax is:

from <module_name> import <attr_name>              # import one attribute
from <module_name> import <attr_name_1>, <attr_name_2>, ...   # import selected attributes
from <module_name> import *                        # import ALL attributes (NOT recommended)
from <module_name> import <attr_name> as <name>    # import attribute as the given name

With the from-import statement, you can reference the imported attributes using <attr_name> directly, without qualifying with the <module_name>.

For example,

>>> from greet import greet, msg as message
>>> greet('Peter')  # Reference without the 'module_name'
Hello, Peter
>>> message
'Hello'
>>> msg
NameError: name 'msg' is not defined

import vs. from-import

The from-import statement actually loads the entire module (like import statement); and NOT just the imported attributes. But it exposes ONLY the imported attributes to the namespace. Furthermore, you can reference them directly without qualifying with the module name.

For example, let's create the following module called imtest.py for testing import vs. from-import:

"""
imtest.py: for testing import vs. from-import
"""
x = 1
y = 2

print("x is: {:d}".format(x))

def foo():
    print("y is: {:d}".format(y))
    
def bar():
    foo()

Let's try out import:

$ python3
>>> import imtest
x is: 1
>>> imtest.y  # All attributes are available, qualifying by the module name
2
>>> imtest.bar()
y is: 2

Now, try the from-import and note that the entire module is loaded, just like the import statement.

$ python3
>>> from imtest import x, bar
x is: 1
>>> x  # Can reference directly, without qualifying with the module name
1
>>> bar()
y is: 2
>>> foo()  # Only the imported attributes are available
NameError: name 'foo' is not defined

Conditional Import

Conditional import is supported. For example,

if ....:   # E.g., check the version number
   import xxx
else:
   import yyy

sys.path and PYTHONPATH environment variable

The module search path is maintained in a Python variable path of the sys module, i.e. sys.path.

The sys.path is initialized from the environment variable PYTHONPATH, plus an installation-dependent default.

For example,

>>> import sys
>>> sys.path
['', '/usr/lib/python3.5', '/usr/local/lib/python3.5/dist-packages', 
 '/usr/lib/python3.5/dist-packages', ...]

By default, sys.path includes the current working directory (denoted as an empty string), the standard Python directories, plus the extension directories in dist-packages.

The imported modules must be available in one of the sys.path entries.

>>> import some_mod
ModuleNotFoundError: No module named 'some_mod'
>>> some_mod.var
NameError: name 'some_mod' is not defined
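
Besides setting PYTHONPATH, a script can extend sys.path at run time. The following sketch creates a throw-away directory holding a tiny module and makes it importable; the module name demo_mod and its attribute var are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Create a throw-away directory holding a tiny module
mod_dir = tempfile.mkdtemp()
with open(os.path.join(mod_dir, 'demo_mod.py'), 'w') as f:
    f.write("var = 'found me'\n")

# The directory is not in sys.path yet, so 'import demo_mod' would fail here.
# Append it at run time, then import as usual.
sys.path.append(mod_dir)
importlib.invalidate_caches()   # Make sure the newly created file is noticed

import demo_mod
print(demo_mod.var)             # found me
```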

Reloading Module using imp.reload() or importlib.reload()

If you modify a module after importing it, you can use the reload() function of the imp (for import) module to reload the module, for example,

>>> import greet
# Make changes to greet module
>>> import imp
>>> imp.reload(greet)

NOTE: The imp module has been deprecated since Python 3.4 in favor of importlib, and was removed in Python 3.12. Use importlib.reload() instead:

>>> import greet
# Make changes to greet module
>>> import importlib   # Use 'importlib' in Python 3
>>> importlib.reload(greet)

Template for Python Standalone Module

The following is a template of standalone module for performing a specific task:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

"""
<package_name>.<module_name>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A description which can be long and explain the complete
functionality of this module even with indented code examples.
Class/Function however should not be documented here.

:author: <author-name>
:version: x.y.z (version.release.modification)
:copyright: ......
:license: ......
"""

import <standard_library_modules>
import <third_party_library_modules>
import <application_modules>

# Define global variables
......

# Define helper functions
......

# Define the entry 'main' function
def main():
    """The main function doc-string"""
    .......

# Run the main function
if __name__ == '__main__':
    main()

When you execute a Python module directly (via the Python interpreter), its __name__ is set to '__main__'. On the other hand, when a module is imported, its __name__ is set to the module name. Hence, the main() function in the above template is invoked when the module is run as a script, but not when it is imported by another module.

Example: [TODO]
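
As one possible illustration of the template, the following sketch is a hypothetical module circle_area.py. The module name, the function area() and the sample radii are all made up.

```python
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
circle_area
~~~~~~~~~~~
Print the area of circles for a few sample radii.
"""
import math

# Define global variables
SAMPLE_RADII = (1, 2, 3)

# Define helper functions
def area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius * radius

# Define the entry 'main' function
def main():
    """Print the area for each sample radius."""
    for radius in SAMPLE_RADII:
        print('radius = %d, area = %.2f' % (radius, area(radius)))

# Run the main function only when executed as a script, not when imported
if __name__ == '__main__':
    main()
```

Running the file directly prints the three areas, whereas "import circle_area" only defines area() and main() without printing anything.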

Packages

A module contains attributes (such as variables, functions and classes). Relevant modules (kept in the same directory) can be grouped into a package. Python also supports sub-packages (in sub-directories). Packages and sub-packages are a way of organizing Python's module namespace by using "dotted names" notation, in the form of '<pack_name>.<sub_pack_name>.<sub_sub_pack_name>.<module_name>.<attr_name>'.

To create a Python package:

  1. Create a directory and name it after your package.
  2. Put your modules in it.
  3. Create a '__init__.py' file in the directory.

The '__init__.py' marks the directory as a package. For example, suppose that you have this directory/file structure:

myapp/                 # This directory is in the 'sys.path'
   |
   + mypack1/          # A directory of relevant modules
   |    |
   |    + __init__.py  # Mark this directory as a package called 'mypack1'
   |    + mymod1_1.py  # Reference as 'mypack1.mymod1_1'
   |    + mymod1_2.py  # Reference as 'mypack1.mymod1_2'
   |
   + mypack2/          # A directory of relevant modules
        |
        + __init__.py  # Mark this directory as a package called 'mypack2'
        + mymod2_1.py  # Reference as 'mypack2.mymod2_1'
        + mymod2_2.py  # Reference as 'mypack2.mymod2_2'

If 'myapp' is in your 'sys.path', you can import 'mymod1_1' as:

import mypack1.mymod1_1       # Reference 'attr1_1_1' as 'mypack1.mymod1_1.attr1_1_1'
from mypack1 import mymod1_1  # Reference 'attr1_1_1' as 'mymod1_1.attr1_1_1'

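Before Python 3.3, Python would NOT search the 'mypack1' directory for 'mymod1_1' without the '__init__.py'. From Python 3.3, a directory without '__init__.py' is treated as a "namespace package" and can still be imported, but an explicit '__init__.py' remains the norm for ordinary packages. In either case, you cannot reference modules in the 'mypack1' directory directly (e.g., 'import mymod1_1'), as that directory is not in the 'sys.path'.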

Attributes in '__init__.py'

The '__init__.py' file is usually empty, but it can be used to initialize the package, such as exporting selected portions of the package under more convenient names, holding convenience functions, etc.

The attributes of the '__init__.py' module can be accessed via the package name directly (i.e., '<package-name>.<attr-name>' instead of '<package-name>.<__init__>.<attr-name>'). For example,

import mypack1               # Reference 'myattr1' in '__init__.py' as 'mypack1.myattr1'
from mypack1 import myattr1  # Reference 'myattr1' in '__init__.py' as 'myattr1'
Sub-Packages

A package can contain sub-packages too. For example,

myapp/                 # This directory is in the 'sys.path'
   |
   + mypack1/
        |
        + __init__.py    # Mark this directory as a package called 'mypack1'
        + mymod1_1.py    # Reference as 'mypack1.mymod1_1'
        |
        + mysubpack1_1/
        |    |
        |    + __init__.py    # Mark this sub-directory as a package called 'mysubpack1_1'
        |    + mymod1_1_1.py  # Reference as 'mypack1.mysubpack1_1.mymod1_1_1'
        |    + mymod1_1_2.py  # Reference as 'mypack1.mysubpack1_1.mymod1_1_2'
        |
        + mysubpack1_2/
             |
             + __init__.py    # Mark this sub-directory as a package called 'mysubpack1_2'
             + mymod1_2_1.py  # Reference as 'mypack1.mysubpack1_2.mymod1_2_1'

Clearly, the package's dot structure corresponds to the directory structure.

Relative from-import

In the from-import statement, you can use . to refer to the current package and .. to refer to the parent package. For example, inside 'mymod1_1_1.py', you can write:

# Inside 'mymod1_1_1'
# The current package is 'mysubpack1_1', where 'mymod1_1_1' resides
from . import mymod1_1_2     # from current package
from .. import mymod1_1      # from parent package
from .mymod1_1_2 import attr
from ..mysubpack1_2 import mymod1_2_1

Take note that in Python, you write '.mymod1_1_2', '..mysubpack1_2' by omitting the separating dot (instead of '..mymod1_1_2', '...mysubpack1_2').
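
To try the relative imports without creating the files by hand, the following sketch builds a cut-down copy of the package tree in a temporary directory and imports from it. The attribute names attr and values are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throw-away package tree (a subset of the layout shown earlier)
root = tempfile.mkdtemp()
files = {
    'mypack1/__init__.py': '',
    'mypack1/mymod1_1.py': "attr = 'from mymod1_1'\n",
    'mypack1/mysubpack1_1/__init__.py': '',
    'mypack1/mysubpack1_1/mymod1_1_2.py': "attr = 'from mymod1_1_2'\n",
    'mypack1/mysubpack1_1/mymod1_1_1.py': (
        "from . import mymod1_1_2     # sibling module in the current package\n"
        "from .. import mymod1_1      # module in the parent package\n"
        "values = (mymod1_1_2.attr, mymod1_1.attr)\n"
    ),
}
for rel_path, source in files.items():
    full_path = os.path.join(root, *rel_path.split('/'))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, 'w') as f:
        f.write(source)

sys.path.insert(0, root)        # Make 'mypack1' importable
importlib.invalidate_caches()

from mypack1.mysubpack1_1 import mymod1_1_1
print(mymod1_1_1.values)        # ('from mymod1_1_2', 'from mymod1_1')
```

Relative imports work only for modules that are loaded as part of a package. Running 'mymod1_1_1.py' directly as a script would fail with an ImportError.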

Circular Import

[TODO]

Advanced Functions and Namespaces

Local Variables vs Global Variables

Names created inside a function (i.e. within def statement) are local to the function and are available inside the function only.

Names created outside all functions are global to that particular module (or file), but not available to the other modules. Global variables are available inside all the functions defined in the module. Global-scope in Python is equivalent to module-scope or file-scope. There is NO all-module-scope in Python.

For example,

x = 'global'     # x is a global variable for this module
 
def myfun(arg):  # arg is a local variable for this function
    y = 'local'  # y is also a local variable
    
    # Function can access both local and global variables
    print(x)
    print(y)
    print(arg)
 
myfun('abc')
print(x)
#print(y)   # locals are not visible outside the function
#print(arg)

Function Variables

In Python, a variable takes a value or object (such as int, str). It can also take a function. For example,

>>> def square(n): return n * n

>>> square(5)
25
>>> sq = square   # Assign a function to a variable
>>> sq(5)
25
>>> type(square)
<class 'function'>
>>> type(sq)
<class 'function'>
>>> square
<function square at 0x7f0ba7040f28>
>>> sq
<function square at 0x7f0ba7040f28>  # Exactly the same reference as square

A variable in Python can hold anything: a value, a function or any other object.

In Python, you can also assign the result of a specific invocation of a function to a variable. The variable then holds the return value, not the function. For example,

>>> def square(n): return n * n

>>> sq5 = square(5)   # A specific function invocation
>>> sq5
25
>>> type(sq5)
<class 'int'>

Nested Functions

Python supports nested functions, i.e., defining a function inside a function. For example,

def outer(a):      # Outer function
    print('outer begins with arg =', a)
    x = 1  # Define a local variable

    # Define an inner function
    # Outer has a local variable of function object
    def inner(b):  
        print('inner begins with arg = %s' % b)
        y = 2
        print('a = %s, x = %d, y = %d' % (a, x, y))
            # Have read-access to outer function's local variables
        print('inner ends')

    # Call inner function defined earlier
    inner('bbb')

    print('outer ends')

# Call outer function, which in turn calls the inner function 
outer('aaa')

The expected output is:

outer begins with arg = aaa
inner begins with arg = bbb
a = aaa, x = 1, y = 2
inner ends
outer ends

Take note that the inner function has read-access to all the local variables (including the parameters) of the enclosing outer function, and to the global variables of this module.

Lambda Function

Lambda functions are anonymous (un-named) functions. They are used to define a function inline, or to defer the execution of some code. The syntax is:

lambda arg1, arg2, ...: return-expression

For example,

# Define an ordinary function
>>> def f1(a, b, c): return a + b + c

>>> f1(1, 2, 3)
6
>>> type(f1)
<class 'function'>

# Define a Lambda function and assign to a variable
>>> f2 = lambda a, b, c: a + b + c

>>> f2(1, 2, 3)  # Invoke function
6
>>> type(f2)
<class 'function'>

f1 and f2 do the same thing. Take note that return keyword is NOT needed inside the lambda function.

A lambda function, like an ordinary function, can have default values for its parameters.

>>> f3 = lambda a, b=2, c=3: a + b + c
>>> f3(1, 2, 3)
6
>>> f3(8)
13

Functions are Objects

In Python, functions are objects (like instances of a class). Like any object,

  1. a function can be assigned to a variable;
  2. a function can be passed into a function as an argument; and
  3. a function can be the return value of a function.
Example: Passing a Function Object as a Function Argument

A function name is a variable name that can be passed into another function as argument.

def my_add(x, y):
    return x + y

def my_sub(x, y):
    return x - y

# This function takes a function object as its first argument
def my_apply(func, x, y):
    # Invoke the function received
    return func(x, y)

print(my_apply(my_add, 3, 2))  # Output: 5
print(my_apply(my_sub, 3, 2))  # Output: 1

# We can also pass an anonymous function as argument
print(my_apply(lambda x, y: x * y, 3, 2))  # Output: 6
Example: Returning an Inner Function object from an Outer Function
# Define an outer function
def my_outer():
    # Outer has a function local variable
    def my_inner():  
        print('hello from inner')

    # Outer returns the inner function defined earlier
    return my_inner

result = my_outer()  # Invoke outer function, which returns a function object
result()             # Invoke the return function. Output: 'hello from inner'
print(result)        # Output: '<function my_outer.<locals>.my_inner at 0x7fa939fed410>'
Example: Returning a Lambda Function
def increase_by(n):
    """Return a one-argument function object"""
    return lambda x: x + n

plus_8 = increase_by(8)    # Return a specific invocation of the function,
                           # which is also a function that takes one argument
plus_88 = increase_by(88)

print(plus_8(1))    # Run the function with one argument. Output: 9
print(plus_88(1))   # 89

# Same as above with anonymous references
print(increase_by(8)(1))   
print(increase_by(88)(1))
Function Closure

In the above example, n is not local to the lambda function. Instead, n is obtained from the outer function.

When we assign increase_by(8) to plus_8, n takes on the value of 8 during the invocation. But we expect n to go out of scope after the outer function terminates. If this were the case, would calling plus_8(1) encounter a non-existent n?

This problem is resolved via the so-called function closure. A closure is an inner function that is passed outside the enclosing function, to be used elsewhere. In brief, the inner function keeps a reference to the variables of its enclosing scope that it uses. Hence, in plus_8, a closure with n=8 is created; while in plus_88, a closure with n=88 is created. Take note that, by default, the inner function can only read the names of the enclosing scope; to re-assign them, it must declare them via the nonlocal statement. In Python 3, you can inspect the closure via function_name.__closure__ (Python 2 called it func_closure), e.g.,

print(plus_8.__closure__)   # (<cell at 0x7f01c3909c90: int object at 0x16700f0>,)
print(plus_88.__closure__)  # (<cell at 0x7f01c3909c20: int object at 0x1670900>,)
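
As a minimal sketch of the closure idea described above (the names make_multiplier, double and triple are hypothetical, chosen only for illustration), each returned function keeps its own captured value. In Python 3, the captured value can be read through the __closure__ attribute of the function:

```python
def make_multiplier(factor):
    """Return a one-argument function that multiplies its input by factor."""
    def multiply(x):
        return x * factor   # 'factor' is captured from the enclosing scope
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(10))   # 20
print(triple(10))   # 30

# Each closure keeps its own cell holding the captured value
print(double.__closure__[0].cell_contents)   # 2
print(triple.__closure__[0].cell_contents)   # 3
```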

Using Lambda function in filter(), map(), reduce() and Comprehension

Instead of using a for-in loop to iterate through all the items in an iterable (sequence), you can use the following functions to apply an operation to all the items. This is known as functional programming or expression-oriented programming.

  • filter(func, iterable): Return an iterator yielding those items of iterable for which func(item) is True. For example
    >>> lst = [11, 22, 33, 44, 55]
    >>> filter(lambda x: x % 2 == 0, lst)
    <filter object at 0x7fc46f72b8d0>
    >>> list(filter(lambda x: x % 2 == 0, lst))
    [22, 44]
    >>> for item in filter(lambda x: x % 2 == 0, lst): print(item, end=' ')
    22 44
  • map(func, iterable): Apply (or Map) the function on each item of the iterable. For example,
    >>> lst = [11, 22, 33, 44, 55]
    >>> map(lambda x: x*x, lst)
    <map object at 0x7fc46f72b908>
    >>> list(map(lambda x: x*x, lst))
    [121, 484, 1089, 1936, 3025]
    >>> for item in map(lambda x: x*x, lst): print(item, end=' ')
    121 484 1089 1936 3025
  • reduce(func, iterable) (in module functools): Apply the function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example,
    >>> lst = [11, 22, 33, 44, 55]
    >>> from functools import reduce
    >>> reduce(lambda x,y: x+y, lst)
    165    # (((11 + 22) + 33) + 44) + 55
    
  • List comprehension: a one-liner to generate a list as discussed in the earlier section. e.g.,
    >>> lst = [x*x for x in range(1,10) if x % 2 == 0]
    >>> lst
    [4, 16, 36, 64]
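
The following sketch puts the functions above side by side. It assumes the same sample list as the examples above; the variable names squares_fm, squares_lc and total are hypothetical. It suggests that a filter()/map() chain and a list comprehension with an if clause give the same result, and that reduce() with addition agrees with the built-in sum():

```python
from functools import reduce

lst = [11, 22, 33, 44, 55]

# filter() + map(): squares of the even items
squares_fm = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, lst)))

# Equivalent list comprehension
squares_lc = [x * x for x in lst if x % 2 == 0]

print(squares_fm)   # [484, 1936]
print(squares_lc)   # [484, 1936]

# reduce() with an initial value of 0, compared with the built-in sum()
total = reduce(lambda acc, x: acc + x, lst, 0)
print(total, sum(lst))   # 165 165
```

The comprehension is usually considered easier to read when both a filter and a transformation are needed.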

Decorators

In Python, a decorator is a callable (function) that takes a function as an argument and returns a replacement function. Recall that functions are objects in Python, i.e., you can pass a function as argument, and return a function. A decorator is a transformation of a function. It can be used to pre-process the function arguments before passing them into the actual function, or to extend the behavior of functions that you don't want to modify, such as ascertaining that the user has logged in and has the necessary permissions.

Example: Decorating a 1-argument Function
def clamp_range(func):
    """Decorator to clamp the value of the argument to [0,100]"""
    def _wrapper(x):    # Applicable to functions of 1 argument
        if x < 0:
            x = 0
        elif x > 100:
            x = 100
        return func(x)  # Run the original 1-argument function with clamped argument
    return _wrapper

def square(x):
    return x**2

# Invoke clamp_range() with square()
print(clamp_range(square)(5))   # 25

# Transforming the square() function by replacing it with a decorated version
square = clamp_range(square)  # Assign the decorated function back to the original
print(square(50))    # Output: 2500
print(square(-1))    # Output: 0
print(square(101))   # Output: 10000

Notes:

  1. The decorator clamp_range() takes a 1-argument function as its argument, and returns a replacement 1-argument function _wrapper(x), which clamps its argument x to [0,100] before applying the original function.
  2. In 'square=clamp_range(square)', we decorate the square() function and assign the decorated (replacement) function to the same function name (confusing?!). After the decoration, the square() takes on a new decorated life!
Example: Using the @ symbol

Using 'square=clamp_range(square)' to decorate a function is messy?! Instead, Python uses the @ symbol to denote the replacement. For example,

def clamp_range(func):
    """Decorator to clamp the value of the argument to [0,100]"""
    def _wrapper(x):
        if x < 0:
            x = 0
        elif x > 100:
            x = 100
        return func(x)  # Run the original 1-arg function with clamped argument
    return _wrapper

# Use the decorator @ symbol
# Same as cube = clamp_range(cube)
@clamp_range
def cube(x):
    return x**3

print(cube(50))    # Output: 125000
print(cube(-1))    # Output: 0
print(cube(101))   # Output: 1000000

For Java programmers, do not confuse the Python decorator @ with Java's annotation like @Override.

Example: Decorator with an Arbitrary Number of Function Arguments

The above example only works for one-argument functions. You can use *args and/or **kwargs to handle a variable number of arguments. For example, the following decorator logs all the arguments before the actual processing.

def logger(func):
    """log all the function arguments"""
    def _wrapper(*args, **kwargs):
        print('The arguments are: %s, %s' % (args, kwargs))
        return func(*args, **kwargs)  # Run the original function
    return _wrapper

@logger
def myfun(a, b, c=3, d=4):
    pass

myfun(1, 2, c=33, d=44)  # Output: The arguments are: (1, 2), {'c': 33, 'd': 44}
myfun(1, 2, c=33)        # Output: The arguments are: (1, 2), {'c': 33}

We can also modify our earlier clamp_range() to handle an arbitrary number of arguments:

def clamp_range(func):
    """Decorator to clamp the value of ALL arguments to [0,100]"""
    def _wrapper(*args):
        newargs = []
        for item in args:
            if item < 0:
                newargs.append(0)
            elif item > 100:
                newargs.append(100)
            else:
                newargs.append(item)
        return func(*newargs)  # Run the original function with clamped arguments
    return _wrapper

@clamp_range
def my_add(x, y, z):
    return x + y + z

print(my_add(1, 2, 3))     # Output: 6
print(my_add(-1, 5, 109))  # Output: 105
The @wraps Decorator

Decorators can be hard to debug. This is because a decorator wraps around and replaces the original function, hiding attributes such as __name__ and __doc__. This can be solved by using the @wraps decorator of functools, which copies these attributes from the original function to the replacement function, so that it looks more like the decorated function. For example,

from functools import wraps
  
def without_wraps(func):
    def _wrapper(*args, **kwargs):
        """_wrapper without_wraps doc-string"""
        return func(*args, **kwargs)
    return _wrapper
 
def with_wraps(func):
    @wraps(func)
    def _wrapper(*args, **kwargs):
        """_wrapper with_wraps doc-string"""
        return func(*args, **kwargs)
    return _wrapper
 
@without_wraps
def fun_without_wraps():
    """fun_without_wraps doc-string"""
    pass
 
@with_wraps
def fun_with_wraps():
    """fun_with_wraps doc-string"""
    pass

# Show the _wrapper
print(fun_without_wraps.__name__)  # Output: _wrapper
print(fun_without_wraps.__doc__)   # Output: _wrapper without_wraps doc-string
# Show the function
print(fun_with_wraps.__name__)     # Output: fun_with_wraps
print(fun_with_wraps.__doc__)      # Output: fun_with_wraps doc-string
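
Besides copying __name__ and __doc__, functools.wraps also stores the original function in the __wrapped__ attribute of the replacement. The sketch below uses a hypothetical logged decorator and a hypothetical greet function to illustrate how the undecorated function can still be reached:

```python
from functools import wraps

def logged(func):
    """Hypothetical decorator that prints the name of the function being called."""
    @wraps(func)
    def _wrapper(*args, **kwargs):
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return _wrapper

@logged
def greet(name):
    """Return a greeting."""
    return 'hello, ' + name

print(greet.__name__)          # greet
print(greet.__doc__)           # Return a greeting.
original = greet.__wrapped__   # The undecorated function, set by wraps()
print(original('Peter'))       # hello, Peter (no 'calling' line is printed)
```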
Example: Passing Arguments into Decorators

Let's modify the earlier clamp_range decorator to take two arguments - min and max of the range.

from functools import wraps

def clamp_range(min, max):    # Take the desired arguments instead of func
    """Decorator to clamp the value of ALL arguments to [min,max]"""
    def _decorator(func):     # Take func as argument
        @wraps(func)          # For proper __name__, __doc__
        def _wrapper(*args):  # Decorate the original function here
            newargs = []
            for item in args:
                if item < min:
                    newargs.append(min)
                elif item > max:
                    newargs.append(max)
                else:
                    newargs.append(item)
            return func(*newargs)  # Run the original function with clamped arguments
        return _wrapper
    return _decorator

@clamp_range(1, 10)
def add(x, y, z):
    """Clamped Add"""
    return x + y + z
# Same as
# add = clamp_range(min, max)(add)
# 'clamp_range(min, max)' returns '_decorator(func)'; apply 'add' as 'func'

print(add(1, 2, 3))     # Output: 6
print(add(-1, 5, 109))  # Output: 16 (1+5+10)
print(add.__name__)     # Output: add
print(add.__doc__)      # Output: Clamped Add

The clamp_range function takes the desired arguments and returns the actual decorator (_decorator), which in turn takes the function to be decorated and returns its replacement (_wrapper).
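
The same three-level pattern can be sketched with a smaller example. The repeat decorator and the shout function below are hypothetical names used only for illustration: the outer function receives the decorator's argument, the middle function receives the function to be decorated, and the innermost function is the replacement.

```python
from functools import wraps

def repeat(times):            # Outer function takes the decorator's argument
    """Hypothetical decorator factory: call the function 'times' times."""
    def _decorator(func):     # Receives the function to be decorated
        @wraps(func)
        def _wrapper(*args, **kwargs):
            results = []
            for _ in range(times):
                results.append(func(*args, **kwargs))
            return results    # Collect the result of every call in a list
        return _wrapper
    return _decorator

@repeat(3)                    # Same as: shout = repeat(3)(shout)
def shout(word):
    return word.upper() + '!'

print(shout('hi'))            # ['HI!', 'HI!', 'HI!']
print(shout.__name__)         # shout
```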

Namespace

Names, Namespaces and Scope

In Python, a name is roughly analogous to a variable in other languages but with some extras. Because of the dynamic nature of Python, a name is applicable to almost everything, including variable, function, class/instance, module/package.

Names defined inside a function are local. Names defined outside all functions are global for that module, and are accessible by all functions inside the module (i.e., module-global scope). There is no all-module-global scope in Python.

A namespace is a collection of names (i.e., a space of names).

A scope refers to the portion of a program from where a name can be accessed without a qualifying prefix. For example, a local variable defined inside a function has local scope (i.e., it is available within the function, and NOT available outside the function).

Each Module has a Global Namespace

A module is a file containing attributes (such as variables, functions and classes). Each module has its own global namespace. Hence, you cannot keep two functions or classes of the same name within a module (a later definition silently replaces the earlier one). But you can define functions of the same name in different modules, as the namespaces are isolated.

When you launch the interactive shell, Python creates a module called __main__, with its associated global namespace. All subsequent names are added into __main__'s namespace.

When you import a module via 'import <module_name>' under the interactive shell, only the <module_name> is added into __main__'s namespace. You need to access the names (attributes) inside <module_name> via <module_name>.<attr_name>. In other words, the imported module retains its own namespace and must be prefixed with <module_name>. inside __main__. (Recall that the scope of a name is the portion of codes that can access it without prefix.)

However, if you import an attribute via 'from <module_name> import <attr_name>' under the interactive shell, the <attr_name> is added into __main__'s namespace, and you can access the <attr_name> directly without prefixing with the <module_name>.

On the other hand, when you import a module inside another module (instead of interactive shell), the imported <module_name> is added into the target module's namespace (instead of __main__ for the interactive shell).

The built-in functions are kept in a module called builtins (__builtin__ in Python 2), which is made available to __main__ (and every other module) automatically via the name __builtins__.

The globals(), locals() and dir() Built-in Functions

You can list the names of the current scope via these built-in functions:

  • globals(): return a dictionary (name-value pairs) containing the current scope's global variables.
  • locals(): return a dictionary (name-value pairs) containing the current scope's local variables. If locals() is issued in global scope, it returns the same outputs as globals().
  • dir(): return a sorted list of the names in the current local scope, which is equivalent to sorted(locals().keys()).
  • dir(obj): return a list of the names of the attributes of the given object.

For example,

$ python3
# The current scope is the __main__ module's global scope.

>>> globals()  # Global variable of the current scope
{'__name__': '__main__',  # module name
 '__builtins__': <module 'builtins' (built-in)>,  # Hook to built-in names
 '__doc__': None,  # Module's doc-string
 '__package__': None,  # package name
 '__spec__': None,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>}

>>> __name__  # The module name of the current scope
'__main__'
 
>>> locals()
...same outputs as globals() under the global-scope...

>>> dir()  # Names (local) only
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

# Add a name (current global scope)
>>> x = 88
>>> globals()
{'x': 88, ...}
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'x']

# import
>>> import random
>>> globals()
{'x': 88,
 'random': <module 'random' from '/usr/lib/python3.4/random.py'>,   # Hook to the imported module
 ......}
>>> from math import pi
>>> globals()
{'x': 88,
 'pi': 3.141592653589793,  # Added directly into global namespace
 'random': <module 'random' from '/usr/lib/python3.4/random.py'>, 
 ......}

To show the difference between locals and globals, we need to define a function to create a local scope. For example,

$ python3
>>> x = 88  # x is a global variable

>>> def myfun(arg):  # arg is a local variable
   y = 99            # y is a local variable
   print(x)          # Can read global
   print(globals())
   print(locals())
   print(dir())
   
>>> myfun(11)
88
{'__builtins__': <module 'builtins' (built-in)>,  # Name-value pairs of globals
 'myfun': <function myfun at 0x7f550d1b5268>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 '__doc__': None,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 'x': 88}             
{'y': 99, 'arg': 11}  # Name-value pairs of locals
['arg', 'y']          # Names only of locals
More on Module's Global Namespace

Let's create two modules: mod1 and mod2, where mod1 imports mod2, as follows:

"""mod1.py: Module 1"""
import mod2

mod1_var = 'mod1 global variable'
print('Inside mod1, __name__ = ', __name__)

if __name__ == '__main__':
    print('Run module 1')

"""mod2.py: Module 2"""

mod2_var = 'mod2 global variable'
print('Inside mod2, __name__ = ', __name__)

if __name__ == '__main__':
    print('Run module 2')

Let's import mod1 (which in turn imports mod2) under the interpreter shell, and check the namespaces:

$ python3
>>> import mod1
Inside mod2, __name__ =  mod2   # from imported mod2
Inside mod1, __name__ =  mod1
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'mod1']  # no mod2, which is referenced as mod1.mod2
>>> dir(mod1)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'mod1_var', 'mod2']
>>> dir(mod2)
NameError: name 'mod2' is not defined
>>> dir(mod1.mod2)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'mod2_var']

Take note that the interpreter's current scope __name__ is __main__. Its namespace contains mod1 (imported). The mod1's namespace contains mod2 (imported) and mod1_var. To refer to mod2, you need to go through mod1, in the form of mod1.mod2. The mod1.mod2's namespace contains mod2_var.

Now, let's run mod1 instead, under IDLE3, and check the namespaces:

Inside mod2, __name__ =  mod2
Inside mod1, __name__ =  __main__
Run module 1
>>> dir()
['__builtins__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'mod1_var', 'mod2']
>>> dir(mod2)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'mod2_var']

Take note that the current scope's name is again __main__, which is the executing module mod1. Its namespace contains mod2 (imported) and mod1_var.

Name Resolution

When you ask for a name (variable), say x, Python searches the LEGB namespaces, in this order, of the current scope:

  1. L: Local namespace which is specific to the current function
  2. E: for nested function, the Enclosing function's namespace
  3. G: Global namespace for the current module
  4. B: Built-in namespace for all the modules

If x cannot be found, Python raises a NameError.
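
The LEGB order can be traced with a small sketch. All the function names below (outer, inner_local, inner_enclosing, read_global, read_builtin) are hypothetical; each one is written so that the name is found at a different level of the search:

```python
x = 'global x'              # G: module-global name

def outer():
    x = 'enclosing x'       # E: local to outer(), enclosing for the inner functions

    def inner_local():
        x = 'local x'       # L: found first
        return x

    def inner_enclosing():
        return x            # No local x, so the enclosing x is used

    return inner_local(), inner_enclosing()

def read_global():
    return x                # No local or enclosing x, so the global x is used

def read_builtin():
    return len('abc')       # 'len' is not local, enclosing or global: found in built-ins

print(outer())              # ('local x', 'enclosing x')
print(read_global())        # global x
print(read_builtin())       # 3
```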

Modifying Global Variables inside a Function

Recall that names created inside a function are local, while names created outside all functions are global for that module. You can "read" the global variables inside all functions defined in that module. For example,

x = 'global'     # Global file-scope
 
def myfun():
    y = 'local'  # Function local-scope
    print(y)
    print(x)     # Can read global variable
 
myfun()
print(x)
#print(y)        # Local out-of-scope

If you assign a value to a name inside a function, a local name is created, which hides the global name. For example,

x = 'global'      # Global file-scope
 
def myfun():
    x = 'change'  # Local x created which hides the global x
    print(x)      # Show local. Global is hidden
 
myfun()
print(x)          # Global does not change

To modify a global variable inside a function, you need to use a global statement to declare the name global; otherwise, the modification (assignment) will create a local variable (see above). For example,

x = 'global'      # Global file-scope
 
def myfun():
    global x      # Declare x global, so as to modify global variable
    x = 'change'  # Else, a local x created which hides the global x
    print(x)
 
myfun()
print(x)          # Global changes

For nested functions, you need to use the nonlocal statement in the inner function to modify names in the enclosing outer function. For example,

def outer():        # Outer function
    count = 0
    
    def inner():    # Inner function
        nonlocal count  # Needed to modify count
        count += 1      # Else, a local created, which hides the outer

    print(count)    # Output: 0
    inner()         # Call inner function
    print(count)    # Output: 1

# Call outer function 
outer()

To modify a global variable inside a nested function, declare it via global statement too. For example,

count = 100

def outer():
    count = 0         # Local created, hide global
    
    def inner():
        global count  # Needed to modify global
        count += 1    # Else, a local created, which hides the outer

    print(count)      # Output: 0
    inner()           # Call inner function
    print(count)      # Output: 0

# Call outer function 
outer()
print(count)          # Output: 101

In summary,

  1. The order for name resolution (for names inside a function) is: local, enclosing function for nested def, global, and then the built-in namespaces (i.e., LEGB).
  2. However, if you assign a new value to a name, a local name is created, which hides the global name.
  3. You need to declare via global statement to modify globals inside the function. Similarly, you need to declare via nonlocal statement to modify enclosing local names inside the nested function.
More on global Statement

The global statement is necessary if you are changing the reference to an object (e.g. with an assignment). It is not needed if you are just mutating or modifying the object. For example,

>>> a = []
>>> def myfun():
        a.append('hello')   # Don't need global. No change of reference for a.
 
>>> myfun()
>>> a
['hello']

In the above example, we modify the contents of the list. The global statement is not needed.

>>> a = 1
>>> def myfun():
       global a
       a = 8

>>> myfun()
>>> a
8

In the above example, we are modifying the reference to the variable. global is needed, otherwise, a local variable will be created inside the function.

Built-in Namespace

The built-in namespace is defined in the builtins module (referenced via the name __builtins__), which contains built-in functions such as len(), min(), max(), int(), float(), str(), list(), tuple(), etc. In the interactive shell, you can use help(__builtins__) or dir(__builtins__) to list the attributes of this module.

[TODO]

del Statement

You can use the del statement to remove names from the namespace, for example,

>>> del x, pi     # delete variables or imported attributes
>>> globals()
...... x and pi removed ......
>>> del random    # remove imported module
>>> globals()
...... random module removed ......

If you override a built-in function, you could also use del to remove it from the namespace to recover the function from the built-in space.

>>> len = 8       # Override built-in function len() (for length)
>>> len('abc')    # built-in function len() no longer available
TypeError: 'int' object is not callable
>>> del len       # Delete len from global and local namespace
>>> len('abc')    # built-in function len() is available
3
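
A hidden built-in function can also be reached explicitly through the standard builtins module, without deleting the name that hides it. The sketch below (shadow_demo is a hypothetical function name) hides len() with a local variable and still calls the built-in version:

```python
import builtins

def shadow_demo():
    len = 8                          # Local name 'len' hides the built-in len()
    try:
        len('abc')                   # Calls the int 8, which is not callable
    except TypeError:
        pass
    return builtins.len('abc')       # The built-in is still reachable explicitly

print(shadow_demo())                 # 3
print(len('abc'))                    # 3 (the module-level name was never rebound)
```

In practice, it is simpler to avoid using the names of built-in functions for your own variables.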

File Input/Output

File Objects

Python provides built-in functions to support file input/output:

  • open(filename_str, mode): returns a file object. The valid modes are: 'r' (read-only, default), 'w' (write - erase all contents for existing file), 'a' (append), 'r+' (read and write). You can also use 'rb', 'wb', 'ab', 'rb+' for binary mode (raw bytes) operations.
  • file.close(): closes the file object.
  • file.readline(): reads a single line (up to a newline and including the newline). It returns an empty string after the end-of-file (EOF).
  • file.read(): reads the entire file. It returns an empty string after the end-of-file (EOF).
  • file.write(str): writes the given string to the file.
  • file.tell(): returns the "current position". The "current position" is the number of bytes from the beginning of the file in binary mode, and an opaque number in text mode.
  • file.seek(offset): sets the "current position" to offset from the beginning of the file.

For example,

>>> f = open('test.txt', 'w')  # Create (open) a file for write
>>> f.write('apple\n')         # Write given string to file; return the number of characters written
6
>>> f.write('orange\n')
7
>>> f.close()                  # Close the file
 
>>> f = open('test.txt', 'r')  # Create (open) a file for read (default)
>>> f.readline()               # Read till newline
'apple\n'
>>> f.readline()
'orange\n'
>>> f.readline()               # Return empty string after end-of-file
''
>>> f.close()
 
>>> f = open('test.txt', 'r')
>>> f.read()                   # Read entire file
'apple\norange\n'
>>> f.close()

# Test tell() and seek()
>>> f = open('test.txt', 'r')
>>> f.tell()
0
>>> f.read()
'apple\norange\n'
>>> f.tell()
13
>>> f.read()
''
>>> f.seek(0)  # Rewind
0
>>> f.read()
'apple\norange\n'
>>> f.close()
Iterating through Files

You can process a text file line-by-line via a for-in loop.

with open('test.txt') as f:    # Auto close the file upon exit
    for line in f:
        line = line.rstrip()   # Strip trailing spaces and newline
        print(line)

# Same as above
f = open('test.txt', 'r')
for line in f:
    print(line.rstrip())
f.close()

Each line includes a newline. For example,

>>> f = open('temp.txt', 'w')
>>> f.write('apple\n')
6
>>> f.write('orange\n')
7
>>> f.close()

>>> f = open('temp.txt', 'r')
>>> for line in f: print(line, end='')  # line includes a newline, disable print()'s default newline
apple
orange
>>> f.close()
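
The write, append and read modes can be combined in one small script. This is only a sketch: the file name fruits.txt is hypothetical, and the file is created inside a temporary directory (via the standard tempfile module) so that no existing file is touched.

```python
import os
import tempfile

# Work inside a temporary directory so that no existing file is touched
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'fruits.txt')   # hypothetical file name

    with open(path, 'w') as f:      # Write mode creates (or truncates) the file
        f.write('apple\n')
        f.write('orange\n')

    with open(path, 'a') as f:      # Append mode keeps the existing content
        f.write('banana\n')

    with open(path) as f:           # Default mode is 'r' (read text)
        lines = [line.rstrip('\n') for line in f]

print(lines)        # ['apple', 'orange', 'banana']
print(len(lines))   # 3
```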

Assertion and Exception Handling

assert Statement

You can use assert statement to test a certain assertion (or constraint). For example, if x is supposed to be 0 in a certain part of the program, you can use the assert statement to test this constraint. An AssertionError will be raised if x is not zero.

For example,

>>> x = 0
>>> assert x == 0, 'x is not zero?!'  # Assertion true, no output
 
>>> x = 1
>>> assert x == 0, 'x is not zero?!'  # Assertion false, raise AssertionError with the message
......
AssertionError: x is not zero?!

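Assertions are executed by default, but they are skipped when Python is run with the -O (optimize) option. Hence, do not rely on assert for checks that must always be performed, such as validating user inputs.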

Syntax

The syntax for assert is:

assert test, error-message

If the test is True, nothing happens; otherwise, an AssertionError will be raised with the error-message.
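
A typical use of assert is to state a condition that a function takes for granted. The sketch below uses a hypothetical average() helper; the error-message is carried by the AssertionError and can be retrieved in an except block:

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence (hypothetical helper)."""
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / len(values)

print(average([1, 2, 3]))   # 2.0

try:
    average([])
except AssertionError as e:
    print('AssertionError:', e)   # AssertionError: values must not be empty
```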

Exceptions

In Python, errors detected during execution are called exceptions. For example,

>>> 1/0        # Divide by 0
......
ZeroDivisionError: division by zero
>>> zzz        # Variable not defined
......
NameError: name 'zzz' is not defined
>>> '1' + 1    # Cannot concatenate string and int
......
TypeError: Can't convert 'int' object to str implicitly

>>> lst = [0, 1, 2]
>>> lst[3]        # Index out of range
......
IndexError: list index out of range
>>> lst.index(8)  # Item is not in the list
......
ValueError: 8 is not in list

>>> int('abc')    # Cannot parse this string into int
......
ValueError: invalid literal for int() with base 10: 'abc'

>>> tup = (1, 2, 3)
>>> tup[0] = 11    # Tuple is immutable
......
TypeError: 'tuple' object does not support item assignment

If an exception is raised and not handled, the program terminates abruptly and prints a traceback.

try-except-else-finally

You can use the try-except-else-finally exception handling facility to prevent the program from terminating abruptly.

Example 1: Handling Index out-of-range for List Access
def get_item(seq, index):
    """Return the indexed item of the given sequences."""
    try:
        result = seq[index]   # may raise IndexError
        print('try succeed')      
    except IndexError:
        result = 0
        print('Index out of range')
    except:        # run if other exception is raised
        result = 0
        print('other exception')
    else:          # run if no exception raised
        print('no exception raised')
    finally:       # always run regardless of whether exception is raised
        print('run finally')

    # Continue into the next statement after try-except-finally instead of abruptly terminated.
    print('continue after try-except')
    return result
 
print(get_item([0, 1, 2, 3], 1))  # Index within the range
print('-----------')
print(get_item([0, 1, 2, 3], 4))  # Index out of range

The expected outputs are:

try succeed
no exception raised
run finally
continue after try-except
1
-----------
Index out of range
run finally
continue after try-except
0

The exception handling process for try-except-else-finally is:

  1. Python runs the statements in the try-block.
  2. If no exception is raised in all the statements of the try-block, all the except-blocks are skipped, the else-block (if present) is run, and the program continues to the next statement after the try-except statement.
  3. However, if an exception is raised in one of the statements in the try-block, the rest of the try-block will be skipped. The exception is matched with the except-blocks. The first matched except-block will be executed. The program then continues to the next statement after the try-except statement, instead of terminating abruptly. Nevertheless, if none of the except-blocks is matched, the exception propagates to the caller; if it is not handled anywhere, the program terminates abruptly.
  4. The else-block will be executed if no exception is raised in the try-block.
  5. The finally-block is always executed for doing house-keeping tasks such as closing the file and releasing the resources, regardless of whether an exception has been raised.
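
The order described in the steps above can be traced by recording which blocks are run. This is a minimal sketch with a hypothetical trace() helper that divides 10 by its argument:

```python
def trace(divisor):
    """Record which blocks run when computing 10 / divisor (hypothetical helper)."""
    steps = []
    try:
        steps.append('try')
        10 / divisor                # May raise ZeroDivisionError
    except ZeroDivisionError:
        steps.append('except')
    else:
        steps.append('else')
    finally:
        steps.append('finally')
    return steps

print(trace(2))   # ['try', 'else', 'finally']
print(trace(0))   # ['try', 'except', 'finally']
```
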
Syntax

The syntax for try-except-else-finally is:

try:
    statements
except exception-1:                # Catch one exception
    statements
except (exception-2, exception-3): # Catch multiple exceptions
    statements
except exception-4 as var_name:    # Retrieve the exception instance
    statements
except:         # For (other) exceptions
    statements
else:
    statements   # Run if no exception raised
finally:
    statements   # Always run regardless of whether exception raised

The try-block (mandatory) must be followed by at least one except-block or a finally-block. The rest are optional.

CAUTION: Python 2 uses older syntax of "except exception-4, var_name:", which should be re-written as "except exception-4 as var_name:" for portability.

Example 2: Input Validation
>>> while True:
       try:
           x = int(input('Enter an integer: '))  # Raise ValueError if input cannot be parsed into int
           break                                 # Break out while-loop
       except ValueError:
           print('Wrong input! Try again...')    # Repeat while-loop

Enter an integer: abc
Wrong input! Try again...
Enter an integer: 11.22
Wrong input! Try again...
Enter an integer: 123

raise Statement

You can manually raise an exception via the raise statement, for example,

>>> raise IndexError('out-of-range')
......
IndexError: out-of-range

The syntax is:

raise exception_class_name     # E.g. raise IndexError
raise exception_instance_name  # E.g. raise IndexError('out of range')
raise                          # Re-raise the most recent exception for propagation

A raise without argument in the except block re-raises the exception to the outer block, e.g.,

try:
    ......
except:
    raise   # re-raise the exception (for the outer try)
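
A bare raise is handy when you want to do something with an exception (such as logging it) and still let the caller handle it. In the sketch below, parse_age() is a hypothetical helper that reports the failure and then re-raises the same ValueError:

```python
def parse_age(text):
    """Convert text to int, reporting the failure before re-raising (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        print('cannot parse %r' % text)
        raise                  # Re-raise the same ValueError to the caller

try:
    parse_age('abc')
except ValueError as e:
    print('caller caught:', e)
```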

Built-in Exceptions

  • BaseException, Exception: base classes. (Python 2's StandardError was removed in Python 3.)
  • ArithmeticError: for OverflowError, ZeroDivisionError, FloatingPointError.
  • BufferError:
  • LookupError: for IndexError, KeyError.
  • OSError: for FileNotFoundError, PermissionError, etc. In Python 3, EnvironmentError and IOError are aliases of OSError.
  • [TODO] more

User-defined Exception

You can define your own exception by sub-classing the Exception class.

Example
class MyCustomError(Exception):  # Sub-classing Exception base class
    """My custom exception"""

    def __init__(self, value):
        """Constructor"""
        self.value = value

    def __str__(self):
        return repr(self.value)

# Test the exception defined
try:
    raise MyCustomError('an error occurs')
    print('after exception')
except MyCustomError as e:
    print('MyCustomError: ', e.value)
else:
    print('running the else block')
finally:
    print('always run the finally block')
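
A user-defined exception can also carry several pieces of data. The following sketch, with the hypothetical names InsufficientFundsError and withdraw(), passes a message to the base-class constructor, so that str(e) returns it without the need to define __str__():

```python
class InsufficientFundsError(Exception):
    """Hypothetical exception raised when a withdrawal exceeds the balance."""

    def __init__(self, balance, amount):
        # Passing a message to Exception.__init__ makes str(e) return it
        super().__init__('cannot withdraw %d from balance %d' % (amount, balance))
        self.balance = balance
        self.amount = amount

def withdraw(balance, amount):
    """Return the new balance, or raise InsufficientFundsError."""
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount

print(withdraw(100, 30))          # 70
try:
    withdraw(100, 130)
except InsufficientFundsError as e:
    print(e)                      # cannot withdraw 130 from balance 100
    print(e.amount - e.balance)   # 30
```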

with-as Statement and Context Managers

The syntax of the with-as statement is as follows:

with ... as ...:
    statements
   
# More than one items
with ... as ..., ... as ..., ...:
    statements

Python’s with statement supports the concept of a runtime context defined by a context manager. In programming, context can be seen as a bucket to pass information around, i.e., the state at a point in time. Context Managers are a way of allocating and releasing resources in the context.

Example 1
with open('test.log', 'r') as infile:  # automatically close the file at the end of with
    for line in infile:
        print(line)

This is equivalent to:

infile = open('test.log', 'r')
try:
    for line in infile:
        print(line)
finally:
    infile.close()

The with-statement's context manager acquires, uses, and releases the context (of the file) cleanly, and eliminates a bit of boilerplate.

However, the with-as statement is applicable only to objects that implement the context manager protocol (the __enter__() and __exit__() methods), such as files; while try-finally can be applied to any code.
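
Any class can be made usable in a with-as statement by providing the two special methods __enter__() and __exit__(). The following is a minimal sketch; the class name Timer and its attributes are hypothetical, chosen only to illustrate the protocol.

```python
import time

class Timer:
    """Hypothetical context manager that times the body of the with-block."""
    def __enter__(self):
        self.start = time.perf_counter()
        return self            # This object is bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        # Always called on leaving the block, with or without an exception
        self.elapsed = time.perf_counter() - self.start
        return False           # False: do not suppress an exception from the block

with Timer() as t:
    total = sum(range(1000))

print(total)            # 499500
print(t.elapsed >= 0)   # True
```

The __exit__() method plays the role of the finally clause in the equivalent try-finally form.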

Example 2:
# Copy a file
with open('in.txt', 'r') as infile, open('out.txt', 'w') as outfile:
    for line in infile:
        outfile.write(line)

Commonly-Used Python Standard Library Modules

Python provides a set of standard library modules. (Many non-standard libraries are provided by third parties!)

To use a module, use 'import <module_name>' or 'from <module_name> import <attribute_name>' to import the entire module or a selected attribute. You can use 'dir(<module_name>)' to list all the attributes of the module, 'help(<module_name>)' or 'help(<attribute_name>)' to read the documentation page. For example,

>>> import math   # import an external module
>>> dir(math)     # List all attributes
['e', 'pi', 'sin', 'cos', 'tan', 'atan2', ....]
>>> help(math)    # Show the documentation page for the module
......
>>> help(math.atan2)  # Show the documentation page for a specific attribute
......
>>> math.atan2(3, 0)
1.5707963267948966
>>> math.sin(math.pi / 2)
1.0
>>> math.cos(math.pi / 2)
6.123233995736766e-17

>>> from math import pi  # import an attribute from a module
>>> pi
3.141592653589793

math and cmath Modules

The math module provides access to the mathematical functions defined by the C language standard. The commonly-used attributes are:

  • Constants: pi, e.
  • Power and exponent: pow(x,y), sqrt(x), exp(x), log(x), log2(x), log10(x)
  • Converting float to int: ceil(x), floor(x), trunc(x).
  • float operations: fabs(), fmod()
  • hypot(x,y) (=sqrt(x*x + y*y))
  • Conversion between degrees and radians: degrees(x), radians(x).
  • Trigonometric functions: sin(x), cos(x), tan(x), acos(x), asin(x), atan(x), atan2(y,x).
  • Hyperbolic functions: sinh(x), cosh(x), tanh(x), asinh(x), acosh(x), atanh(x).

For example,

>>> import math
>>> dir(math)
......
>>> help(math)
......
>>> help(math.trunc)
......

# Test floor(), ceil() and trunc()
>>> x = 1.5
>>> type(x)
<class 'float'>
>>> math.floor(x)
1
>>> type(math.floor(x))
<class 'int'>
>>> math.ceil(x)
2
>>> math.trunc(x)
1
>>> math.floor(-1.5)
-2
>>> math.ceil(-1.5)
-1
>>> math.trunc(-1.5)
-1

# [TODO] other functions

In addition, the cmath module provides mathematical functions for complex numbers. See Python documentation for details.
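
A few of the other math functions listed above, plus one cmath call, can be tried as follows. This is only a sketch; the variable name angle is for illustration, and floating-point comparisons use math.isclose() rather than ==.

```python
import math, cmath

print(math.hypot(3, 4))              # 5.0, the hypotenuse of a 3-4-5 triangle
print(math.sqrt(16), math.log2(8))   # 4.0 3.0

angle = math.atan2(1, 1)             # atan2(y, x): angle of the point (x=1, y=1)
print(math.isclose(math.degrees(angle), 45.0))    # True
print(math.isclose(math.radians(180), math.pi))   # True

print(math.fmod(-7, 3), -7 % 3)      # -1.0 2: fmod() keeps the sign of the dividend

print(cmath.sqrt(-1))                # 1j, whereas math.sqrt(-1) raises ValueError
```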

statistics Module

The statistics module computes basic statistical properties such as mean, median, and variance. (Many third-party vendors provide advanced statistics packages!) For example,

>>> import statistics
>>> dir(statistics)
['mean', 'median', 'median_grouped', 'median_high', 'median_low', 'mode', 'pstdev', 'pvariance', 'stdev', 'variance', ...]
>>> help(statistics)
......
>>> help(statistics.pstdev)
......

>>> data = [5, 7, 8, 3, 5, 6, 1, 3]
>>> statistics.mean(data)
4.75
>>> statistics.median(data)
5.0
>>> statistics.stdev(data)
2.3145502494313788
>>> statistics.variance(data)
5.357142857142857
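>>> statistics.mode(data)   # 3 and 5 both occur twice
statistics.StatisticsError: no unique mode; found 2 equally common values
    # Python 3.8 and later return the first mode encountered (5) instead of raising StatisticsError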

random Module

The module random can be used to generate various pseudo-random numbers.

For example,

>>> import random
>>> dir(random)
......
>>> help(random)
......
>>> help(random.random)
......

>>> random.random()       # float in [0,1)
0.7259532743815786
>>> random.random()
0.9282534690123855
>>> random.randint(1, 6)  # int in [1,6]
3
>>> random.randrange(6)   # From range(6), i.e., 0 to 5
0
>>> random.choice(['apple', 'orange', 'banana'])  # Pick from the given list
'apple'
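
For repeatable runs (e.g., in testing), you can seed the generator. The sketch below uses random.seed() and a separate random.Random instance; the seed values and variable names are arbitrary choices for illustration, and no particular random values are assumed.

```python
import random

random.seed(42)                  # Fix the seed of the module-level generator
first = [random.randint(1, 6) for _ in range(5)]
random.seed(42)                  # Same seed gives the same sequence
second = [random.randint(1, 6) for _ in range(5)]
print(first == second)           # True

rng = random.Random(2024)        # An independent generator with its own state
deck = list(range(10))
rng.shuffle(deck)                # Shuffle the list in place
hand = rng.sample(deck, 3)       # Pick 3 distinct items
print(sorted(deck) == list(range(10)), len(set(hand)))   # True 3
```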

sys Module

The module sys (for system) provides system-specific parameters and functions. The commonly-used attributes are:

  • sys.exit([exit-status=0]): exit the program by raising the SystemExit exception. If used inside a try, the finally clause is honored.
    The optional argument exit-status can be an integer (default to 0 for normal termination, or non-zero for abnormal termination); or any object (e.g., sys.exit('an error message')).
  • sys.path: A list of module search-paths. Initialized from the environment variable PYTHONPATH, plus installation-dependent default entries. See earlier example.
  • sys.stdin, sys.stdout, sys.stderr: standard input, output and error stream.
  • sys.argv: A list of command-line arguments passed into the Python script. argv[0] is the script name. See example below.
Example: Command-Line Arguments

The command-line arguments are kept in sys.argv as a list. For example, create the following script called "test_argv.py":

import sys
print(sys.argv)       # Print command-line argument list
print(len(sys.argv))  # Print length of list

Run the script:

$ python test_argv.py
['test_argv.py']
1
 
$ python test_argv.py hello 1 2 3 apple orange
['test_argv.py', 'hello', '1', '2', '3', 'apple', 'orange']   # list of strings
7
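
As described above, sys.exit() works by raising SystemExit, so a finally clause still runs. A minimal sketch (the list steps is hypothetical; SystemExit is caught here only so that the script can inspect it instead of terminating):

```python
import sys

steps = []
try:
    try:
        steps.append('working')
        sys.exit(2)                  # Raises SystemExit with exit status 2
    finally:
        steps.append('cleanup')      # Still runs before the exception propagates
except SystemExit as e:              # Caught only for demonstration
    steps.append('exit status %s' % e.code)

print(steps)   # ['working', 'cleanup', 'exit status 2']
```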

os Module

The module os provides an interface to the Operating System.

However,

  • If you just want to read or write a file, use built-in function open().
  • If you just want to manipulate paths, use os.path module.
  • If you want to read all the lines in all the files on the command-line, use fileinput module.
  • To create temporary files/directories, use tempfile module.

The commonly-used attributes are:

  • os.mkdir(path, mode=0o777): Create a directory with the given mode (further masked by the process's umask). mode is ignored in Windows.
  • os.makedirs(path, mode=0o777): Similar to mkdir, but create the intermediate sub-directories, if needed.
  • os.getcwd(): Return the current working directory (CWD).
  • os.chdir(path): Change the current working directory (CWD).
  • os.system(command): Run a shell command.
  • os.getenv(varname, value=None): Returns the environment variable if it exists, or value if it doesn't, with default of None.
  • os.putenv(varname, value): Set environment variable to value.
  • os.unsetenv(varname): Delete the environment variable.

For example,

>>> import os
>>> dir(os)          # List all attributes
......
>>> help(os)         # Show man page
......
>>> help(os.getcwd)  # Show man page for specific function
......

>>> os.getcwd()                   # Get current working directory
...current working directory...
>>> os.listdir('.')               # List the contents of the current directory
...contents of current directory...
>>> os.chdir('test-python')       # Change directory
>>> exec(open('hello.py').read()) # Run a Python script
>>> os.system('ls -l')            # Run shell command
>>> os.name                       # Name of OS
'posix'
>>> os.makedirs(dir)              # Create sub-directory
>>> os.remove(file)               # Remove file
>>> os.rename(oldFile, newFile)   # Rename file
Listing a directory - os.listdir() and os.walk()

You can use os.listdir(path) to list all the entries in a given directory. You can also use os.walk(path) to recursively list all the entries in a given directory. For example,

>>> import os
>>> os.listdir('.')
    # Return a list of entries in the given directory
>>> for f in sorted(os.listdir('.')):
        print(f)

>>> import os
>>> for root, dirs, files in os.walk('.'):
        dirs.sort()  # sort the directories in alphabetical order
        for d in dirs:
            print(os.path.join(root, d))
        for f in sorted(files):   
            print(os.path.join(root, f))
    # Recursively print all the entries in the given directory
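
As a further sketch, os.walk() can be used to collect files with a given extension. The function name find_files is hypothetical, and the directory tree is a throw-away one created with tempfile just to have something to walk.

```python
import os, tempfile

def find_files(top, ext):
    """Return the sorted paths (relative to top) of all files ending with ext."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(ext):
                full = os.path.join(root, name)
                matches.append(os.path.relpath(full, top))
    return sorted(matches)

# Build a small temporary tree: a.py, pkg/b.py, pkg/sub/c.txt
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'pkg', 'sub'))
    for rel in ['a.py', os.path.join('pkg', 'b.py'),
                os.path.join('pkg', 'sub', 'c.txt')]:
        with open(os.path.join(top, rel), 'w') as f:
            f.write('x')
    found = find_files(top, '.py')

print(found)   # ['a.py', 'pkg/b.py'] on Unix
```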

os.path Module

The os.path module implements some useful functions on paths. In Python, a path could refer to a simple filename, full-path filename (absolute or relative), a directory, or a symlink.

  • os.path.sep: the directory separator, '\' for Windows and '/' for Unix/Mac OS.
  • os.path.dirname(path): Return the directory component of the given file.
  • os.path.abspath(path): Return the absolute filename of the given file.
  • os.path.realpath(path): Return the canonical filename of the given file, eliminating any symlink encountered in the path.
  • os.path.join(path, *paths): Form a path by joining one or more path components intelligently, without using platform-dependent path separator ('/' or '\'). For absolute path, begin with os.path.sep. See example below.
  • os.path.exists(path): Check if the given path exists.
  • os.path.isfile(path), os.path.isdir(path), os.path.islink(path): Check if the given pathname is a file, a directory, or a symlink.

EXAMPLE: When a module is loaded in Python, __file__ is set to the pathname of the file it was loaded from. You can then use it with other functions to find the directory in which the file is located.

"""test_ospath.py (also create a symlink to this file called test_ospath_link.py)"""
import os

print(__file__)
print(os.path.dirname(__file__))   # extract the directory component of __file__

print(os.path.abspath(__file__))                   # absolute filename
print(os.path.dirname(os.path.abspath(__file__)))  # absolute directory

print(os.path.realpath(__file__))                  # filename with symlink resolved, if any
print(os.path.dirname(os.path.realpath(__file__))) # directory name with symlink resolved

# Form path with os.path.join, without '/' or '\' for portability
print(os.path.join(os.path.dirname(__file__), '..'))             # parent directory
print(os.path.join(os.path.sep, 'etc', 'apache2', 'httpd.conf')) # /etc/apache2/httpd.conf
print(os.path.join('etc', 'apache2', 'httpd.conf'))              # etc/apache2/httpd.conf

Try running the script with various file references:

$ python3 test_ospath.py
$ python3 ./test_ospath.py
$ python3 ../parent_dir/test_ospath.py
$ python3 /path/to/test_ospath.py

# Make a symlink
$ ln -s test_ospath.py test_ospath_link.py
# Run via symlink - Check 'abspath' vs. 'realpath'
$ python3 test_ospath_link.py

More Examples,

>>> import os.path
>>> os.path.exists('/etc/apache2')  # Check if path exists (as file or directory)
True
>>> os.path.isfile('/etc/apache2')  # Check if path is a file
False
>>> os.path.isdir('/etc/apache2')   # Check if path is a directory
True
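
Besides the functions listed earlier, os.path also provides basename(), split() and splitext() for taking a pathname apart. A short sketch (these three functions are not covered in the list above; the sample path need not exist, as these functions only manipulate strings):

```python
import os.path

path = '/etc/apache2/httpd.conf'
base = os.path.basename(path)     # Filename component
parent = os.path.dirname(path)    # Directory component
pair = os.path.split(path)        # (directory, filename)
stem = os.path.splitext(path)     # (path without extension, extension)

print(base)     # httpd.conf
print(parent)   # /etc/apache2
print(pair)     # ('/etc/apache2', 'httpd.conf')
print(stem)     # ('/etc/apache2/httpd', '.conf')
```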

fileinput Module

The fileinput module provides support for processing lines of input from one or more files given in the command-line arguments (sys.argv). For example, create the following script called "test_fileinput.py":

"""Process all the files given in the command-line arguments"""
import fileinput

def main():
    """Get lines from all the file given in the command-line arguments"""
    for line in fileinput.input():
        # process each line from all the files (may use formatted output)
        print(line, end='')   # line already ends with a newline

main()  # Run the main() function

Run the script:

$ python test_fileinput.py file1 file2
......
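
The fileinput object also keeps track of the current filename and line number. The sketch below passes the filenames explicitly via the files argument instead of taking them from sys.argv; the function name number_lines and the temporary files are hypothetical, for illustration only.

```python
import fileinput, os, tempfile

def number_lines(filenames):
    """Return 'file:lineno: text' strings for every line of the given files."""
    out = []
    with fileinput.input(files=filenames) as lines:
        for line in lines:
            out.append('%s:%d: %s' % (os.path.basename(lines.filename()),
                                      lines.filelineno(),   # Line number within the current file
                                      line.rstrip('\n')))
    return out

# Try it on two small temporary files
with tempfile.TemporaryDirectory() as d:
    names = []
    for name, text in [('file1', 'a\nb\n'), ('file2', 'c\n')]:
        p = os.path.join(d, name)
        with open(p, 'w') as f:
            f.write(text)
        names.append(p)
    numbered = number_lines(names)

print(numbered)   # ['file1:1: a', 'file1:2: b', 'file2:1: c']
```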

re module for Regular Expression

The re module provides support for regular expression (or regex in short).

>>> import re
>>> dir(re)   # List all attributes
......
>>> help(re)  # Show man page
......
    # The man page lists all the special characters and meta-characters used by Python's regex.
re.compile()

To create a Regex's Pattern object, use re.compile(regex-str), e.g.,

>>> import re
>>> p1 = re.compile(r'[1-9][0-9]*|0')
      # non-negative integer (begin with 1-9, followed by 0 or more 0-9; or 0)
>>> type(p1)
<class '_sre.SRE_Pattern'>

>>> p2 = re.compile(r'^\w{6,10}$')
      # 6-10 characters (^ matches the begin, $ matches the end, \w matches word character.)

>>> p3 = re.compile(r'ab*', re.IGNORECASE)
      # a followed by 0 or more b, case insensitive

The string prefix r denotes a raw string, where you do not need to escape backslashes, e.g., you can write r'^\w{6,10}$' instead of '^\\w{6,10}$' in an ordinary string. Raw strings are handy in writing regex, as regex's meta-characters begin with a backslash, e.g., \w for a word character, \W for a non-word character, \d for a digit character and \D for a non-digit character.
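
The difference between raw and ordinary strings can be verified directly. A small sketch (the variable names are for illustration only):

```python
import re

print(len('\n'), len(r'\n'))   # 1 2: r'\n' is a backslash followed by the letter n

raw = r'\d+'                   # Raw string: the backslash is kept as-is
plain = '\\d+'                 # Ordinary string: the backslash must be escaped
print(raw == plain)            # True: both hold the same 3 characters

print(re.findall(raw, 'tel: 555 1234'))     # ['555', '1234']
print(re.findall(plain, 'tel: 555 1234'))   # ['555', '1234']
```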

<pattern>.findall() and re.findall()

The <pattern>.findall(str) finds all the matching substrings in the given str, and returns a list of the matching substrings. For example,

>>> p1.findall('123 456')
['123', '456']
>>> p1.findall('abc')
[]

You can invoke findall() (and all regex functions) in two ways: via a Regex Pattern object (as in the above example); or via the re module, in the form of re.findall(pattern, str). For example,

>>> re.findall(r'[1-9][0-9]*|0', '123 456')  # Provide the regex pattern string
['123', '456']
>>> re.findall(r'[1-9][0-9]*|0', 'abc')
[]
>>> re.findall(p1, '123 456')  # Provide a regex pattern object
['123', '456']
>>> re.findall(p2, '123 456')
[]
<pattern>.sub(), <pattern>.subn(), re.sub(), re.subn()

The <pattern>.sub(replacement, str) substitutes (replaces) all the matching substrings in the given str with the replacement. The subn() is similar, but returns a new string together with the number of replacements in a 2-tuple.

For example,

# Via regex pattern object's member function
>>> p1.sub('---', 'aaa123zzz')
'aaa---zzz'
>>> p1.subn('---', 'aaa123zzz456')
('aaa---zzz---', 2)

# Via re module's function
>>> re.sub(r'[1-9][0-9]*|0', '---', 'aaa123zzz') 
'aaa---zzz'

NOTE: For simple string replacement, use str.replace(old, new) which is more efficient, e.g.,

>>> s = "aaa123bbb"    # Avoid naming a variable 'str', which hides the built-in type
>>> help(s.replace)
......
>>> s.replace('123', '---')
'aaa---bbb'
Using Back References

In Python, back references to parenthesized groups are denoted as \1, \2, etc. Make sure you write the replacement string as a raw string r'...', so that the backslash is preserved.

For example, to swap the first two words using back references:

>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')
'world hello'
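
Another sketch of back references: re-ordering the fields of a date. The second form uses named groups, written (?P<name>...) in the pattern and \g<name> in the replacement; the variable names are for illustration only.

```python
import re

iso = '2016-06-17'

# Numbered groups: \1 is the year, \2 the month, \3 the day
dmy = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', iso)
print(dmy)         # 17/06/2016

# Named groups do the same, but read better in longer patterns
dmy_named = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
                   r'\g<d>/\g<m>/\g<y>', iso)
print(dmy_named)   # 17/06/2016
```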
The Match Object and Functions search(), match() and fullmatch()

The <pattern>.search(str, begin, end) method returns a special Match object encapsulating the first match (or None if no matches). You can then use the following methods to further process the resultant Match object:

  • group(): return the matched substring.
  • start(): return the starting matched position (inclusive)
  • end(): return the ending matched position (exclusive)
  • span(): return a tuple of (start, end) matched position

For example,

>>> str = 'aaa123zzz456'
>>> m = p1.search(str)
>>> m
<_sre.SRE_Match object; span=(3, 6), match='123'>
>>> m.group()
'123'
>>> m.span()
(3, 6)
>>> m.start()
3
>>> m.end()
6

# You can search further by providing the begin and end search positions
# in the form of search(str, begin, end), e.g.,
>>> m = p1.search(str, m.end())
>>> m
<_sre.SRE_Match object; span=(9, 12), match='456'>

The search() matches anywhere in the given string (as shown in the above examples). On the other hand, the match() matches from the beginning of the given string; while the fullmatch() matches the entire string (from the beginning to the end). For example,

# match()
>>> m = p1.match('aaa123zzz456')
>>> m
# None
>>> m = p1.match('123zzz456')
>>> m
<_sre.SRE_Match object; span=(0, 3), match='123'>

# fullmatch()
>>> m = p1.fullmatch('123456')
>>> m
<_sre.SRE_Match object; span=(0, 6), match='123456'>
>>> m = p1.fullmatch('123456abc')
>>> m
# None
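
When the pattern contains parenthesized groups, the Match object also gives access to each group via group(n) and groups(). To visit every match rather than only the first, re provides finditer(), which yields one Match object per match. A sketch with a hypothetical key=value pattern:

```python
import re

p = re.compile(r'(\w+)=(\d+)')      # Group 1: key, group 2: numeric value
text = 'width=640 height=480'

first = p.search(text)
print(first.group(0))    # width=640 (the whole match, same as group())
print(first.group(1))    # width
print(first.groups())    # ('width', '640')

# Iterate over all the matches
pairs = [(m.group(1), int(m.group(2)), m.span()) for m in p.finditer(text)]
print(pairs)   # [('width', 640, (0, 9)), ('height', 480, (10, 20))]
```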
<pattern>.split() and re.split()

The <pattern>.split(str) splits the string into a list, using the Regex Pattern as delimiter. For example,

>>> p1.split('aaa123bbb456ccc')
['aaa', 'bbb', 'ccc']
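
A few more variations via the module-level re.split(), as a sketch: a character-class delimiter, the maxsplit argument, and a capturing group (which keeps the delimiters in the result). The variable names are for illustration only.

```python
import re

# Split on any run of commas, semicolons or whitespace
fruits = re.split(r'[,;\s]+', 'apple, orange;banana  kiwi')
print(fruits)      # ['apple', 'orange', 'banana', 'kiwi']

# Limit the number of splits
once = re.split(r'\d+', 'aaa123bbb456ccc', maxsplit=1)
print(once)        # ['aaa', 'bbb456ccc']

# A capturing group keeps the delimiters in the list
kept = re.split(r'(\d+)', 'aaa123bbb')
print(kept)        # ['aaa', '123', 'bbb']

# A delimiter at the start produces an empty leading string
leading = re.split(r'\d+', '123abc')
print(leading)     # ['', 'abc']
```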

logging Module

The logging module supports a flexible event logging system for your applications and libraries.

The logging supports five levels:

  1. logging.DEBUG: Detailed information meant for debugging.
  2. logging.INFO: Confirmation that an event takes place as expected.
  3. logging.WARNING: Something unexpected happened, but the application is still working.
  4. logging.ERROR: The application does not work as expected.
  5. logging.CRITICAL: Serious error, the application may not be able to continue.

The logging functions are:

  • logging.basicConfig(**kwargs): Perform basic configuration of the logging system. The keyword arguments include filename, filemode (default to append 'a'), and level (log this level and above).
  • logging.debug(msg, *args, **kwargs), logging.info(), logging.warning(), logging.error(), logging.critical(): Log the msg at the specific level. The args are merged into msg using formatting specifier.
  • logging.log(level, msg, *args, **kwargs): General logging function, at the given log level.
Basic Logging via logging.basicConfig()

Example,

import logging
logging.basicConfig(filename='myapp.log', level=logging.DEBUG)  # This level and above
logging.debug('A debug message')
logging.info('An info message %s, %s', 'apple', 'orange')  # with printf-like format specifiers
logging.error('error %d, some error messages', 1234)

The logging functions support printf-like format specifiers such as %s, %d, with values as function arguments (instead of via % operator in Python).

Run the script. A log file myapp.log would be created, with these records:

DEBUG:root:A debug message
INFO:root:An info message apple, orange
ERROR:root:error 1234, some error messages

By default, the log records include the log-level and logger-name (default of root) before the message.

Getting the Log Level from a Configuration File

Log levels, such as logging.DEBUG and logging.INFO, are stored as certain integers in the logging module. For example,

>>> import logging
>>> logging.DEBUG
10
>>> logging.INFO
20

The log level is typically read from a configuration file, in the form of a descriptive string. The following example shows how to convert a string log-level (e.g., 'debug') to the numeric log-level (e.g., 10) used by logging module:

import logging

str_level = 'info'   # Case insensitive

# Convert to uppercase, and get the numeric value
numeric_level = getattr(logging, str_level.upper(), None)
if not isinstance(numeric_level, int):
    raise ValueError('Invalid log level: %s' % str_level)

logging.basicConfig(level=numeric_level)  # Default logging to console

# Test logging
logging.debug('a debug message')  # Not logged
logging.info('an info message')   # Output: INFO:root:an info message
logging.error('an error message') # Output: ERROR:root:an error message
Log Record Format

To set the log message format, use the format keyword:

import logging
logging.basicConfig(
        format='%(asctime)s|%(levelname)s|%(name)s|%(pathname)s:%(lineno)d|%(message)s',
        level=logging.DEBUG)

where asctime is the date/time, levelname is the log level, name is the logger name, pathname is the full-path filename (filename for the filename only), lineno (int) is the line number, and message is the log message.

Advanced Logging: Logger, Handler, Filter and Formatter

So far, we presented the basic logging facilities. The logging library is extensive and organized into these components:

  • Loggers: expose the methods to application for logging.
  • Handlers: send the log records created by the loggers to the appropriate destination, such as file, console (sys.stderr), email via SMTP, or network via HTTP/FTP.
  • Filters: decide which log records to output.
  • Formatters: specify the layout format of log records.
Loggers

To create a Logger instance, invoke the logging.getLogger(logger-name), where the optional logger-name specifies the logger name (default of root).

The Logger's methods fall into two categories: configuration and logging.

The commonly-used logging methods are: debug(), info(), warning(), error(), critical() and the general log().

The commonly-used configuration methods are:

  • setLevel()
  • addHandler() and removeHandler()
  • addFilter() and removeFilter()
Handlers

The logging library provides handlers like StreamHandler (sys.stderr, sys.stdout), FileHandler, RotatingFileHandler, and SMTPHandler (emails).

The commonly-used methods are:

  • setLevel(): The logger's setLevel() determines which message levels to be passed to the handler; while the handler's setLevel() determines which message level to be sent to the destination.
  • setFormatter(): for formatting the message sent to the destination.
  • addFilter() and removeFilter()

You can add more than one handler to a logger, possibly handling different log levels. For example, you can add an SMTPHandler to receive emails for ERROR level; and a RotatingFileHandler for INFO level.
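
The interplay between the logger's level and the handlers' levels can be sketched with two StreamHandlers writing into in-memory buffers (io.StringIO) instead of real destinations. The logger name levels_demo and the variable names are hypothetical.

```python
import io, logging

logger = logging.getLogger('levels_demo')   # Hypothetical logger name
logger.setLevel(logging.INFO)               # The logger drops anything below INFO
logger.propagate = False                    # Keep the demo records away from the root logger

all_stream, err_stream = io.StringIO(), io.StringIO()
all_handler = logging.StreamHandler(all_stream)   # No level set: emits whatever the logger passes on
err_handler = logging.StreamHandler(err_stream)
err_handler.setLevel(logging.ERROR)               # This handler emits ERROR and above only

for h in (all_handler, err_handler):
    h.setFormatter(logging.Formatter('%(levelname)s|%(message)s'))
    logger.addHandler(h)

logger.debug('dropped by the logger')
logger.info('reaches the first handler only')
logger.error('reaches both handlers')

print(all_stream.getvalue(), end='')
print(err_stream.getvalue(), end='')
```

The first buffer receives the INFO and ERROR records, the second the ERROR record only; the DEBUG record is discarded by the logger before any handler sees it.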

Formatters

A Formatter is attached to a handler (via <handler>.setFormatter()) to format the log messages.

Example: Using Logger with Console Handler and a Formatter
import logging

# Create a logger
logger = logging.getLogger('MyApp')
logger.setLevel(logging.INFO)

# Create a console handler and set log level
ch = logging.StreamHandler()   # Default to sys.stderr 
ch.setLevel(logging.INFO)

# Create a formatter and attach to console handler
formatter = logging.Formatter('%(asctime)s|%(name)s|%(levelname)s|%(message)s')
ch.setFormatter(formatter)

# Add console handler to logger
logger.addHandler(ch)

# Test logging
logger.debug('a debug message')
logger.info('an info message')
logger.warning('a warn message')   # warn() is deprecated in favor of warning()
logger.error('error %d, an error message', 1234)
logger.critical('a critical message')
  1. There is probably no standard for log record format (unless you have an analysis tool in mind)?! But I recommend that you choose a field delimiter which does not appear in the log messages, for ease of processing of log records (e.g., export to spreadsheet).

The expected outputs are:

2015-12-09 00:32:33,521|MyApp|INFO|an info message
2015-12-09 00:32:33,521|MyApp|WARNING|a warn message
2015-12-09 00:32:33,521|MyApp|ERROR|error 1234, an error message
2015-12-09 00:32:33,521|MyApp|CRITICAL|a critical message
Example: Using Rotating Log Files with RotatingFileHandler
import logging
from logging.handlers import RotatingFileHandler

# Configuration data in a dictionary
config = {
        'loggername'  : 'myapp',
        'logLevel'    : logging.INFO,
        'logFilename' : 'test.log',
        'logFileBytes': 300,          # for testing only
        'logFileCount': 3}

# Create a Logger and set log level
logger = logging.getLogger(config['loggername'])
logger.setLevel(config['logLevel'])
 
# Create a rotating file handler
handler = RotatingFileHandler(
        config['logFilename'], 
        maxBytes=config['logFileBytes'], 
        backupCount=config['logFileCount'])
handler.setLevel(config['logLevel'])
handler.setFormatter(logging.Formatter(
        "%(asctime)s|%(levelname)s|%(message)s|%(filename)s:%(lineno)d"))

# Add handler
logger.addHandler(handler)

# Test
logger.info('An info message')
logger.debug('A debug message')
for i in range(1, 10):    # Test rotating log files
    logger.error('Error message %d', i)
  1. We keep all the logging parameters in a dictionary, which are usually retrieved from a configuration file.
  2. In the constructor of RotatingFileHandler, the maxBytes sets the log file size-limit; the backupCount appends '.1', '.2', etc to the old log files, such that '.1' is always the newest backup of the log file. Both maxBytes and backupCount default to 0. If either one is zero, roll-over never occurs.
  3. The above example produces 4 log files: test.log, test.log.1 to test.log.3. The file being written to is always test.log. When this file is filled, it is renamed to test.log.1; and if test.log.1 and test.log.2 exist, they will be renamed to test.log.2 and test.log.3 respectively, with the old test.log.3 deleted.
Example: Using an Email Log for CRITICAL Level and Rotating Log Files for INFO Level
import logging
from logging.handlers import RotatingFileHandler, SMTPHandler

# Configuration data in a dictionary
config = {
        'loggername'  : 'myapp',
        'fileLogLevel' : logging.INFO,
        'logFilename'  : 'test.log',
        'logFileBytes' : 300,         # for testing only
        'logFileCount' : 5,
        'emailLogLevel': logging.CRITICAL,
        'smtpServer'   : 'your_smtp_server',
        'email'        : 'myapp@nowhere.com',
        'emailAdmin'   : 'admin@nowhere.com'}

# Create a Logger and set log level
logger = logging.getLogger(config['loggername'])
logger.setLevel(config['fileLogLevel'])  # lowest among all
 
# Create a rotating file handler
fileHandler = RotatingFileHandler(
        config['logFilename'],
        maxBytes=config['logFileBytes'],
        backupCount=config['logFileCount'])
fileHandler.setLevel(config['fileLogLevel'])
fileHandler.setFormatter(logging.Formatter(
        "%(asctime)s|%(levelname)s|%(message)s|%(filename)s:%(lineno)d"))

# Create a email handler
emailHandler = SMTPHandler(
        config['smtpServer'], 
        config['email'], 
        config['emailAdmin'],
        '%s - CRITICAL ERROR' % config['loggername'])
emailHandler.setLevel(config['emailLogLevel'])

# Add handlers
logger.addHandler(fileHandler)
logger.addHandler(emailHandler)

# Test
logger.debug('A debug message')
logger.info('An info message')
logger.warning('A warning message')
logger.error('An error message')
logger.critical('A critical message')
Example: Separating ERROR Log and INFO Log with Different Format
import logging, sys
from logging.handlers import RotatingFileHandler

class MaxLevelFilter(logging.Filter):
    """Custom filter that passes messages with level <= maxlevel"""
    def __init__(self, maxlevel):
        """Constructor takes the max level to pass"""
        super().__init__()        # Initialize the base logging.Filter
        self.maxlevel = maxlevel

    def filter(self, record):
        """Return True to pass the record"""
        return (record.levelno <= self.maxlevel)

# INFO and below go to rotating files
file_handler = RotatingFileHandler('test.log', maxBytes=500, backupCount=3)
file_handler.addFilter(MaxLevelFilter(logging.INFO))
file_handler.setFormatter(logging.Formatter(
        "%(asctime)s|%(levelname)s|%(message)s"))

# WARNING and above go to stderr, with all details
err_handler = logging.StreamHandler(sys.stderr)
err_handler.setLevel(logging.WARNING)
err_handler.setFormatter(logging.Formatter(
        "%(asctime)s|%(levelname)s|%(message)s|%(pathname)s:%(lineno)d"))

logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)    # Lowest
logger.addHandler(file_handler)
logger.addHandler(err_handler)

# Test
logger.debug("A DEBUG message")
logger.info("An INFO message")
logger.warning("A WARNING message")
logger.error("An ERROR message")
logger.critical("A CRITICAL message")

ConfigParser (Python 2) or configparser (Python 3) Module

The configparser module (named ConfigParser in Python 2) implements a basic configuration file parser for .ini files.

A .ini file contains key-value pairs organized in sections and looks like:

# This is a comment
[app]
name = my application
version = 0.9.1
authors = ["Peter", "Paul"]
debug = False

[db]
host = localhost
port = 3306

[DEFAULT]
message = hello
  1. A configuration file consists of sections (marked by [section-name] header). A section contains key=value or key:value pairs. The leading and trailing whitespaces are trimmed from the value. Lines beginning with '#' or ';' are comments.

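You can use configparser (ConfigParser in Python 2) to parse the .ini file, e.g.,

import configparser   # Python 3; the module is named ConfigParser in Python 2

cp = configparser.ConfigParser()   # Replaces Python 2's SafeConfigParser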
cp.read('test1.ini')

# Print all contents. Also save into a dictionary
config = {}
for section in cp.sections():
    print("Section [%s]" % section)
    for option in cp.options(section):
        print("|%s|%s|" % (option,
                cp.get(section, option)))          # Print
        config[option] = cp.get(section, option) # Save in dict

print(config)

# List selected contents with type
cp.get('app', 'debug')         # string 'False'
cp.getboolean('app', 'debug')  # bool False
cp.getint('db', 'port')        # int 3306 ('version' is 0.9.1, which is not an int)
  • ConfigParser.read(filenames): read and parse a filename or a list of filenames. Keys in later files override those in earlier files. Files that cannot be opened are silently ignored.
  • ConfigParser.get(section, name): get the value of name from section.
Interpolation with ConfigParser

A value may contain a format string in the form of %(name)s, which refers to another name in the SAME section, or in the special DEFAULT (in uppercase) section. In Python 3, configparser.ConfigParser performs this interpolation by default, while RawConfigParser does not. (Python 2's SafeConfigParser was renamed to ConfigParser in Python 3.) For example, suppose we have the following configuration file called myapp.ini:

[My Section]
msg: %(head)s + %(body)s
body = bbb

[DEFAULT]
head = aaa

The msg will be interpolated as aaa + bbb, interpolated from the SAME section and DEFAULT section.
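
This can be checked with a short sketch, assuming Python 3's configparser. To keep it self-contained, the configuration is parsed from a string via read_string() instead of from the file myapp.ini; the variable names are for illustration only.

```python
import configparser

ini_text = """
[My Section]
msg: %(head)s + %(body)s
body = bbb

[DEFAULT]
head = aaa
"""

cp = configparser.ConfigParser()
cp.read_string(ini_text)                 # Parse from a string instead of a file

interpolated = cp.get('My Section', 'msg')
as_written = cp.get('My Section', 'msg', raw=True)   # raw=True skips interpolation

print(interpolated)   # aaa + bbb
print(as_written)     # %(head)s + %(body)s
```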

datetime Module

The datetime module supplies classes for manipulating dates and time in both simple and complex ways.

  • datetime.date.today(): Return the current local date.
>>> import datetime
>>> dir(datetime)
['MAXYEAR', 'MINYEAR', 'date', 'datetime', 'datetime_CAPI', 'time', 'timedelta', 'timezone', 'tzinfo', ...]
>>> dir(datetime.date)
['today', ...]

>>> from datetime import date
>>> today = date.today()
>>> today
datetime.date(2016, 6, 17)
>>> aday = date(2016, 5, 1)  # Construct a datetime.date instance
>>> aday
datetime.date(2016, 5, 1)
>>> diff = today - aday      # Find the difference between 2 date instances
>>> diff
datetime.timedelta(47)
>>> dir(datetime.timedelta)
['days', 'max', 'microseconds', 'min', 'resolution', 'seconds', 'total_seconds', ...]
>>> diff.days
47
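
Going the other way, a timedelta can be added to a date, and dates can be formatted and parsed with strftime() and strptime(). A small sketch using the same dates as above (the variable names are for illustration only):

```python
from datetime import date, datetime, timedelta

aday = date(2016, 5, 1)
later = aday + timedelta(days=47)       # Date arithmetic with a timedelta
print(later)                            # 2016-06-17

print(later.strftime('%d/%m/%Y'))       # 17/06/2016 (format a date as a string)

# Parse a string into a datetime.datetime instance
parsed = datetime.strptime('2016-06-17 08:30', '%Y-%m-%d %H:%M')
print(parsed.date() == later)           # True
print(parsed.hour, parsed.minute)       # 8 30
```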

smtplib and email Modules

The SMTP (Simple Mail Transfer Protocol) is a protocol, which handles sending email and routing email between mail servers. Python provides a smtplib module, which defines an SMTP client session object that can be used to send email to any Internet machine with an SMTP listener daemon.

To use smtplib:

import smtplib

# Create an SMTP instance
smtpobj = smtplib.SMTP([host [,port [, local_hostname [, timeout]]]])
......
# Send email
smtpobj.sendmail(from_addr, to_addrs, msg)
# Terminate the SMTP session and close the connection
smtpobj.quit()

The email module can be used to construct an email message.
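
As a minimal sketch, assuming Python 3.6 or later, a message can be built with email.message.EmailMessage. The addresses and the server name your_smtp_server are placeholders; the sending step is left commented out because it needs a reachable SMTP server.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'myapp@nowhere.com'       # Placeholder addresses
msg['To'] = 'admin@nowhere.com'
msg['Subject'] = 'Test message'
msg.set_content('Hello from Python.')   # Plain-text body

print(msg['Subject'])                   # Test message
print(msg.get_content_type())           # text/plain

# To send it (requires a reachable SMTP server; 'your_smtp_server' is a placeholder):
# import smtplib
# with smtplib.SMTP('your_smtp_server') as smtpobj:
#     smtpobj.send_message(msg)         # Takes the addresses from the message headers
```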

[TODO] more

json Module

JSON (JavaScript Object Notation) is a lightweight data interchange format inspired by JavaScript object literal syntax. The json module provides implementation for JSON encoder and decoder.

  • json.dumps(python_obj): Serialize python_obj to a JSON-encoded string ('s' for string).
  • json.loads(json_str): Create a Python object from the given JSON-encoded string.
  • json.dump(python_obj, file_obj): Serialize python_obj to the file.
  • json.load(file_obj): Create a Python object by reading the given file.

For example,

>>> import json

# Create a JSON-encoded string from a Python object
>>> lst = [123, 4.5, 'hello', True]
>>> json_lst = json.dumps(lst)  # Create a JSON-encoded string
>>> json_lst
'[123, 4.5, "hello", true]'
        # JSON uses double-quote for string 

>>> dct = {'a': 11, 2: 'b', 'c': 'cc'}
>>> json_dct = json.dumps(dct)
>>> json_dct
'{"a": 11, "2": "b", "c": "cc"}'
        # JSON object keys must be strings: the int key 2 becomes "2"
        # Keys are written in insertion order (Python 3.7+)

# Create a Python object from a JSON string
>>> lst_decoded = json.loads(json_lst)
>>> lst_decoded
[123, 4.5, 'hello', True]
>>> dct_decoded = json.loads(json_dct)
>>> dct_decoded
{'a': 11, '2': 'b', 'c': 'cc'}
        # The key '2' remains a string after decoding

# Serialize a Python object to a text file
>>> f = open('json.txt', 'w')
>>> json.dump(dct, f)
>>> f.close()

# Construct a Python object via de-serializing from a JSON file
>>> f = open('json.txt', 'r')
>>> dct_decoded_from_file = json.load(f)
>>> dct_decoded_from_file
{'a': 11, '2': 'b', 'c': 'cc'}

# Inspect the JSON file
>>> f.seek(0)  # Rewind
0
>>> f.read()   # Read the entire file
'{"a": 11, "2": "b", "c": "cc"}'
>>> f.close()

pickle and cPickle Modules

The json module (described earlier) handles lists and dictionaries, but serializing arbitrary class instances requires extra effort. The pickle module, on the other hand, implements serialization and de-serialization of almost any Python object. Pickle is a protocol that allows the serialization of arbitrarily complex Python objects. It is specific to Python and cannot be used to exchange data with programs written in other languages.

The pickle module provides the same functions as the json module:

  • pickle.dumps(python_obj): Return the pickled representation of the python_obj as a bytes object (a str in Python 2).
  • pickle.loads(pickled_str): Construct a Python object from pickled_str.
  • pickle.dump(python_obj, file_obj): Write a pickled representation of the python_obj to file_obj.
  • pickle.load(file_obj): Construct a Python object reading from the file_obj.

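In Python 2, the module cPickle is a faster implementation of pickle written in C. In Python 3, cPickle no longer exists as a separate module: pickle uses the C implementation automatically when it is available.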
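
As a small illustration of the difference between the two modules, the sketch below pickles a dictionary holding a set, a tuple and a complex number, none of which has a direct JSON equivalent. The sample data is made up for this example. Note that unpickling can execute arbitrary code, so pickle.loads() and pickle.load() should only be used on data from a trusted source.

```python
import json
import pickle

# A set, a tuple and a complex number: none has a direct JSON equivalent
data = {'tags': {'a', 'b'}, 'point': (1, 2), 'z': 3 + 4j}

try:
    json.dumps(data)
    json_ok = True
except TypeError:       # The set is not JSON serializable
    json_ok = False

pickled = pickle.dumps(data)       # A bytes object in Python 3
restored = pickle.loads(pickled)   # Only unpickle data from a trusted source

print(json_ok)            # False
print(type(pickled))      # <class 'bytes'>
print(restored == data)   # True
```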

signal module

Signals (software interrupts) are a limited form of asynchronous inter-process communication, analogous to hardware interrupts. They are generally used by the operating system to notify processes about certain events, states or errors, such as a division by zero.

The signal module provides mechanisms to use signal handlers in Python.

signal.signal()

The signal.signal() function takes two arguments: the signal number to handle, and the handler function. The handler itself is called with two arguments: the signal number and the current stack frame. For example,

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""test_signal.py"""
import sys, signal, time

def my_signal_handler(signalnum, frame):
    """Custom signal handler: receives the signal number and the current stack frame"""
    print('Signal received %d: %s' % (signalnum, frame))

# Register signal handler for selected signals
signal.signal(signal.SIGINT, my_signal_handler)
signal.signal(signal.SIGUSR1, my_signal_handler)   # SIGUSR1 is available on Unix only

while True:
    print("Wait...")
    time.sleep(10)

Run the program in the background (with &) and send signals to the process:

$ ./test_signal.py &
[1] 24078

$ Wait...

$ kill -INT 24078    # Send signal
Signal received 2: <frame object at 0x7f6f59e12050>

$ kill -USR1 24078   # Send signal
Signal received 10: <frame object at 0x7f6f59e12050>

$ kill -9 24078      # Kill the process

Object-Oriented Programming (OOP) in Python

I assume that you are familiar with the OOP concepts, and you know some OO languages such as Java/C++/C#.

Introduction of OOP in Python

A class is a blueprint or template of entities (things) of the same kind. An instance is a particular realization of a class.

Unlike C++/Java, Python supports both class objects and instance objects. In fact, everything in Python is an object, including the classes themselves.

An object contains attributes: data attributes (or variables) and behaviors (called methods). To access an attribute, use the "dot" operator in the form of class_name.attr_name or instance_name.attr_name.

To construct an instance of a class, invoke the constructor in the form of instance_name = class_name(*args).

[TODO] UML class diagram for class and instances.

[TODO] more

Example 1: Getting Started with a Circle class

Let's write a module called circle (to be saved as circle.py), which contains a Circle class. The Circle class shall contain a data attribute radius and a method get_area(), as shown in the following class diagram.

[TODO] class diagram

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
circle.py: The circle module, which defines a Circle class.
"""
from math import pi

class Circle:    # For Python 2 use: "class Circle(object):"
    """A Circle instance models a circle with a radius"""
    
    def __init__(self, radius=1.0):
        """Constructor with default radius of 1.0"""
        self.radius = radius  # Create an instance variable radius
        
    def __str__(self):
        """Return a descriptive string for this instance, invoked by print() and str()"""
        return 'This is a circle with radius of %.2f' % self.radius

    def __repr__(self):
        """Return a command string that can be used to re-create this instance, invoked by repr()"""
        return 'Circle(radius=%f)' % self.radius
    
    def get_area(self):
        """Return the area of this Circle instance"""
        return self.radius * self.radius * pi
 
# For Testing under Python interpreter
# If this module is run under Python interpreter, __name__ is '__main__'.
# If this module is imported into another module, __name__ is 'circle' (the module name).
if __name__ == '__main__':
    c1 = Circle(2.1)      # Construct an instance
    print(c1)             # Invoke __str__()
    print(c1.get_area())
 
    c2 = Circle()         # Default radius
    print(c2)
    print(c2.get_area())  # Invoke member method
 
    c2.color = 'red'  # Create a new attribute for this instance via assignment
    print(c2.color)
    #print(c1.color)  # Error - c1 has no attribute color

    # Test doc-strings
    print(__doc__)                  # This module
    print(Circle.__doc__)           # Circle class
    print(Circle.get_area.__doc__)  # get_area() method
    
    print(isinstance(c1, Circle)) # True
    print(isinstance(c2, Circle)) # True
    print(isinstance(c1, str))    # False

Run this script, and check the outputs:

This is a circle with radius of 2.10
13.854423602330987
This is a circle with radius of 1.00
3.141592653589793
red
circle.py: The circle module, which defines a Circle class.
A Circle instance models a circle with a radius
Return the area of this Circle instance
True
True
False
How it Works
  1. By convention, module names (and package names) are in lowercase (optionally joined with underscore if it improves readability). Class names are initial-capitalized (i.e., CamelCase). Variable and method names are also in lowercase.
    Following the convention, this module is called circle (in lowercase) and is to be saved as "circle.py" (the module name is the filename - there is no explicit way to name a module). The class is called Circle (in CamelCase). It contains a data attribute (instance variable) radius and a method get_area().
  2. We use the "class Circle:" statement to define the Circle class.
    NOTES: In Python 2, you need to write "class Circle(object):" to create a so-called new-style class by inheriting from the default superclass object. Otherwise, it will create an old-style class. The old-style classes should no longer be used. In Python 3, "class Circle:" inherits from object by default.
  3. The first parameter of all the member methods shall be an object called self (e.g., get_area(self), __init__(self, ...)), which binds to this instance (i.e., itself) during invocation.
  4. You can invoke a method via the dot operator, in the form of obj_name.method_name(). However, Python differentiates between instance objects and class objects:
    • For class objects: You can invoke a method via
      class_name.method_name(instance_name, ...)
      where an instance_name is passed into the method as the argument 'self'.
    • For instance objects: Python converts an instance method call from:
      instance_name.method_name(...)
      to
      class_name.method_name(instance_name, ...)
      where the instance_name is passed into the method as the argument 'self'.
    I will elaborate on this later.
  5. You can construct an instance of a class by invoking its constructor, in the form of class_name(...), e.g.,
    c1 = Circle(1.2)
    c2 = Circle()     # radius default
    Python first creates a plain Circle object. It then invokes the Circle's __init__(self, radius) with self bound to the newly created instance, as follows:
    Circle.__init__(c1, 1.2)
    Circle.__init__(c2)       # radius default
    Inside the __init__() method, the self.radius = radius creates and attaches an instance variable radius under the instances c1 and c2.
    Take note that:
    • The __init__() is not really the constructor, but an initializer to create the instance variables.
    • __init__() shall never return a value.
    • __init__() is optional and can be omitted if there are no instance variables to initialize.
  6. Once the instance c1 has been created, the invocation of the instance method c1.get_area() is translated to Circle.get_area(c1), where self is bound to c1. Within the method, self.radius refers to c1.radius, which was created during the initialization.
  7. There is no need to declare instance variables. The variable assignment statements in __init__() create the instance variables.
  8. You can dynamically add an attribute after an object is constructed via assignment, as in c2.color='red'. This is unlike other OOP languages like Java.
  9. You can place doc-string for module, class, and method immediately after their declaration, which can be retrieved via attribute __doc__. Doc-strings are strongly recommended for proper documentation.
  10. There is no "private" access control. All attributes are "public" and visible to all.
  11. [TODO] unit-test and doc-test
  12. [TODO] more
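
The two-step construction described in item 5 can be reproduced by hand. The sketch below uses a cut-down Circle class (only __init__() is kept, for brevity) and calls __new__() and __init__() explicitly. This is for illustration only; normal code should simply call Circle(...).

```python
class Circle:
    def __init__(self, radius=1.0):
        self.radius = radius

c1 = Circle(1.2)                 # The usual way

# Roughly what Circle(1.2) does behind the scenes
c2 = Circle.__new__(Circle)      # Step 1: create a plain Circle instance
before = dict(vars(c2))          # Snapshot: no instance variables yet
Circle.__init__(c2, 1.2)         # Step 2: initialize it, with self bound to c2

print(before)                    # {}
print(vars(c2))                  # {'radius': 1.2}
print(vars(c1) == vars(c2))      # True
```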
Inspecting the Objects

Run the script under the Python Interactive Shell:

$ cd /path/to/module-directory
$ python3
>>> exec(open('circle.py').read())
......

>>> dir()        # Return the list of names in the current local scope
['Circle', '__builtins__', '__doc__', '__file__', '__loader__', '__name__',
 '__package__', '__spec__', 'c1', 'c2', 'pi']
>>> __name__
'__main__'

# Inspect instance object c1
>>> dir(c1)      # List all attributes including built-ins
['__class__', '__dict__', '__doc__', '__init__', '__str__', 'get_area', 'radius', ...]
>>> vars(c1)     # Return a dictionary of instance variables kept in __dict__
{'radius': 2.1}
>>> c1.__dict__  # Same as vars(c1)
{'radius': 2.1}
>>> c1.__class__
<class '__main__.Circle'>
>>> type(c1)     # Same as c1.__class__
<class '__main__.Circle'>
>>> c1.__doc__
'A Circle instance models a circle with a radius'
>>> c1.__module__
'__main__'
>>> c1.__init__
<bound method Circle.__init__ of Circle(radius=2.100000)>
>>> c1.__str__
<bound method Circle.__str__ of Circle(radius=2.100000)>
>>> c1.__str__()  # or str(c1), or print(c1)
'This is a circle with radius of 2.10'
>>> c1.__repr__()  # or repr(c1)
'Circle(radius=2.100000)'
>>> c1  # same as c1.__repr__()
Circle(radius=2.100000)
>>> c1.radius
2.1
>>> c1.get_area
<bound method Circle.get_area of Circle(radius=2.100000)>
>>> c1.get_area()  # Same as Circle.get_area(c1)
13.854423602330987

# Inspect instance object c2
>>> dir(c2)
['color', 'get_area', 'radius', ...]
>>> type(c2)  # or c2.__class__
<class '__main__.Circle'>
>>> vars(c2)  # or c2.__dict__ 
{'radius': 1.0, 'color': 'red'}
>>> c2.radius
1.0
>>> c2.color
'red'
>>> c2.__init__
<bound method Circle.__init__ of Circle(radius=1.000000)>


# Inspect the class object Circle
>>> dir(Circle)   # List all attributes for Circle object
['__class__', '__dict__', '__doc__', '__init__', '__str__', 'get_area', ...]
>>> help(Circle)  # Show documentation
......
>>> Circle.__class__
<class 'type'>
>>> Circle.__dict__   # or vars(Circle)
mappingproxy({'__init__': ..., 'get_area': ..., '__str__': ..., '__dict__': ...,
 '__doc__': 'A Circle instance models a circle with a radius', '__module__': 
 '__main__'})
>>> Circle.__doc__
'A Circle instance models a circle with a radius'
>>> Circle.__init__
<function Circle.__init__ at 0x7fb325e0cbf8>
>>> Circle.__str__
<function Circle.__str__ at 0x7fb31f3ee268>
>>> Circle.__str__(c1)  # Same as c1.__str__() or str(c1) or print(c1)
'This is a circle with radius of 2.10'
>>> Circle.get_area
<function Circle.get_area at 0x7fb31f3ee2f0>
>>> Circle.get_area(c1)  # Same as c1.get_area()
13.854423602330987
Class Objects vs Instance Objects

As illustrated in the above example, there are two kinds of objects in Python's OOP model: class objects and instance objects, which is quite different from other OOP languages.

Class objects provide default behavior and serve as factories for generating instance objects. Instance objects are the real objects created by your application. An instance object has its own namespace, which holds only its instance variables. It does not copy the names of its class: when an attribute is not found in the instance's namespace, Python looks it up in the class object from which the instance was created (and then in the superclasses).

The class statement creates a class object of the given class name. Within the class definition, you can create class variables via assignment statements, which are shared by all the instances. You can also define methods, via def statements, to be shared by all the instances.

When an instance is created, a new namespace is created, which is initially empty. The instance keeps a reference to its class (in __class__), through which all the class attributes are reachable. The __init__() is then invoked to create (initialize) instance variables, which are only available to this particular instance.
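
The following sketch inspects the two namespaces using a cut-down Circle class. The attribute unit is a made-up name added purely for illustration. It shows that the instance's namespace holds only its instance variables, while methods and class variables are found by looking them up in the class.

```python
class Circle:
    def __init__(self, radius=1.0):
        self.radius = radius

c1 = Circle(2.0)
print(vars(c1))                     # {'radius': 2.0}
print('__init__' in vars(c1))       # False - methods are kept in the class
print('__init__' in vars(Circle))   # True

# An attribute added to the class afterwards is visible via existing instances
Circle.unit = 'cm'
print(c1.unit)                      # cm
print('unit' in vars(c1))           # False - found via the class, not the instance
```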

[TODO] more

__str__() vs. __repr__()

The built-in functions print(obj) and str(obj) invoke obj.__str__() implicitly. If __str__() is not defined, they invoke obj.__repr__().

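The built-in function repr(obj) invokes obj.__repr__(). It does not fall back to __str__(): if the class does not define __repr__(), the default version inherited from object is used.

When you inspect an object obj (e.g., c1) under the interactive prompt, Python invokes obj.__repr__(). The default (inherited) __repr__() returns a string showing the class name and the address of obj, such as '<__main__.Circle object at 0x...>'.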

The __str__() is used for printing an "informal" descriptive string of this object. The __repr__() is used to present an "official" (or canonical) string representation of this object, which should look like a valid Python expression that could be used to recreate the object (i.e., eval(repr(obj)) == obj). In our Circle class, repr(c1) returns 'Circle(radius=2.100000)'. You can use "c1 = Circle(radius=2.100000)" to re-create instance c1.

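We often redirect __str__() to __repr__() (to produce the same command string), by letting __str__() return self.__repr__(). Since str() and print() fall back to __repr__() when __str__() is not defined, defining __repr__() alone has the same effect.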
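
A minimal sketch of such a redirection, using a cut-down Circle class, could look like this:

```python
class Circle:
    def __init__(self, radius=1.0):
        self.radius = radius

    def __repr__(self):
        """Return a command string that can re-create this instance"""
        return 'Circle(radius=%f)' % self.radius

    def __str__(self):
        """Redirect to __repr__()"""
        return self.__repr__()   # or: return repr(self)

c1 = Circle(2.1)
print(str(c1))    # Circle(radius=2.100000)
print(repr(c1))   # Circle(radius=2.100000)
```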

Importing the circle module

When you use "import circle", a namespace for circle is created under the current scope. You need to reference the Circle class as circle.Circle.

$ cd /path/to/module-directory
$ python3
>>> import circle  # circle module
>>> dir()          # Current local scope
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'circle']
>>> dir(circle)    # The circle module
['Circle', '__builtins__', '__doc__', '__name__', 'pi', ...]
>>> dir(circle.Circle)  # Circle class
['__class__', '__doc__', '__init__', '__str__', 'get_area', ...]
>>> __name__            # of  current scope
'__main__'
>>> circle.__name__     # of  circle module
'circle'
>>> c1 = circle.Circle(1.2)
>>> dir(c1)
['__class__', '__doc__', '__str__', 'get_area', 'radius', ...]
>>> vars(c1)
{'radius': 1.2}
Importing the Circle class of the circle module

When you import the Circle class via "from circle import Circle", the Circle class is added to the current scope, and you can reference the Circle class directly.

>>> from circle import Circle
>>> dir()
['Circle', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> c1 = Circle(3.4)
>>> vars(c1)
{'radius': 3.4}

Class Definition Syntax

The syntax is:

class class_name(superclass1, ...):
    """Class doc-string"""
   
    class_var1 = value1  # Class variables
    ......
   
    def __init__(self, arg1, ...):
        """Constructor"""
        self.instance_var1 = arg1  # Attach instance variables by assignment
        ......
      
    def __str__(self):
        """For print() and str()"""
        ......
      
    def __repr__(self):
        """For repr() and interactive prompt"""
        ......
      
    def method_name(self, *args, **kwargs):
        """Method doc-string"""
        ......

Example 2: The Point class and Operator Overloading

In this example, we shall define a Point class, which models a 2D point with x and y coordinates. We shall also overload the operators '+' and '*' by overriding the so-called magic methods __add__() and __mul__().

[TODO] Class diagram

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
point.py: The point module, which defines the Point class
"""

class Point:    # In Python 2, use: class Point(object):
    """A Point instance models a 2D point with x and y coordinates"""

    def __init__(self, x=0, y=0):
        """Constructor, which creates the instance variables x and y with default of (0,0)"""
        self.x = x
        self.y = y

    def __str__(self):
        """Return a descriptive string for this instance"""
        return '(%.2f, %.2f)' % (self.x, self.y)
        
    def __repr__(self):
        """Return a command string to re-create this instance"""
        return 'Point(x=%f, y=%f)' % (self.x, self.y)
        
    def __add__(self, right):
        """Override the '+' operator: create and return a new instance"""
        p = Point(self.x + right.x, self.y + right.y)
        return p

    def __mul__(self, factor):
        """Override the '*' operator: modify and return this instance"""
        self.x *= factor
        self.y *= factor
        return self

# Test
if __name__ == '__main__':
    p1 = Point()
    print(p1)      # (0.00, 0.00)
    p1.x = 5
    p1.y = 6
    print(p1)      # (5.00, 6.00)
    p2 = Point(3, 4)
    print(p2)      # (3.00, 4.00)
    print(p1 + p2) # (8.00, 10.00) Same as p1.__add__(p2)
    print(p1)      # (5.00, 6.00) No change
    print(p2 * 3)  # (9.00, 12.00) Same as p2.__mul__(3)
    print(p2)      # (9.00, 12.00) Changed
How it Works
  1. Python supports operator overloading (like C++ but unlike Java). You can overload '+', '-', '*', '/', '//' and '%' by overriding member methods __add__(), __sub__(), __mul__(), __truediv__(), __floordiv__() and __mod__(), respectively. You can overload other operators too (to be discussed later).
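  2. In this example, the __add__() returns a new instance, whereas the __mul__() multiplies into this instance and returns this instance. This is done for academic purposes only, to contrast the two behaviors. In practice, users expect "p * 3" to leave p unchanged; in-place modification belongs to the augmented assignment operator '*=' (via __imul__()).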
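
For comparison, here is a sketch of a more conventional variant of the Point class, in which '*' returns a new instance and '*=' modifies the instance in place. It is not the Point class listed above: the methods __eq__(), __rmul__() and __imul__() are additions made for this illustration.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Point(x=%r, y=%r)' % (self.x, self.y)

    def __eq__(self, other):
        """Overload '==': compare the coordinates"""
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __mul__(self, factor):
        """Overload 'p * factor': create and return a new instance"""
        return Point(self.x * factor, self.y * factor)

    def __rmul__(self, factor):
        """Overload 'factor * p' (the instance is the right operand)"""
        return self.__mul__(factor)

    def __imul__(self, factor):
        """Overload 'p *= factor': modify and return this instance"""
        self.x *= factor
        self.y *= factor
        return self

p = Point(3, 4)
q = p * 3
print(q, p)               # Point(x=9, y=12) Point(x=3, y=4)
print(2 * p)              # Point(x=6, y=8)
p *= 2
print(p)                  # Point(x=6, y=8)
print(p == Point(6, 8))   # True
```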

The getattr(), setattr(), hasattr() and delattr() built-in functions

You can access an object's attribute via the dot operator by hard-coding the attribute name, provided you know the attribute name when writing the code. For example, you can read, write and delete an attribute via "obj_name.attr_name", "obj_name.attr_name = value" and "del obj_name.attr_name", respectively.

Alternatively, you can use built-in functions like getattr() by using a variable to hold the attribute name, which will be bound during runtime.

  • getattr(obj_name, attr_name[, default]) -> value: returns the value of the attr_name of the obj_name, equivalent to "obj_name.attr_name". If the attr_name does not exist, it returns the default if present; otherwise, it raises AttributeError.
  • setattr(obj_name, attr_name, attr_value): sets the attribute, equivalent to "obj_name.attr_name = attr_value".
  • hasattr(obj_name, attr_name) -> bool: returns True if obj_name has an attribute called attr_name.
  • delattr(obj_name, attr_name): deletes the named attribute, equivalent to "del obj_name.attr_name".

For example:

class MyClass:
    """This class contains an instance variable called var"""
    def __init__(self, var):
        self.var = var

i = MyClass(8)
print(i.var)              # 8
print(getattr(i, 'var'))  # 8
print(getattr(i, 'no_var', 'default'))  # default
attr_name = 'var'
print(getattr(i, attr_name))  # Using a variable

setattr(i, 'var', 9)      # Same as i.var = 9
print(getattr(i, 'var'))  # 9

print(hasattr(i, 'var'))  # True
delattr(i, 'var')
print(hasattr(i, 'var'))  # False

Class Variable vs. Instance Variables

Class variables are shared by all instances, whereas instance variables are specific to that particular instance.

class MyClass:
    count = 0  # Total number of instances
               # Class variable shared by all the instances

    def __init__(self):
        # Update class variable
        self.__class__.count += 1   # Increment count
                                    # or MyClass.count += 1
        # Create instance variable - An 'id' of the instance in running numbers
        self.id = self.__class__.count

    def get_id(self):
        return self.id

    def get_count(self):
        return self.__class__.count

if __name__ == '__main__':
    print(MyClass.count)         # 0
    
    obj1 = MyClass()
    print(MyClass.count)         # 1
    print(obj1.get_id())         # 1
    print(obj1.get_count())      # 1
    print(obj1.__class__.count)  # 1
    
    obj2 = MyClass()
    print(MyClass.count)         # 2
    print(obj1.get_id())         # 1
    print(obj1.get_count())      # 2
    print(obj1.__class__.count)  # 2
    print(obj2.get_id())         # 2
    print(obj2.get_count())      # 2
    print(obj2.__class__.count)  # 2
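
A common pitfall, sketched below with a made-up Tracker class, is to update a class variable through an instance. The statement "t1.count += 1" reads the class variable but then assigns the result to a new instance variable, which hides the class variable for that instance only. This is why the example above updates the counter via self.__class__.count (or MyClass.count).

```python
class Tracker:
    count = 0                 # Class variable

t1 = Tracker()
t2 = Tracker()

Tracker.count += 1            # Updates the class variable, seen via every instance
print(t1.count, t2.count)     # 1 1

t1.count += 1                 # Reads Tracker.count (1), then creates t1.count = 2
print(t1.count, t2.count, Tracker.count)   # 2 1 1
print(vars(t1), vars(t2))     # {'count': 2} {}
```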

[TODO]

Private Variables?

Python does not support access control. In other words, all attributes are "public" and are accessible by ALL. There are no "private" attributes.

However, by convention:

  • Names beginning with an underscore (_) are meant for internal use, and should not be accessed outside the class definition.
  • Names beginning with double underscores (__) and not ending with double underscores are further hidden from direct access through name mangling.
  • Names beginning and ending with double underscores (such as __init__, __add__, __str__) are special magic methods (to be discussed later).

For example,

class MyClass:
    def __init__(self):
        self.var = 1       # public
        self._var = 2      # meant for internal use (private) - 'Please' don't access directly
        self.__var = 3     # name mangling
        self.__var_ = 4    # name mangling
        self.__var__ = 5   # magic attribute

    def print(self):
        # All variables can be used within the class definition
        print(self.var)
        print(self._var)
        print(self.__var)
        print(self.__var_)
        print(self.__var__)

if __name__ == '__main__':
    obj1 = MyClass()
    print(obj1.var)
    print(obj1._var)
    # Variables beginning with __ are not accessible outside the class
    # except those ending with __
    #print(obj1.__var)    # AttributeError
    #print(obj1.__var_)   # AttributeError
    print(obj1.__var__)

    obj1.print()

    print(dir(obj1))
    # ['_MyClass__var', '_MyClass__var_', '__var__', '_var', ...]
    # Variables beginning with __ are renamed by prepending with an underscore and classname (i.e., name mangling)
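
Name mangling only renames the attribute; it does not protect it. The short sketch below, with a made-up Account class, shows that the attribute is still reachable under its mangled name.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance      # Stored as _Account__balance

    def get_balance(self):
        return self.__balance         # Mangled automatically inside the class

acct = Account(100)
print(acct.get_balance())             # 100
print(hasattr(acct, '__balance'))     # False - no attribute under the original name
print(acct._Account__balance)         # 100 - still reachable via the mangled name
```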

Class Method, Instance Method and Static Method

Class Method (Decorator @classmethod)

A class method belongs to the class and is a function of the class. It is declared with the @classmethod decorator. It accepts the class as its first argument. For example,

>>> class MyClass:
        @classmethod
        def hello(cls):
           print('Hello from', cls.__name__)
 
>>> MyClass.hello()
Hello from MyClass

# Can be invoked via an instance too
>>> obj1 = MyClass()
>>> obj1.hello()
Hello from MyClass
Instance Method

Instance methods are the most common type of method. An instance method is invoked by an instance object (and not a class object). It takes the instance (self) as its first argument. For example,

>>> class MyClass:
        def hello(self):
            print('Hello from', self.__class__.__name__)
 
>>> obj1 = MyClass()
>>> obj1.hello()
Hello from MyClass
>>> MyClass.hello()  # Cannot invoke via a class object
TypeError: hello() missing 1 required positional argument: 'self'

>>> MyClass.hello(obj1)  # Can explicitly pass an instance object
Hello from MyClass
Static Method (Decorator @staticmethod)

A static method is declared with the @staticmethod decorator. It "doesn't know its class" and is attached to the class for convenience. It does not depend on the state of the instance or the class, and could just as well be a separate function in a module. A static method can be invoked via a class object or an instance object. For example,

>>> class MyClass:
        @staticmethod
        def hello():
            print('Hello, world')
 
>>> obj1 = MyClass()
>>> obj1.hello()
Hello, world
>>> MyClass.hello()
Hello, world
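
To put the three kinds of methods side by side, here is a sketch using a made-up Temperature class. A typical use of a class method is an alternative constructor, while a static method suits a helper function that needs neither the instance nor the class.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        """Alternative constructor: cls is the class the method is invoked on"""
        return cls(cls.f_to_c(fahrenheit))

    @staticmethod
    def f_to_c(fahrenheit):
        """Helper function: needs neither the instance nor the class"""
        return (fahrenheit - 32) * 5 / 9

    def describe(self):
        """Instance method: works on the state of this instance"""
        return '%.1f degrees Celsius' % self.celsius

t = Temperature.from_fahrenheit(212)
print(t.describe())              # 100.0 degrees Celsius
print(Temperature.f_to_c(32))    # 0.0
```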

Example 3: Getter and Setter

In this example, we shall rewrite the Circle class to access the instance variable via the getter and setter. We shall rename the instance variable to _radius (meant for internal use only or private), with "public" getter get_radius() and setter set_radius(), as follows:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""circle.py: The circle module, which defines the Circle class"""
from math import pi

class Circle:
    """A Circle instance models a circle with a radius"""

    def __init__(self, _radius=1.0):
        """Constructor with default radius of 1.0"""
        self.set_radius(_radius)   # Call setter

    def set_radius(self, _radius):
        """Setter for instance variable radius with input validation"""
        if _radius < 0:
            raise ValueError("Radius shall be non-negative")
        self._radius = _radius

    def get_radius(self):
        """Getter for instance variable radius"""
        return self._radius

    def get_area(self):
        """Return the area of this Circle instance"""
        return self.get_radius() * self.get_radius() * pi  # Call getter

    def __repr__(self):
        """Return a command string to recreate this instance"""
        # Used by str() too as __str__() is not defined
        return 'Circle(radius=%f)' % self.get_radius()  # Call getter
        
if __name__ == '__main__':
    c1 = Circle(1.2)        # Constructor
    print(c1)               # Invoke __repr__(). Output: Circle(radius=1.200000)
    print(vars(c1))         # Output: {'_radius': 1.2}
    print(c1.get_area())    # Output: 4.52389342117
    print(c1.get_radius())  # Run Getter. Output: 1.2
    c1.set_radius(3.4)      # Test Setter
    print(c1)               # Output: Circle(radius=3.400000)
    c1._radius = 5.6        # Access instance variable directly (not recommended but permitted)
    print(c1)               # Output: Circle(radius=5.600000)
 
    c2 = Circle()      # Default radius
    print(c2)          # Output: Circle(radius=1.000000)
 
    c3 = Circle(-5.6)  # ValueError: Radius shall be non-negative
How it Works
  1. While there is no concept of "private" attributes in Python, we could still rewrite our Circle class with "public" getter/setter, as in the above example. This is often done because the getter and setter need to carry out certain processing, such as data conversion in getter, or input validation in setter.
  2. We renamed the instance variable to _radius, with a leading underscore to mark it as "private" (but it is still accessible to all). According to the Python naming convention, names beginning with an underscore are to be treated as "private", i.e., they shall not be used outside the class. We named our "public" getter and setter get_radius() and set_radius(), respectively.
  3. In the constructor, we invoke the setter to set the instance variable, instead of assigning it directly, as the setter performs input validation. Similarly, we use the getter in get_area() and __repr__().

Example 4: Creating a property object via the property() Built-in Function

Add the following into the Circle class in the previous example:

class Circle:
    ......
    # Add a new property object called radius, given its getter and setter
    # Place this line after get_radius() and set_radius()
    radius = property(get_radius, set_radius)

This creates a new property object called radius, with the given getter/setter (which operate on the existing instance variable _radius). The property is an attribute of the class, not an instance variable, but it is accessed through the instances just like one. Recall that we have renamed our instance variable to _radius, so the two names do not clash.

You can now use this new property radius, just like an ordinary instance variable, e.g.,

c1 = Circle(1.2)

# Access (read/write) the new property radius directly
print(c1.radius)    # Run get_radius() to read _radius. Output: 1.2
c1.radius = 3.4     # Run set_radius(), which sets _radius
print(c1.radius)    # Run get_radius() to read _radius. Output: 3.4
print(vars(c1))     # Output: {'_radius': 3.4}
print(dir(c1))      # Output: ['_radius', 'get_radius', 'radius', 'set_radius', ...]

# The existing instance variable _radius, getter and setter are still available
c1._radius = 5.6
print(c1._radius)       # Output: 5.6
c1.set_radius(7.8)
print(c1.get_radius())  # Output: 7.8

print(type(c1.radius))   # Output: <class 'float'>
print(type(c1._radius))  # Output: <class 'float'>
print(type(Circle.radius))   # Output: <class 'property'>
print(type(Circle._radius))  # AttributeError: type object 'Circle' has no attribute '_radius'

The built-in function property() has the following signature:

property(fget=None, fset=None, fdel=None, doc=None)

You can specify a delete function, as well as a doc-string. For example,

class Circle:
    ......
   
    def del_radius(self):
        del self._radius
      
    radius = property(get_radius, set_radius, del_radius, "Radius of this circle")
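
Putting the fragments together, a complete and runnable version of the class might look like the sketch below. It is cut down to the parts relevant to the property: get_area() and __repr__() are left out for brevity.

```python
class Circle:
    def __init__(self, radius=1.0):
        self.set_radius(radius)

    def get_radius(self):
        return self._radius

    def set_radius(self, radius):
        if radius < 0:
            raise ValueError('Radius shall be non-negative')
        self._radius = radius

    def del_radius(self):
        del self._radius

    # Must be placed after the three functions it refers to
    radius = property(get_radius, set_radius, del_radius, 'Radius of this circle')

c1 = Circle(1.2)
c1.radius = 3.4                 # Runs set_radius()
print(c1.radius)                # 3.4 - runs get_radius()
print(Circle.radius.__doc__)    # Radius of this circle
del c1.radius                   # Runs del_radius()
print(vars(c1))                 # {}
try:
    c1.radius = -1              # Validation in set_radius() still applies
except ValueError as e:
    print(e)                    # Radius shall be non-negative
```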
More on property object

[TODO]

Creating a property via the @property Decorator

In the above example, the statement:

radius = property(get_radius, set_radius, del_radius)

is equivalent to:

# Create an empty property, getter, setter and deleter set to None
radius = property()
# Assign getter, setter and deleter functions
# Each call returns a new property object, which must be assigned back to radius
radius = radius.getter(get_radius)
radius = radius.setter(set_radius)
radius = radius.deleter(del_radius)

These can be implemented via decorators @property, @varname.setter and @varname.deleter, respectively. For example,

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""circle.py: The circle module, which defines the Circle class"""
from math import pi

class Circle:   # In Python 2, use "class Circle(object):", as @property needs a new-style class
    """A Circle instance models a circle with a radius"""

    def __init__(self, radius=1.0):
        """Constructor with default radius of 1.0"""
        self.radius = radius   # Call decorated setter

    @property
    def radius(self):
        """Radius of this circle"""  # doc-string here
        # Define getter here
        return self._radius  # Read the hidden instance variable _radius
    # Equivalent to:
    # def get_radius(self):
    #    return self._radius
    # radius = property(get_radius)   # Define a property with getter

    @radius.setter
    def radius(self, radius):
        """Setter for instance variable radius with input validation"""
        if radius < 0:
            raise ValueError("Radius shall be non-negative")
        self._radius = radius  # Set a hidden instance variable _radius

    @radius.deleter
    def radius(self):
        """Deleter for instance variable radius"""
        del self._radius  # Delete the hidden instance variable _radius

    def get_area(self):
        """Return the area of this Circle instance"""
        return self.radius * self.radius * pi  # Call decorated getter

    def __repr__(self):
        """Self description for this Circle instance, used by print(), str() and repr()"""
        return 'Circle(radius=%f)' % self.radius  # Call decorated getter
 
if __name__ == '__main__':
    c1 = Circle(1.2)
    print(c1)             # Output: Circle(radius=1.200000)
    print(vars(c1))       # Output: {'_radius': 1.2}
    print(dir(c1))        # Output: ['_radius', 'radius', ...]
    c1.radius = 3.4       # Setter
    print(c1.radius)      # Getter. Output: 3.4
    print(c1._radius)     # hidden instance variable. Output: 3.4

    c2 = Circle()         # Default radius
    print(c2)             # Output: Circle(radius=1.000000)
 
    c3 = Circle(-5.6)     # ValueError: Radius shall be non-negative
How it Works
  1. We use a hidden instance variable called _radius to store the radius, which is set in the setter, after input validation.
  2. We renamed the getter from get_radius() to radius, and used the decorator @property to decorate the getter.
  3. We also renamed the setter from set_radius() to radius, and used the decorator @radius.setter to decorate the setter.
  4. [TODO] more
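
A property defined with @property alone, without a setter, is read-only. The sketch below uses this for a computed attribute; the property name area is an assumption made for this illustration and is not part of the class above.

```python
from math import pi

class Circle:
    def __init__(self, radius=1.0):
        self.radius = radius

    @property
    def area(self):
        """Area of this circle (read-only, computed on each access)"""
        return self.radius * self.radius * pi

c1 = Circle(2.0)
print(c1.area)        # Accessed like an attribute, without parentheses
try:
    c1.area = 10      # Fails, as no setter was defined
except AttributeError:
    print('area is read-only')
```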

Inheritance

Example 1: The Cylinder class as a subclass of Circle class

In this example, we shall define a Cylinder class, as a subclass of Circle. The Cylinder class shall inherit attributes radius and get_area() from the superclass Circle, and add its own attributes height and get_volume().

[TODO] class diagram

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""cylinder.py: The cylinder module, which defines the Cylinder class"""
from circle import Circle  # Using the Circle class in the circle module

class Cylinder(Circle):
    """The Cylinder class is a subclass of Circle"""

    def __init__(self, radius = 1.0, height = 1.0):
        """Constructor"""
        super().__init__(radius)  # Invoke superclass' constructor (Python 3)
            # OR
            # super(Cylinder, self).__init__(radius)   (Python 2)
            # Circle.__init__(self, radius)            Explicit superclass class
        self.height = height

    def __str__(self):
        """Self Description for print()"""
        # If __str__ is missing in the subclass, print() will invoke the superclass version!
        return 'Cylinder(radius=%.2f,height=%.2f)' % (self.radius, self.height)

    def get_volume(self):
        """Return the volume of the cylinder"""
        return self.get_area() * self.height  # Inherited get_area()
 
# For testing
if __name__ == '__main__':
    cy1 = Cylinder(1.1, 2.2)
    print(cy1)                # Output: Cylinder(radius=1.10,height=2.20)
    print(cy1.get_area())     # Use inherited superclass' method
    print(cy1.get_volume())   # Invoke its method
 
    cy2 = Cylinder()          # Default radius and height
    print(cy2)                # Output: Cylinder(radius=1.00,height=1.00)
    print(cy2.get_area())
    print(cy2.get_volume())

    print(dir(cy1))
        # ['get_area', 'get_volume', 'height', 'radius', ...]
    print(Cylinder.get_area)
        # <function Circle.get_area at 0x7f490436b378>
        # Inherited from the superclass
    print(Circle.get_area)
        # <function Circle.get_area at 0x7f490436b378>

    c1 = Circle(3.3)
    print(c1)    # Output: Circle(radius=3.300000)
        
    print(issubclass(Cylinder, Circle))  # True
    print(issubclass(Circle, Cylinder))  # False
    print(isinstance(cy1, Cylinder))     # True
    print(isinstance(cy1, Circle))       # True (A subclass object is also a superclass object)
    print(isinstance(c1, Circle))        # True
    print(isinstance(c1, Cylinder))      # False (A superclass object is NOT a subclass object)
    print(Cylinder.__base__)             # Show superclass: <class 'circle.Circle'>
    print(Circle.__subclasses__())       # Show a list of subclasses: [<class '__main__.Cylinder'>]
How it works?
  1. When you construct a new instance of Cylinder via:
    cy1 = Cylinder(1.1, 2.2)
    Python first creates a plain Cylinder object and invokes the Cylinder's __init__() with self bound to the newly created object, as follows:
    Cylinder.__init__(cy1, 1.1, 2.2)
    Inside the __init__(), the super().__init__(radius) invokes the superclass' __init__(). (You can also explicitly call Circle.__init__(self, radius), but you need to hard-code the superclass' name.) This does not create a separate superclass instance; it runs Circle's initializer on the same object to set up its radius. The next statement self.height = height creates the instance variable height for cy1.
    Take note that Python does not call the superclass' constructor automatically (unlike Java/C++).
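
To illustrate the last point, here is a minimal sketch using three hypothetical classes (Base, ChildWithoutSuper and ChildWithSuper, which are not part of the circle module). It shows that the superclass' instance variables are missing if the subclass' __init__() does not invoke the superclass' __init__():

```python
class Base:
    def __init__(self):
        self.x = 1            # Instance variable created by Base's initializer

class ChildWithoutSuper(Base):
    def __init__(self):
        self.y = 2            # Base.__init__() is NOT invoked

class ChildWithSuper(Base):
    def __init__(self):
        super().__init__()    # Explicitly invoke Base.__init__()
        self.y = 2

c1 = ChildWithoutSuper()
print(vars(c1))               # {'y': 2}
print(hasattr(c1, 'x'))       # False

c2 = ChildWithSuper()
print(vars(c2))               # {'x': 1, 'y': 2}
```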
super()

There are two ways to invoke a superclass method:

  1. via explicit classname: e.g.,
    Circle.__init__(self)
    Circle.get_area(self)
    
  2. via super(): e.g.,
    super().__init__(radius)                # Python 3
    super(Cylinder, self).__init__(radius)  # Python 2: super(this_class_name, self)
    
    super().get_area()                # Python 3
    super(Cylinder, self).get_area()  # Python 2

You can avoid hard-coding the superclass' name with super(). This is recommended, especially in multiple inheritance as it can resolve some conflicts (to be discussed later).

The super() built-in returns a proxy object that delegates method calls to a parent or sibling class. This is useful for accessing inherited methods that have been overridden in a class.
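
As a small sketch, the hypothetical classes Animal and Dog below (names chosen for illustration only) compare the overridden method with the three ways of reaching the superclass' version:

```python
class Animal:
    def speak(self):
        return 'some sound'

class Dog(Animal):
    def speak(self):          # Overrides Animal.speak()
        return 'woof'

    def speak_all(self):
        return [
            self.speak(),                 # Overridden version in Dog
            super().speak(),              # Superclass version via super()
            super(Dog, self).speak(),     # Same, in the explicit two-argument form
            Animal.speak(self),           # Same, via explicit class name
        ]

print(Dog().speak_all())   # ['woof', 'some sound', 'some sound', 'some sound']
```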

Example 2: Method Overriding

In this example, we shall override the get_area() method to return the (lateral) surface area of the cylinder. We also rewrite the __str__() method. As get_area() no longer returns the base area, we need to rewrite get_volume() to use the superclass' get_area(), instead of the overridden version in this class.

[TODO] Class diagram

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""cylinder.py: The cylinder module, which defines the Cylinder class"""
from math import pi
from circle import Circle  # Using the Circle class in the circle module

class Cylinder(Circle):
    """The Cylinder class is a subclass of Circle"""

    def __init__(self, radius = 1.0, height = 1.0):
        """Constructor"""
        super().__init__(radius)  # Invoke superclass' constructor
        self.height = height

    def __str__(self):
        """Self Description for print()"""
        return 'Cylinder(%s, height=%.2f)' % (super().__repr__(), self.height)
                # Use superclass' __repr__()

    # Override
    def get_area(self):   
        """Return the lateral surface area of the cylinder"""
        return 2.0 * pi * self.radius * self.height

    def get_volume(self):
        """Return the volume of the cylinder"""
        return super().get_area() * self.height  # Use superclass' get_area()
 
# For testing
if __name__ == '__main__':
    cy1 = Cylinder(1.1, 2.2)
    print(cy1)              # Output: Cylinder(Circle(radius=1.100000), height=2.20)
    print(cy1.get_area())   # Invoke overridden version
    print(cy1.get_volume()) # Invoke its method
 
    cy2 = Cylinder()        # Default radius and height
    print(cy2)              # Output: Cylinder(Circle(radius=1.000000), height=1.00)
    print(cy2.get_area())
    print(cy2.get_volume())

    print(dir(cy1))
        # ['get_area', 'get_volume', 'height', 'radius', ...]
    print(Cylinder.get_area)
        # <function Cylinder.get_area at 0x7f505f464488>
    print(Circle.get_area)
        # <function Circle.get_area at 0x7f490436b378>
  1. In Python, the overridden version replaces the inherited version, as shown in the above function references.
  2. To access superclass' version of a method, use:
    • For Python 3: super().method_name(*args), e.g., super().get_area()
    • For Python 2: super(this_class, self).method_name(*args), e.g., super(Cylinder, self).get_area()
    • Explicitly via the class name: superclass.method_name(self, *args), e.g., Circle.get_area(self)
Example 3: Shape and its subclasses

[TODO] Class diagram

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""sh.py: The sh module. It contains a superclass Shape and 3 subclasses Circle, Rectangle and Square"""
from math import pi

class Shape:    # Python 2: class Shape(object):
    """The superclass Shape with a color"""
    def __init__(self, color = 'red'):  # Constructor
        self.color = color
    def __str__(self):  # For print() and str()
        return 'Shape(color=%s)' % self.color

class Circle(Shape):
    """The Circle class: a subclass of Shape with a radius"""
    def __init__(self, radius = 1.0, color = 'red'):  # Constructor
        super().__init__(color)
        self.radius = radius
    def __str__(self):  # For print() and str()
        return 'Circle(%s, radius=%.2f)' % (super().__str__(), self.radius)
    def get_area(self):
        return self.radius * self.radius * pi

class Rectangle(Shape):
    """The Rectangle class: a subclass of Shape with a length and width"""
    def __init__(self, length = 1.0, width = 1.0, color = 'red'):  # Constructor
        super().__init__(color)
        self.length = length
        self.width = width
    def __str__(self):  # For print() and str()
        return 'Rectangle(%s, length=%.2f, width=%.2f)' % (super().__str__(), self.length, self.width)
    def get_area(self):
        return self.length * self.width

class Square(Rectangle):
    """The Square class: a subclass of Rectangle having the same length and width"""
    def __init__(self, side = 1.0, color = 'red'):  # Constructor
        super().__init__(side, side, color)
    def __str__(self):  # For print() and str()
        return 'Square(%s)' % super().__str__()

# For Testing
if __name__ == '__main__':
    s1 = Shape('orange')
    print(s1)                # Shape(color=orange)
    print(s1.color)
 
    c1 = Circle(1.2, 'orange')
    print(c1)                # Circle(Shape(color=orange), radius=1.20)
    print(c1.get_area())
 
    r1 = Rectangle(1.2, 3.4, 'orange')
    print(r1)                # Rectangle(Shape(color=orange), length=1.20, width=3.40)
    print(r1.get_area())
 
    sq1 = Square(5.6, 'orange')
    print(sq1)               # Square(Rectangle(Shape(color=orange), length=5.60, width=5.60))
    print(sq1.get_area())

Multiple Inheritance

Python supports multiple inheritance, which is defined in the form of "class class_name(base_class_1, base_class_2,...):...".

Mixin Pattern

The simplest and most useful pattern of multiple inheritance is called mixin. A mixin is a superclass that is not meant to exist on its own, but meant to be inherited by some sub-classes to provide extra functionality.
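
As a minimal sketch of the idea, the hypothetical JsonMixin below is not meant to be instantiated on its own; it only contributes a to_json() method to whichever class inherits it. The class names Point and JsonPoint are assumptions made for this illustration.

```python
import json

class JsonMixin:
    """Mixin: adds a to_json() method to any class with instance variables"""
    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class JsonPoint(JsonMixin, Point):   # Mix the extra functionality into Point
    pass

p = JsonPoint(1, 2)
print(p.to_json())            # {"x": 1, "y": 2}
print(isinstance(p, Point))   # True
```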

[TODO]

Diamond Problem

Suppose that two classes B and C inherit from a superclass A, and D inherits from both B and C. If A has a method called m(), and m() is overridden by B and/or C, then which version of m() is inherited by D?

[TODO] class diagram

Let's look at Python's implementation.

Example 1

class A:
    def m(self):
        print('in Class A')

class B(A):
    def m(self):
        print('in Class B')
    
class C(A):
    def m(self):
        print('in Class C')

# Inherits from B, then C. It does not override m()
class D1(B,C):  
    pass

# Different order in the superclass list
class D2(C,B):
    pass

# Override m()
class D3(B,C):
    def m(self):
        print('in Class D3')

if __name__ == '__main__':
    x = D1()
    x.m()   # 'in Class B' (first in superclass list)
    x = D2()
    x.m()   # 'in Class C' (first in superclass list)
    x = D3()
    x.m()   # 'in Class D3' (overridden version)

Example 2: Suppose the overridden m() in B and C invoke A's m() explicitly.

class A:
    def m(self):
        print('in Class A')

class B(A):
    def m(self):
        A.m(self)
        print('in Class B')
    
class C(A):
    def m(self):
        A.m(self)
        print('in Class C')

class D(B,C):
    def m(self):
        B.m(self)
        C.m(self)
        print('in Class D')


if __name__ == '__main__':
    x = D()
    x.m()

The output is:

in Class A
in Class B
in Class A
in Class C
in Class D

Take note that A's m() is run twice, which is typically not desired. For example, suppose that m() is the __init__(), then A will be initialized twice.

Example 3: Using super()

class A:
    def m(self):
        print('in Class A')

class B(A):
    def m(self):
        super().m()
        print('in Class B')
    
class C(A):
    def m(self):
        super().m()
        print('in Class C')

class D(B,C):
    def m(self):
        super().m()
        print('in Class D')

if __name__ == '__main__':
    x = D()
    x.m()

The output is:

in Class A
in Class C
in Class B
in Class D

With super(), A's m() is run only once. This is because super() uses the so-called Method Resolution Order (MRO) to linearize the superclasses: in D's MRO (D, B, C, A), the super() inside B refers to C (the next class in the MRO), not directly to A. Hence, super() is strongly recommended for multiple inheritance, instead of explicit class calls.

Example 4: Let's look at __init__().

class A:
    def __init__(self):
        print('init A')
    
class B(A):
    def __init__(self):
        super().__init__()
        print('init B')
    
class C(A):
    def __init__(self):
        super().__init__()
        print('init C')

class D(B,C):
    def __init__(self):
        super().__init__()
        print('init D')

if __name__ == '__main__':
    d = D()
    # init A
    # init C
    # init B
    # init D
    
    c = C()
    # init A
    # init C
    
    b = B()
    # init A
    # init B

Each superclass is initialized exactly once, as desired.

You can check the MRO via the mro() member method:

>>> D.mro()
[<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
>>> C.mro()
[<class '__main__.C'>, <class '__main__.A'>, <class 'object'>]
>>> B.mro()
[<class '__main__.B'>, <class '__main__.A'>, <class 'object'>]

Abstract Methods

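An abstract method declares a method that the subclasses are expected to implement; it typically has no (or only a partial) implementation in the superclass. The subclasses shall override the inherited abstract methods and provide their own implementations.

In Python 3, you can use the decorator @abstractmethod (from the abc module) to mark an abstract method in a class derived from abc.ABC. Such a class cannot be instantiated until all its abstract methods are overridden. For example,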

@abstractmethod
def method_name(self, ...):
    pass
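
A fuller sketch follows, using a simplified Shape/Circle pair written for this illustration (independent of the sh module above). It assumes the usual approach of deriving the abstract class from abc.ABC:

```python
from abc import ABC, abstractmethod
from math import pi

class Shape(ABC):                  # Abstract base class
    @abstractmethod
    def get_area(self):
        """Return the area; subclasses must override this method"""

class Circle(Shape):
    def __init__(self, radius = 1.0):
        self.radius = radius
    def get_area(self):            # Concrete implementation
        return self.radius * self.radius * pi

error = None
try:
    Shape()                        # Cannot instantiate an abstract class
except TypeError as e:
    error = e
print(type(error).__name__)        # TypeError

print(Circle(1.0).get_area())      # 3.141592653589793
```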

[TODO] more

Polymorphism

Polymorphism in programming is the ability to present the same interface for differing underlying forms. For example, a polymorphic function can be applied to arguments of different types, and it behaves differently depending on the type of the arguments to which it is applied.

Python is implicitly polymorphic, as types are associated with objects instead of variable references.
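
The following sketch illustrates this with two unrelated, hypothetical classes (Dog and Cat) that happen to share a speak() method. The function make_it_speak() is also an assumption for this example; it accepts any object that provides speak().

```python
class Dog:
    def speak(self):
        return 'Woof'

class Cat:
    def speak(self):
        return 'Meow'

def make_it_speak(animal):
    # No type declaration: any object with a speak() method will do
    return animal.speak()

for pet in [Dog(), Cat()]:
    print(make_it_speak(pet))   # Woof, then Meow

# Built-in functions and operators are polymorphic too
print(len('abc'), len([1, 2]))  # 3 2
print(1 + 2, 'a' + 'b')         # 3 ab
```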

Magic Methods

A magic method is a member method whose name begins and ends with double underscores, e.g., __init__(), __add__(), __len__().

As an example, we list the magic methods in the int class:

>>> dir(int)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', ...]

In Python, built-in operators and functions invoke the corresponding magic methods. For example, operator '+' invokes __add__(), built-in function len() invokes __len__(). Even though the magic methods are invoked implicitly via built-in operators and functions, you can also call them explicitly, e.g., 'abc'.__len__() is the same as len('abc').

The following table summarizes the commonly-used magic methods and their invocation.

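Magic Method                            Invoked Via                       Invocation Syntax
__lt__(self, right)                     Comparison operators              self < right
__gt__(self, right)                     Comparison operators              self > right
__le__(self, right)                     Comparison operators              self <= right
__ge__(self, right)                     Comparison operators              self >= right
__eq__(self, right)                     Comparison operators              self == right
__ne__(self, right)                     Comparison operators              self != right
__add__(self, right)                    Arithmetic operators              self + right
__sub__(self, right)                    Arithmetic operators              self - right
__mul__(self, right)                    Arithmetic operators              self * right
__truediv__(self, right)                Arithmetic operators              self / right
__floordiv__(self, right)               Arithmetic operators              self // right
__mod__(self, right)                    Arithmetic operators              self % right
__pow__(self, right)                    Arithmetic operators              self ** right
__and__(self, right)                    Bitwise operators                 self & right
__or__(self, right)                     Bitwise operators                 self | right
__xor__(self, right)                    Bitwise operators                 self ^ right
__invert__(self)                        Bitwise operators                 ~self
__lshift__(self, n)                     Bitwise operators                 self << n
__rshift__(self, n)                     Bitwise operators                 self >> n
__str__(self)                           Function call                     str(self), print(self)
__repr__(self)                          Function call                     repr(self)
__sizeof__(self)                        Function call                     sys.getsizeof(self)
__len__(self)                           Sequence operators and functions  len(self)
__contains__(self, item)                Sequence operators and functions  item in self
__iter__(self)                          Sequence operators and functions  iter(self)
__next__(self)                          Sequence operators and functions  next(self)
__getitem__(self, key)                  Sequence operators and functions  self[key]
__setitem__(self, key, value)           Sequence operators and functions  self[key] = value
__delitem__(self, key)                  Sequence operators and functions  del self[key]
__int__(self)                           Type conversion function call     int(self)
__float__(self)                         Type conversion function call     float(self)
__bool__(self)                          Type conversion function call     bool(self)
__oct__(self)                           Type conversion function call     oct(self) (Python 2 only; Python 3 uses __index__)
__hex__(self)                           Type conversion function call     hex(self) (Python 2 only; Python 3 uses __index__)
__init__(self, *args)                   Constructor                       x = ClassName(*args)
__new__(cls, *args)                     Constructor                       x = ClassName(*args)
__del__(self)                           Finalizer                         Run when the object is about to be destroyed, e.g., after del x removes the last reference
__index__(self)                         Conversion to an integer index    seq[self], bin(self), oct(self), hex(self)
__radd__(self, left)                    Reflected (RHS) arithmetic        left + self
__rsub__(self, left)                    Reflected (RHS) arithmetic        left - self
...                                     Reflected (RHS) arithmetic        ...
__iadd__(self, right)                   In-place arithmetic               self += right
__isub__(self, right)                   In-place arithmetic               self -= right
...                                     In-place arithmetic               ...
__pos__(self)                           Unary plus and minus operators    +self
__neg__(self)                           Unary plus and minus operators    -self
__round__(self)                         Function call                     round(self)
__floor__(self)                         Function call                     math.floor(self)
__ceil__(self)                          Function call                     math.ceil(self)
__trunc__(self)                         Function call                     math.trunc(self)
__getattr__(self, name)                 Object's attributes               self.name (only when the normal attribute lookup fails)
__setattr__(self, name, value)          Object's attributes               self.name = value
__delattr__(self, name)                 Object's attributes               del self.name
__call__(self, *args, **kwargs)         Callable object                   obj(*args, **kwargs)
__enter__(self)                         Context manager                   On entering the with-statement
__exit__(self, exc_type, exc_val, tb)   Context manager                   On leaving the with-statement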

Construction and Initialization

When you use x = ClassName(*args) to construct an instance x, Python first calls __new__(cls, *args) to create the instance, and then invokes __init__(self, *args) (the initializer) to initialize it.
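
The order of the two calls can be observed with the sketch below. The class Demo and its class variable calls are hypothetical names introduced only to record the sequence of invocations.

```python
class Demo:
    calls = []   # Class variable recording the order of invocation

    def __new__(cls, *args):
        cls.calls.append('__new__')
        return super().__new__(cls)      # Create and return the new instance

    def __init__(self, value):
        self.calls.append('__init__')
        self.value = value               # Initialize the instance

d = Demo(88)
print(Demo.calls)   # ['__new__', '__init__']
print(d.value)      # 88
```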

Operator Overloading

Python supports operator overloading (like C++) via overriding the corresponding magic methods.

Example: Override the '==' operator for the Circle class

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def __eq__(self, right):
        """Override operator '==' to compare the two radii"""
        if isinstance(right, Circle):
            return self.radius == right.radius
        return NotImplemented  # Let Python fall back to its default comparison

if __name__ == '__main__':
    print(Circle(8) == Circle(8))   # True
    print(Circle(8) == Circle(88))  # False
    print(Circle(8) == 'abc')       # False (a Circle never equals a str)
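
As a further sketch, the hypothetical Vector class below overloads '+', '*' and '=='. It follows the common convention of returning NotImplemented for unsupported operand types, so that Python can try the reflected operation on the other operand.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __add__(self, right):             # self + right
        if not isinstance(right, Vector):
            return NotImplemented
        return Vector(self.x + right.x, self.y + right.y)

    def __mul__(self, scalar):            # self * scalar
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):           # scalar * self (reflected)
        return self * scalar

    def __eq__(self, right):              # self == right
        if not isinstance(right, Vector):
            return NotImplemented
        return (self.x, self.y) == (right.x, right.y)

print(Vector(1, 2) + Vector(3, 4))   # Vector(4, 6)
print(Vector(1, 2) * 3)              # Vector(3, 6)
print(3 * Vector(1, 2))              # Vector(3, 6)
```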

[TODO] more examples

Iterable and Iterator: iter() and next()

Python iterators are supported by two magic member methods: __iter__(self) and __next__(self).

  • The Iterable object (such as list) shall implement the __iter__(self) member method to return an iterator object. This method can be invoked explicitly via "iterable.__iter__()", or implicitly via "iter(iterable)" or "for item in iterable" loop.
  • The returned iterator object shall implement the __next__(self) method to return the next item, or raise StopIteration if there are no more items. This method can be invoked explicitly via "iterator.__next__()", or implicitly via "next(iterator)" or within the "for item in iterable" loop.
Example 1

A list is an iterable that supports iterator.

# Using iter() and next() built-in functions
>>> list_itr = iter([11, 22, 33])  # Get an iterator from a list
>>> list_itr
<list_iterator object at 0x7f945e438550>
>>> next(list_itr)
11
>>> next(list_itr)
22
>>> next(list_itr)
33
>>> next(list_itr)  # No more item, raise StopIteration
...... 
StopIteration

# Using __iter__() and __next__() member methods
>>> list_itr2 = [44, 55].__iter__()
>>> list_itr2
<list_iterator object at 0x7f945e4385f8>
>>> list_itr2.__next__()
44
>>> list_itr2.__next__()
55
>>> list_itr2.__next__()
StopIteration

# The "for .. in iterable" loop uses iterator implicitly
>>> for item in [11, 22, 33]: print(item)
Example 2

Let's implement our own iterator. The following RangeDown(min, max) is similar to range(min, max + 1), but counting down. In this example, the iterable and iterator are in the same class.

class RangeDown:
    """Iterator from max down to min (both inclusive)"""

    def __init__(self, min, max):
        self.current = max + 1
        self.min = min

    def __iter__(self):
        return self

    def __next__(self):
        self.current -= 1
        if self.current < self.min:
            raise StopIteration
        else:
            return self.current
          
if __name__ == '__main__':
    # Use iter() and next()
    itr = iter(RangeDown(6, 8))
    print(next(itr))   # 8
    print(next(itr))   # 7
    print(next(itr))   # 6
    #print(next(itr))  # StopIteration

    # Iterate in for-in loop
    for i in RangeDown(6, 8):
        print(i, end=" ")  # 8 7 6
    print()

    # Use __iter__() and __next__()
    itr2 = RangeDown(9, 10).__iter__()
    print(itr2.__next__())  # 10
    print(itr2.__next__())  # 9
    #print(itr2.__next__())  # StopIteration
Example 3

Let's separate the iterable and iterator in two classes.

class RangeDown:
    """Iterable from max down to min (both inclusive)"""

    def __init__(self, min, max):
        self.min = min
        self.max = max

    def __iter__(self):
        return RangeDownIterator(self.min, self.max)

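class RangeDownIterator:
    """Iterator returned by RangeDown.__iter__()"""

    def __init__(self, min, max):
        self.min = min
        self.current = max + 1

    def __iter__(self):
        return self  # An iterator should also be iterable, returning itself

    def __next__(self):
        self.current -= 1
        if self.current < self.min:
            raise StopIteration
        else:
            return self.current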
    
if __name__ == '__main__':
    itr = iter(RangeDown(6, 8))
    print(next(itr))   # 8
    print(next(itr))   # 7
    print(next(itr))   # 6
    #print(next(itr))  # StopIteration
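
One practical consequence of the separation can be sketched as follows. The class names CountDown and CountDownIterator are hypothetical, simplified variants of the RangeDown examples. An object that is its own iterator can be traversed only once, whereas a separate iterable hands out a fresh iterator for each traversal.

```python
class CountDownIterator:
    """Iterable and iterator in one class: can be traversed only once"""
    def __init__(self, start):
        self.current = start + 1
    def __iter__(self):
        return self
    def __next__(self):
        self.current -= 1
        if self.current < 1:
            raise StopIteration
        return self.current

class CountDown:
    """Separate iterable: each __iter__() call returns a fresh iterator"""
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return CountDownIterator(self.start)

one_shot = CountDownIterator(3)
print(list(one_shot), list(one_shot))   # [3, 2, 1] []

reusable = CountDown(3)
print(list(reusable), list(reusable))   # [3, 2, 1] [3, 2, 1]
```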

Generator and yield

A generator function is a function that can produce a sequence of results instead of a single value. A generator function returns a generator iterator object, which is a special type of iterator from which you can obtain the next element via next().

A generator function is like an ordinary function, but instead of using return to return a value and exit, it uses yield to produce a new result. A function which contains yield is automatically a generator function.

Generators are useful to create iterators.

Example: A Simple Generator
>>> def my_simple_generator():
   yield 11
   yield 22
   yield 33
   
>>> g1 = my_simple_generator()
>>> g1
<generator object my_simple_generator at 0x7f945e441990>
>>> next(g1)
11
>>> next(g1)
22
>>> next(g1)
33
>>> next(g1)
......
StopIteration
>>> for item in my_simple_generator(): print(item, end=' ')
11 22 33
Example 2

The following generator function range_down(min, max) implements the count-down version of range(min, max+1).

>>> def range_down(min, max):
   """A generator function contains yield statement and creates a generator iterator object"""
   current = max
   while current >= min:
       yield current  # Produce a result each time it is run
       current -= 1   # Count down

>>> range_down(5, 8)
<generator object range_down at 0x7f5e34fafc18>  # A generator function returns a generator object

# Using the generator in the for-in loop
>>> for i in range_down(5, 8):
   print(i, end=" ")  # 8 7 6 5

# Using iter() and next()
>>> itr = range_down(2, 4)  # or iter(range_down(2, 4))
>>> itr
<generator object range_down at 0x7f230d53a168>
>>> next(itr)
4
>>> next(itr)
3
>>> next(itr)
2
>>> next(itr)
StopIteration

# Using __iter__() and __next__()
>>> itr2 = range_down(5, 6).__iter__()
>>> itr2
<generator object range_down at 0x7f230d53a120>
>>> itr2.__next__()
6
>>> itr2.__next__()
5
>>> itr2.__next__()
StopIteration

Each time the yield statement is run, it produces a new value and suspends the function, retaining the state of the generator iterator object until the next value is requested.

Example 3

We can have generators that produce an infinite sequence of values.

def gen_primes(number):
    """A generator function to generate prime numbers, starting from number"""
    while True:   # No upperbound!
        if is_prime(number):
            yield number
        number += 1


def is_prime(number: int) -> bool:
    """Return True if number is a prime, by trial division"""
    if number <= 1:
        return False

    factor = 2
    while factor * factor <= number:   # Only need to test factors up to sqrt(number)
        if number % factor == 0: return False
        factor += 1

    return True


if __name__ == '__main__':
    g = gen_primes(8)     # From 8
    for i in range(100):  # Generate 100 prime numbers
        print(next(g))
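
Instead of counting with range() and next(), you could also take a slice of an infinite generator with itertools.islice(). The sketch below uses a hypothetical generator gen_squares() for illustration; itertools.count(1) supplies the unbounded sequence 1, 2, 3, ...

```python
from itertools import count, islice

def gen_squares():
    """An infinite generator of square numbers 1, 4, 9, ..."""
    for n in count(1):      # count(1) yields 1, 2, 3, ... without end
        yield n * n

# Take only the first 5 items; the rest are never computed
first_five = list(islice(gen_squares(), 5))
print(first_five)   # [1, 4, 9, 16, 25]
```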
Generator Expression

A generator expression has a syntax similar to a list/dictionary comprehension (for generating a list/dictionary), but is surrounded by parentheses and produces a generator iterator object. (Note: parentheses are also used by tuples, but there is no tuple comprehension; to build a tuple, pass a generator expression to tuple().) For example,

>>> a = (x*x for x in range(1,5))
>>> a
<generator object <genexpr> at 0x7f230d53a2d0>
>>> for item in a: print(item, end=' ')
1 4 9 16 
>>> sum(a)  # The generator was exhausted by the for-loop above
0
>>> sum(x*x for x in range(1,5))  # Applicable to functions that consume iterable
30
>>> b = (x*x for x in range(1, 10) if x*x % 2 == 0)
>>> for item in b: print(item, end=' ')
4 16 36 64 

# Compare with list/dictionary comprehension for generating list/dictionary
>>> lst = [x*x for x in range(1, 10)]
>>> lst
[1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> dct = {x:x*x for x in range(1, 10)}
>>> dct
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
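
The main difference from a list comprehension is that the items are produced on demand instead of being stored. A rough sketch of this, comparing object sizes via sys.getsizeof() (the exact sizes depend on the Python implementation, so only the comparison is shown):

```python
import sys

lst = [x * x for x in range(100000)]   # List comprehension: all items are built at once
gen = (x * x for x in range(100000))   # Generator expression: items are produced on demand

print(sys.getsizeof(lst) > sys.getsizeof(gen))   # True

# Items are computed only when requested
print(next(gen), next(gen), next(gen))   # 0 1 4
```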

Callable: __call__()

In Python, you can call an object to execute some code, just like calling a function. This is done by providing a __call__() member method. For example,

class MyCallable:

    def __init__(self, value):
        self.value = value

    def __call__(self):
        return 'The value is %s' % self.value

if __name__ == '__main__':
    # Construct an instance
    obj = MyCallable(88)
    # Call the instance, invoke __call__()
    print(obj())  # Output: The value is 88
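
A callable object can also take arguments and keep state between calls, so it can be passed wherever a function is expected. A short sketch, with a hypothetical Multiplier class:

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor      # State kept in the instance

    def __call__(self, x):
        return x * self.factor

triple = Multiplier(3)
print(triple(5))                      # 15
print(callable(triple))               # True
print(list(map(triple, [1, 2, 3])))   # [3, 6, 9]
```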

Context Manager: __enter__() and __exit__()

[TODO]
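
As a minimal sketch of the two methods named in the heading, the hypothetical Timer class below can be used in a with-statement. __enter__() is run on entry, and its return value is bound to the name after "as"; __exit__() is run on exit, even if the body raises an exception.

```python
import time

class Timer:
    """A hypothetical context manager that measures elapsed time"""
    def __enter__(self):
        self.start = time.perf_counter()
        return self                       # Bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False                      # Do not suppress exceptions

with Timer() as t:
    total = sum(range(1000000))

print(total)                              # 499999500000
print('Elapsed: %.6f seconds' % t.elapsed)
```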

Unit Testing

Testing is CRITICALLY IMPORTANT in software development. Some people actually advocate "Write Test First (before writing the code)" (in so-called Test-Driven Development (TDD)). You should at least write your tests alongside your development.

In Python, you can carry out unit testing via the standard library modules unittest and doctest.

Python's unittest Module

The unittest module supports all features needed to run unit tests:

  • Test Case: contains a set of test methods, supported by the unittest.TestCase class.
  • Test Suite: a collection of test cases, or test suites, or both; supported by the unittest.TestSuite class.
  • Test Fixture: Items and preparations needed to run a test; supported via the setUp() and tearDown() methods in the unittest.TestCase class.
  • Test Runner: run the tests and report the results; supported via unittest.TextTestRunner, unittest.TestResult, etc.
Example 1: Writing Test Case
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
unittest_eg1.py: Unit test example
"""
import unittest

# Define a function to be unit-tested
def my_sum(a, b):
    """Return a + b"""
    return a + b

# Define a test case, which consists of a set of test methods.   
class TestMySum(unittest.TestCase):  # subclass of TestCase
    def test_positive_inputs(self):
        result = my_sum(8, 80)
        self.assertEqual(result, 88)

    def test_negative_inputs(self):
        result = my_sum(-9, -90)
        self.assertEqual(result, -99)

    def test_mixed_inputs(self):
        result = my_sum(8, -9)
        self.assertEqual(result, -1)

    def test_zero_inputs(self):
        result = my_sum(0, 0)
        self.assertEqual(result, 0)

# Run the test cases in this module
if __name__ == '__main__':
    unittest.main()

The expected outputs are:

....
----------------------------------------------------------------------
Ran 4 tests in 0.001s

OK
How it Works
  1. You can create a test case by sub-classing unittest.TestCase.
  2. A test case contains test methods. The test method names shall begin with test.
  3. You can use the generic assert statement to compare the test result with the expected result:
    assert test, [msg]
        # if test is True, do nothing; else, raise AssertionError with the message
  4. You can also use the assertXxx() methods provided by the unittest.TestCase class. These methods take an optional argument msg that holds a message to be displayed if the assertion fails. For example,
    • assertEqual(a, b, [msg]): a == b
    • assertNotEqual(a, b, [msg]): a != b
    • assertTrue(a, [msg]): bool(a) is True
    • assertFalse(a, [msg]): bool(a) is False
    • assertIsNone(expr, [msg]): expr is None
    • assertIsNotNone(expr, [msg]): expr is not None
    • assertIn(a, b, [msg]): a in b
    • assertNotIn(a, b, [msg]): a not in b
    • assertIs(obj1, obj2, [msg]): obj1 is obj2
    • assertIsNot(obj1, obj2, [msg]): obj1 is not obj2
    • assertIsInstance(obj, cls, [msg]): isinstance(obj, cls)
    • assertNotIsInstance(obj, cls, [msg]): not isinstance(obj, cls)
    • assertGreater(a, b, [msg]): a > b
    • assertLess(a, b, [msg]): a < b
    • assertGreaterEqual(a, b, [msg]): a >= b
    • assertLessEqual(a, b, [msg]): a <= b
    • assertAlmostEqual(a, b, [msg]): round(a-b, 7) == 0
    • assertNotAlmostEqual(a, b, [msg]): round(a-b, 7) != 0
    • assertRegex(text, regex, [msg]): re.search(regex, text)
    • assertNotRegex(text, regex, [msg]): not re.search(regex, text)
    • assertDictEqual(a, b, [msg]):
    • assertListEqual(a, b, [msg]):
    • assertTupleEqual(a, b, [msg]):
    • assertSetEqual(a, b, [msg]):
    • assertSequenceEqual(a, b, [msg]):
    • assertCountEqual(a, b, [msg]): same elements, regardless of order (called assertItemsEqual in Python 2)
    • assertDictContainsSubset(a, b, [msg]): (deprecated since Python 3.2)
    • assertRaises(exc, func, *args, **kwargs): func(*args, **kwargs) raises exc
    • Many more, see the unittest API documentation.
  5. Test cases and test methods run in alphanumeric order.
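
A short sketch of a few of these assertion methods follows. The function divide() is a hypothetical function under test; the test case is run programmatically via a TestLoader and a TextTestRunner so that the result can be inspected afterwards.

```python
import unittest

def divide(a, b):
    """Hypothetical function under test"""
    return a / b

class TestDivide(unittest.TestCase):
    def test_float_result(self):
        # Compare floats up to 6 decimal places
        self.assertAlmostEqual(divide(1, 3), 0.3333333, places=6)

    def test_divide_by_zero(self):
        # Context-manager form of assertRaises()
        with self.assertRaises(ZeroDivisionError):
            divide(1, 0)

    def test_membership(self):
        self.assertIn(divide(8, 2), [2.0, 4.0])

# Run the test case programmatically (instead of unittest.main())
suite = unittest.TestLoader().loadTestsFromTestCase(TestDivide)
result = unittest.TextTestRunner().run(suite)
print(result.wasSuccessful())   # True
```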
Setting up Test Fixture

You can set up your test fixtures, which are available to all the test methods, via setUp(), tearDown(), setUpClass() and tearDownClass() methods. The setUp() and tearDown() will be executed before and after EACH test method; while setUpClass() and tearDownClass() will be executed before and after ALL test methods in this test class.

For example,

"""
ut_template.py: Unit test template 
"""
import unittest 

class MyTestClass(unittest.TestCase):

    # Run before ALL test methods in this class.
    @classmethod
    def setUpClass(cls):
        print('run setUpClass()') 

    # Run after ALL test methods in this class.
    @classmethod
    def tearDownClass(cls):
        print('run tearDownClass()') 

    # Run before EACH test method.
    def setUp(self):
        print('run setUp()') 

    # Run after EACH test method.
    def tearDown(self):
        print('run tearDown()') 

    # A test method
    def test_numbers_equal(self):
        print('run test_numbers_equal()') 
        self.assertEqual(8, 8) 

    # Another test method
    def test_numbers_not_equal(self):
        print('run test_numbers_not_equal()') 
        self.assertNotEqual(8, -8) 

# Run the test cases in this module
if __name__ == '__main__':
    unittest.main()

The expected outputs are:

run setUpClass()
run setUp()
run test_numbers_equal()
run tearDown()
run setUp()
run test_numbers_not_equal()
run tearDown()
run tearDownClass()
----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK
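
As a sketch of a per-test fixture, the hypothetical test case below creates a fresh list in setUp(), so that changes made by one test method do not leak into another:

```python
import unittest

class TestListOperations(unittest.TestCase):
    def setUp(self):
        # A fresh list is created before EACH test method
        self.items = [3, 1, 2]

    def tearDown(self):
        # Release the fixture after EACH test method
        self.items = None

    def test_sort(self):
        self.items.sort()
        self.assertEqual(self.items, [1, 2, 3])

    def test_append(self):
        # Not affected by the sort() in the other test method
        self.items.append(4)
        self.assertEqual(self.items, [3, 1, 2, 4])

suite = unittest.TestLoader().loadTestsFromTestCase(TestListOperations)
result = unittest.TextTestRunner().run(suite)
```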

[TODO] An example to set up test fixtures for each test method.

Using Test Suite

You can organize your test cases into test suites. For example,

# Define a test suite with selected test methods from test cases
def my_suite1():
    suite = unittest.TestSuite()
    suite.addTest(MyTestCase('test_method1'))  # add a test method
    suite.addTest(MyTestCase('test_method2'))
    return suite
   
# Or, generate test suite for all testXxx() methods of a test case
my_suite2 = unittest.TestLoader().loadTestsFromTestCase(MyTestCase)

# Run test suites
if __name__ == "__main__":
    runner = unittest.TextTestRunner()  # Use a text-based TestRunner
    runner.run(my_suite1())
    runner.run(my_suite2)
Skipping Tests

Use the unittest.skip(reason) decorator to skip one test, e.g.,

@unittest.skip('msg')  # decorator to skip this test
def test_x(self):
    ......

Invoke the instance method skipTest(reason) inside the test method to skip the test, e.g.,

def test_x(self):
    self.skipTest('msg')  # skip this test
    ......

You can use the decorators @unittest.skipIf(condition, reason) and @unittest.skipUnless(condition, reason) for conditional skip.
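
The sketch below puts the skipping mechanisms together in one hypothetical test case. The skipIf condition used here is only an assumption for illustration.

```python
import sys
import unittest

class TestSkipping(unittest.TestCase):
    @unittest.skip('demonstrating unconditional skip')
    def test_always_skipped(self):
        self.fail('never run')

    @unittest.skipIf(sys.version_info < (3, 0), 'requires Python 3')
    def test_python3_only(self):
        self.assertEqual(7 // 2, 3)

    def test_skip_inside(self):
        self.skipTest('skipped from inside the test method')

suite = unittest.TestLoader().loadTestsFromTestCase(TestSkipping)
result = unittest.TextTestRunner().run(suite)
print(len(result.skipped))   # 2
```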

Fail Test

To fail a test, use instance method fail([msg]).

doctest

Embed the test input/output pairs in the doc-string, and invoke doctest.testmod(). For example,

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import doctest

# Define a function to be unit-tested
def my_sum(a, b):
    """
    (number, number) -> number
    Return a + b

    For use by doctest:
    >>> my_sum(8, 80)
    88
    >>> my_sum(-9, -90)
    -99
    >>> my_sum(8, -9)
    -1
    >>> my_sum(0, 0)
    0
    """
    return a + b

if __name__ == '__main__':
    doctest.testmod(verbose=1)

Study the outputs:

Trying:
    my_sum(8, 80)
Expecting:
    88
ok
Trying:
    my_sum(-9, -90)
Expecting:
    -99
ok
Trying:
    my_sum(8, -9)
Expecting:
    -1
ok
Trying:
    my_sum(0, 0)
Expecting:
    0
ok
1 items had no tests:
    __main__
1 items passed all tests:
   4 tests in __main__.my_sum
4 tests in 2 items.
4 passed and 0 failed.
Test passed.

The "(number, number) -> number" is known as a type contract, which spells out the expected types of the parameters and return value.

Performance Measurement

You can use modules timeit, profile and pstats for profiling Python programs and measuring performance.
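
As a minimal sketch of timeit, the following compares two ways of building the same list. The two statements being timed are arbitrary choices for this illustration, and the timings depend on your machine, so no expected values are shown.

```python
import timeit

# Each statement is executed 'number' times; the total time (in seconds) is returned
t_loop = timeit.timeit('lst = []\nfor i in range(1000): lst.append(i)', number=1000)
t_comp = timeit.timeit('lst = [i for i in range(1000)]', number=1000)

print('for-loop:      %.4f s' % t_loop)
print('comprehension: %.4f s' % t_comp)
```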

[TODO] examples

[TODO] Code coverage via coverage module.
