Python 101¶
“For the things we have to learn before we can do them, we learn by doing them.”
—Aristotle (around 350 BCE)
Welcome to the Python tutorial of the group “Micromechanical Materials Modelling” of the “Institute of Mechanics and Fluid Dynamics”. The goal of this tutorial is to introduce the basic tools used in our everyday scientific worklife by actually performing such tasks. We will start with a brief introduction to the Unix Shell in which we list the most common commands and what they do.
Contents
Installing Python¶
How to install Python depends a lot on your operating system.
macOS¶
For macOS two popular choices for package managers exist: Homebrew and MacPorts. Both are solid so the choice is up to you, I currently prefer Homebrew.
Homebrew¶
Install the command line tools via
$ xcode-select --install
Agree to the Xcode license by running
$ sudo xcodebuild -license
Install Homebrew.
Install the required formulas by running
$ brew install git python3 hdf5 freetype libpng
MacPorts¶
Install the current version of Xcode on your Mac via the Mac App Store.
Open your terminal and install the Xcode Command Line Utilities and run
$ xcode-select --install
Agree to the Xcode license by running
$ sudo xcodebuild -license
Install MacPorts for your version of macOS.
Install the required ports via
$ sudo port install freetype libpng git hdf5 python35
Ubuntu 16.04 LTS¶
Install the required packages via
$ sudo apt-get install cmake gfortran git libfreetype6-dev libatlas-dev liblapack-dev libhdf5-dev python3-dev python3-venv python3-pip python3-tk
Ubuntu 16.10¶
Install the required packages via
sudo apt-get install python3-venv python3-tk hdfview
Setting up Python 101¶
First we will set up a virtual environment using the venv
module of
Python.
$ python3 -m venv ~/.virtualenvs/python101
The virtual environment can be activated via
$ source ~/.virtualenvs/python101/bin/activate
If everything went fine you should see that your shell prompt now starts with
(python101)
. This means that everything you do with regards to Python now
uses this virtual environment.
To make sure that the Python packages that are required for installing other Python packages are up to date execute the following command:
$ pip install --upgrade pip setuptools
Now we will clone the Python 101 repository using git.
$ git clone https://github.com/ZeeD26/python101.git
We enter the newly created directory
$ cd python101
and install Python 101 along with its dependencies
$ python -m pip install -e .
Attention
The .
after python -m pip install -e
is important!
python -m pip install -e
is not sufficient.
The right tool for the job¶
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
—Abraham Lincoln
Programming may be considered a craft (although you can find some debate on that). If you need to solve a scientific problem that may, e.g., only be solvable numerically, and there is no software at hand to solve this problem, you have to write code that does. And as you expect a “regular” craftsman to know the right tool for his job, you may also want to know your tools for your job as a scientific programmer.
In the following an overview of the basic tools for scientific programmers is given, starting with the Unix shell as a mean to interact with the computer while not relying on a GUI and enabling reproducibility, followed by an introduction to text editors suitable for programming, and finishing with a brief introduction to version control systems.
The Unix shell¶
“The Linux philosophy is ‘Laugh in the face of danger’. Oops. Wrong One. ‘Do it yourself’. Yes, that’s it.”
—Linus Torvalds
We could try to think of a neat way of defining what a Unix shell is, but thankfully Wikipedia comes to the rescue:
A Unix shell is a command-line interpreter or shell that provides a traditional Unix-like command line user interface. Users direct the operation of the computer by entering commands as text for a command line interpreter to execute, or by creating text scripts of one or more such commands.
To put it in simpler terms, instead of pointing and clicking on colorful icons to perform actions like opening a directory or a program, you type commands in a terminal. The most commonly used commands are outlined in the following sections.
Note
If you do not have any prior experience with command line interfaces (CLIs), working with one might seem daunting at first. This is perfectly normal. It takes time to get used to the Unix shell. The key to mastering the Unix shell is to use it. A lot.
pwd
¶
This command is used to show the path to the current working directory. It is usually used to see where you currently are.
$ pwd
ls
¶
This command is used to show the contents of the directory you specify as argument.
$ ls <directory>
If you do not provide anything besides ls
it prints the contents
of the current working directory.
$ ls
A single .
in place of <directory>
also results in the contents of the
current working directory to be printed, as .
is an explicit way of
specifying the current working directory.
For a more verbose version you may use the -l
options:
$ ls -l
Note
If you have some spaces in some of your file names you have to enclose
the whole filename like this: "<filename>"
cd
¶
This command is used to change the directory to the one specified as argument. If no argument is provided you are changing to your home directory.
$ cd <directory>
To get to the parent directory use
$ cd ..
Note
In a lot of cases bash
—the shell you are using—provides rather
intelligent autocompletion. To use this start typing the name of a file
or directory and hit the tab
button. If there is a unique completion
option it is completed automatically. Else hitting tab another time will
give you a list of options that starts with whatever you typed until then.
cat
¶
This command is used to print the contents of the files specified as arguments.
$ cat <file1> <file2> ... <fileN>
cp
¶
This command is used to copy files. For example
$ cp <file1> <file2>
copies <file1>
to <file2>
. If you want to copy a lot of files to
another directory use
$ cp <file1> <file2> ... <fileN> <directory1>/
Copying a whole directory requires you to use the -r
option:
$ cp -r <directory1> <directory2>
mv
¶
This command is used to move files. For example
$ mv <file1> <file2>
essentially renames <file1>
to <file2>
. To move several files into a
directory use
$ mv <file1> <file2> ... <fileN> <directory1>/
As opposed to cp
the mv
command can move whole directories without
using the -r
option:
$ mv <directory1> <directory2>
touch
¶
This command is used to create an empty file. Using
$ touch <file1>
hence results in an empty file with the name <file>
.
Note
If you want to copy something from the Terminal you can not do this via the
key combination Ctrl + C
as this is reserved for cancelling the
running program. Instead use Ctrl + Shift + C
. For pasting you also
have to use Ctrl + Shift + V
.
mkdir
¶
This command is used to create a directory. Using
$ mkdir <directory>
thus creates a directory with the name <directory>
.
rm
¶
This command is used to delete files and directories. Hence
$ rm <file>
deletes <file>
.
Warning
If you delete files or directories on a modern, graphical operating system
the files and directories usually do not get deleted immediately, but are
copied to an intermediate directory that is usually called trash bin
.
This could be considered a safety measure against accidentally deleting
important files. This “safety net” does not exist for the rm
command.
Whatever you delete via rm
is permanently deleted.
grep
¶
If you want to see whether some text is contained within a file you can use
grep
:
$ grep <pattern> <file>
with <pattern>
being the text you are looking for.
find
¶
Find files in a directory hierarchy. This program is rather extensive and can perform complex search operations. In its simplest form it may be used like this:
$ find <path> -name "<pattern>"
It is important to put <pattern>
into quotation marks to make sure the
shell is not expanding some special characters. Special characters can, e.g.,
be the wildcard character *
, which matches everything. Hence, the command
$ find . -name "*.py"
finds all Python files below the current directory.
Summary¶
pwd
- Print the path to the directory you are currently in.
ls <directory>
- List the contents of directory specified by
<directory>
. If you do not specify a directory it defaults to your current directory.cd <directory>
- Change the directory to
<directory>
. If you do not specify a directory you go to your home directory. If you want to go back to your last directory you can usecd -
.cat <file1> <file2> ... <fileN>
- Read the files specified and print their content to the terminal.
cp <file1> <file2>
- Copy the first argument to the second argument. If you want to copy a directory you have to use it with the
-r
option:cp -r <directory1> <directory2>
.mv <file1/directory1> <file2/directory2>
- Move the first argument to the second argument. It is basically like renaming the first argument.
touch <file>
- Create an empty file at
<file>
.mkdir <directory>
- Create a directory at
<directory>
.rm <file1> <file2> ... <fileN>
- Delete the files specified. If you want to delete a directory and its contents you have to use it with the
-r
option:rm -r <file1>
.grep <pattern> <file>
- Search for
<pattern>
in<file>
.find <path> -name "<pattern>"
- Find all files in
<path>
and below whose name is matching<pattern>
.
Exercises¶
- Create an empty file called
my_first_file.txt
- Open the file with your text editor and fill it with something other than
asdf
. Save and close afterwards. - Print the content of the file to the terminal.
- Make a new directory named
my_first_directory
- Copy the file
my_first_file.txt
into this directory. - Remove the old file.
- Print the content of the file
my_first_file.txt
in the directorymy_first_directory
to the terminal. - Print your current working directory.
- Enter the directory
my_first_directory
. - Print your current working directory.
- Enter the parent directory.
- List the content of your current working directory.
- Delete the directory
my_first_directory
. - List the content of your current working directory.
Text editors¶
The features a good text editor should have:
- Unicode support
- A good text editor should support Unicode. This is the de facto standard for cross-platform compatibility of text files. With Unicode support you can be sure that the text file you edited on your Unix machine can be worked with properly on, e.g., a Windows machine.
- syntax highlighting
- In our case, the main objective of using a text editor is to write code. Choosing a text editor that supports syntax highlighting for the programming language we use helps us to comprehend the written code easier by applying different colors and/or weights to the text depending on whether the piece of text is, e.g., a string, a type declaration, or a keyword.
- snippets
- Every programming language has certain structures for building blocks like, e.g., if-conditionals or for-loops. Typing them out each and every time is a waste of time. Good text editors allow the use of so-called snippets.
- hackability
- Each programmer has a different taste with regards to how they want to accomplish certain actions in text editors. Different shortcuts might be required, or a custom layout of the editor. A good text editor allows you to customize it to your needs, not require you to adapt to its way (at least to a certain amount).
- package manager
- To be truly hackable a good text editor supports third-party packages that extend or alter the functionality of the text editor.
To gain more insight about the performance of different text editors take a look at this editor performance comparison.
In the following a brief introduction to the text editors I recommend is given.
Atom¶
Atom is a text editor made by GitHub.
Advantages
- support for different encodings
- highly hackable
- open-source
- easy to learn
Disadvantages
- slower than the other presented editors
- less memory efficient than the other presented editors
- may not be installed on all machines you have to work with
Sublime Text 3¶
Sublime Text is a text editor made by a small team of developers. Proper support for packages is supplied by Package Control.
Advantages
- fast
- relatively memory efficient
- support for different encodings
- highly hackable
- easy to learn
Disadvantages
- closed-source
- may not be installed on all machines you have to work with
- if used for free a pop-up every ten times you save a file
- a license costs $70 at the time of writing
Visual Studio Code¶
Visual Studio Code is a text editor made by Microsoft.
Advantages
- support for different encodings
- highly hackable
- open-source
- easy to learn
Disadvantages
- may not be installed on all machines you have to work with
Vim¶
Vim is a text editor made by the community lead by Bram Moolenaar. Support for packages is built-in as of vim 8, but due to backward compatibility the most popular package manager is vim-plug.
Advantages
- open-source
- fast
- memory efficient
- may very well be installed on all Unix machines you have to work with
Disadvantages
- takes time to get used to
- hard to master
Emacs¶
Emacs is a text editor made by the community lead by the Free Software Foundation.
Advantages
- open-source
- fast
- memory efficient
- may be installed on all Unix machines you have to work with
Disadvantages
- takes time to get used to
- hard to master
Version control systems¶
Even if you have no prior experience with programming, you may already be familiar with some kind of version control, whether on purpose or not. Over the course of your studies it is very likely you have had to write some sort of papers, maybe even while collaborating with others. Does the following comic strip resonate with you?
This gives you a way of keeping track of changes, as you can compare two versions of the same document. Depending on the software you use you either have to compare the changes “manually” by skimming the whole document, or you can get a view highlighting the differences (see, for example, subsec_diff).
Another way of having versions of a file usually comes with cloud file storage services like Dropbox or Google Drive. Those services keep a version history of the synced files available for restoring files you want to revert to older versions or which you deleted accidentally.
But why would we want to keep track of changes to our code? Would the latest working version not suffice at all times?
If we keep track of changes to our code we do not have to be afraid to simply tinker with our code. Maybe we get an awesome idea for a new feature or how to speed up your code. As we hack away we realize that the idea is not really that awesome and we wish to go back to our previous working state. If we are then in the unlucky state that the “undo” action of our text editor is unable to revert all the changes we made, we have to fall back to another solution, e.g., the aforementioned naive version control schemes. If we worked on a copy of the last working version “reverting” is simple—we just delete the fresh copy. Reverting our file back using the version history of a cloud file storage service requires may require us to go to their website and check quite a few versions of this file, as we have not had any way of controlling which state of the file represents a meaningful increment. This leads us to the first feature we would like our version control system to have:
Note
We want to have control over what changes in our document/code form a logical unit that represents a meaningful increment.
More complicated documents or code libraries may be split up into several files to emphasize the structure while separating logical units like chapters or classes, respectively. When using the copy-and-rename scheme we would copy and rename the directory containing all the files. The drawback of this is the redundancy, as not all the files may change within the same increment. When using the version history of cloud file storage services, on the other hand, we do not run into the problem of data redundancy, but we are unable to track which versions of the files belong together.
Note
We want to be able to not only track the changes of a file, but the cohesive changes across multiple files, i.e., we want to track changes of a project instead of files.
These two ways of keeping track of file versions come with a variety of advantages and disadvantages that we are going to address in the following while we introduce git as the right tool for the job with regards to version control systems.
During our coding sessions there will be times when we wish we could take some of the changes to a working script back and start from the last working checkpoint. Or we find a bug that was introduced somewhere down the road. Or we accidentally delete an important file. In the past we may have used the “undo” command in our text editors or the history tracking of cloud file hosting services like Dropbox. But this is not sufficient for proper source code management. Let us work out the advantages of version control systems as another tool in our toolbox over file versioning by, e.g., appending version numbers to the filename—d
Best practices¶
What not to put into version control
- config files with sensitive information, e.g., passwords, private keys
- editor backups
- generated files, e.g., executables, HTML documentation
- OS specific files, e.g.,
Thumbs.db
,.DS_Store
SVN¶
Git¶
One of the most popular version control systems at the time of writing this tutorial is git. It was created by Linus Torvalds for development of the Linux kernel, which is one of the most widely distributed software repositories.
Let us introduce the most important commands by creating a new project: a collection of recipes.
Initializing a git repository—git init
¶
First of all we create a new directory that represents the root of our project via
mkdir recipes
and subsequently we enter it
cd recipes
At this point we can use the command
git init
to initialize the recipes
directory as a new git repository. We should see
the following output in your console
Initialized empty Git repository in <path_before_recipes>/recipes/.git/
If we now view the contents of the recipes
directory via
ls -A
we should see a new directory called .git
, whose content we can inspect
using
ls .git
resulting in the output
branches config description HEAD hooks info objects refs
This directory is where git stores all the information about all versions of the files associated with the project. At this point we should neither modify nor delete any files in this directory, as it may lead to corruption of our repository.
Knowing what’s what—git status
¶
Ignoring files¶
Useful gitignore templates¶
The official github group supplies a nice list of global gitignore templates as well as project specific gitignore templates.
Basics¶
“The canonical, ‘Python is a great first language’, elicited, ‘Python is a great last language!’”
—Noah Spurrier
Now we are going to introduce the basics of the Python programming language. We start with the infamous Hello, World! program and the basic syntax of a Python script.
Attention
Make sure you have your virtual environment activated! If you do not have
(python101)
in front of your command line prompt you need to activate
it using
$ source ~/.virtualenvs/python101/bin/activate
Execute the following command to start the interactive Python interpreter:
$ python
You should see a couple of lines printed in the terminal, with the first line
stating, among the current date and time, the version of Python you use. The
very last line should start with >>>
which indicates the Python prompt. You
can write Python commands there and execute them.
Attention
In the previous chapter The Unix shell the basic commands for the Unix
shell were introduced. Notice how every command was preceeded by a $
character. In this tutorial code blocks that start with a $
sign are
meant to be executed in the Unix shell. If the code block is preceeded by
>>>
it means that it should either be executed in the Python
interpreter or be used in a Python script.
Python as an interactive calculator¶
To get your feet wet with Python you can use the Python interpreter as a calculator. You have the usual mathematical operators at your disposal, like
+ : | addition, |
---|---|
- : | subtraction, |
* : | multiplication, |
/ : | division, |
** : | exponent, |
// : | integer division, and |
% : | modulus. |
If you are not familiar with one of them just give it a try in the Python interpreter—do not only use integers, but also floating point numbers. You can also use brackets as you would use them in mathematical expressions. Try whether Python uses the proper mathematical rules with regards to the order of execution of the operators.
Note
At one point you will enter an “invalid” expression like 1 + * 2
.
Python will then raise a SyntaxError
to tell you that whatever you
typed is not valid Python syntax. In many cases Python will also give you
additional information about the error. There are many more errors you can
encounter, and it is perfectly normal to have errors. The only difference
between a seasoned programmer and a beginner is the time it takes to fix
those errors. The more errors and mistakes you made the better you know how
to solve them.
To leave the Python interpreter you either execute
>>> quit()
or you press Ctrl + D
. Besides the interactive Python interpreter you can
also write scripts with Python. Scripts are files that can be executed from the
command line interface. They contain Python expressions that get executed once
you call the scripts. A script can be simple and merely rename files or it can
be complex and run a complete simulation of a car crash. You decide how simple
they are.
Your first Python script¶
We will start with the infamous Hello, World! Program. Open a new terminal,
activate your virtual environment, and create a new file named
hello_world.py
via
$ touch hello_world.py
Open it with your favorite text editor, e.g., Atom or SublimeText. In the former case you would open the file via
$ atom hello_world.py
Now type (not copy!) the following into the file
print('Hello, World!')
Save the file, switch to your command line interface, and execute
$ python hello_world.py
If you did everything correctly you should see the phrase Hello, World!
popping up in your command line interface. If you see something like
File "hello_world.py", line 1
print('Hello, World!)
^
SyntaxError: EOL while scanning string literal
or
File "hello_world.py", line 2
^
SyntaxError: unexpected EOF while parsing
it means that you have either forgotten the closing '
or )
,
respectively. As you can see Python tries its best to describe the error to you
so that it can be fixed quickly.
If everything went fine: Congratulations! You wrote your first Python script!
The print
function¶
The function you used in your first Python script, the print()
function,
has a rather simple goal: Take whatever you have in there and display it in the
command line interface. In the Python interpreter (the command line starting
with >>>
) the result of an expression was displayed automatically. Try
creating a new file math_expressions.py
and enter several mathematical
expressions like you did earlier. Save the file, switch to your terminal and
execute the file via
python math_expressions.py
You should not see a single thing happening. That is because you never told
Python what to actually do with those expressions. So what it does is evaluate
them and nothing more. Now wrap the mathematical expressions in the
print()
function, for example like this:
print((3 + 4)*6)
If you execute the script again you should see the expected output.
Integers, floats and strings¶
In the previous examples you worked with integers, floating-point numbers, and
with strings. -4
, 0
, and 2
are all integers. 1.2
, 1.0
and
-2e2
(which is the scientific notation for -200.0
) are floating-point
numbers. Finally, 'Hello, World!'
is a string. These categories are called
data types. Every value in Python is of a certain data type.
The meaning of operators may depend on the data types of the values surrounding
it. Take, e.g., the addition operator +
:
>>> 1 + 2
3
>>> 1.2 + 3.4
4.6
>>> 'My first sentence.' + 'My second sentence.'
'My first sentence.My second sentence.'
>>> 'My ' + 3 + 'rd sentence.'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
In the last case the addition operator has no idea how to combine the integer
1
with the strings. What you can do to solve this is to convert the integer
to a string using str()
:
>>> 'My ' + str(3) + 'rd sentence.'
'My 3rd sentence.'
If you want to convert something to a string you use str()
, to convert to
an integer you use int()
, for floating-point numbers you use
float()
.
>>> '1.2' + '3.4'
'1.23.4'
>>> float('1.2') + float('3.4')
4.6
>>> int('1.4')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.4'
>>> str(1e2)
'100.0'
Play around with those three functions to see what can be converted and what can not. Try the different operators, e.g., try to multiply a string with an integer, etc.
Variables¶
Like in mathematics you can also use variables to store values. A variable has a name by which it is called and a value. There are three rules that a variable name must comply:
- It must be exactly one word.
- It must comprise only letters, numbers, and the underscore character.
- It must not begin with a number.
Other than that anything goes.
Warning
Although anything else is a viable variable name, you should take special
care not to use names of built-in objects like, e.g., int
. If you name
a variable after some function or class it is not usable anymore in the
subsequent code.
To assign a value to a variable you use the
equal sign =
with the variable name on the left and the value on the right:
>>> my_first_variable = 21
>>> 2*my_first_variable
42
>>> my_second_variable = 3
>>> my_first_variable/my_second_variable
7.0
>>> my_third_variable = my_first_variable
>>> print(my_third_variable)
21
Here is a slightly more complex example:
students = 35
tutors = 2
classrooms = 1
pizza_orders = 20
students_per_tutor = students / tutors
persons = students + tutors
persons_per_classroom = persons / classrooms
hungry_persons = persons - pizza_orders
print('There are', students, 'students and', tutors, 'tutors.')
print('That makes', persons, 'persons in', classrooms, 'class room(s).')
print(hungry_persons, 'have to stay hungry...')
The advantages of using variables are two-fold:
- If the amount of students, tutors, classrooms or pizza_orders changes you only have to update one line instead of many. This is less error-prone and faster.
- You give the values some meaning which should be represented in the variable name. You could in principle read “the students per tutor is the amount of students divided by the amount of tutors.” This makes your code easily comprehensible and you need fewer comments. But you still should write them when they make sense!
And here is what the output should look like:
There are 35 students and 2 tutors.
That makes 37 persons in 1 class room(s).
17 have to stay hungry...
Notice how we used ,
to separate strings and variable names in
print()
, but everything was composed in a nice way? The reason for this
is that print()
can take an arbitrary amount of arguments. Just chain
them using ,
and you are good to go. How this works is part of the section
Functions.
User input¶
In some cases you may want to ask the user of your script to provide some
additional information, like the path to a file or parameters for a simulation.
For this the input()
can be used.
print('What is your name?')
name = input()
print('Nice to meet you,', name)
Note
The value returned by input()
is always a string. So when you are
asking for numbers you have to convert them.
print('What is your age in years?')
age = int(input())
print('In 5 years you will be', age+5, 'years old.')
Imports¶
Sometimes the features that Python offers by default are not enough. What if you want to use the \(\sin(x)\) function? For more specialized topics Python offers modules or packages, either ones that already ship with every Python installation or packages from external parties. The packages that Python ships with are called the standard library. External packages may be, e.g., NumPy and SciPy for scientific computing with Python, or Matplotlib for plotting.
You activate this additional functionality by importing these packages:
>>> import math
Now we have access to all functions available in the math
module.
>>> math.pi
3.141592653589793
>>> math.sin(0.5*math.pi)
1.0
Take your time and browse the documentation of the math
module, try some
of the provided functions like math.ceil()
, math.exp()
, etc.
Summary¶
- You can use the interactive Python interpreter to execute small commands.
- You can execute scripts that hold several commands using Python.
- You can display results of computations or strings using the
print()
function- You can use
str()
,int()
,float()
to convert from one data type to another—if it is somehow possible.- You can store values in variables to access them at a later point in your script.
- You can import modules or packages to extend Pythons builtin functionality using the import statement.
Exercises¶
- Write a script that asks the user for the radius of a circle and subsequently shows the circumference and the area of the circle in the terminal.
Comments¶
Once your programs grow longer and the logic gets more and more complicated it is a very good idea to place meaningful comments in your code. This makes it easier for the future you and your collaborators or supervisors to actually understand why your implementation is as it is.
In the very beginning your comments may be used to outline what the code block following the comment is actually doing. Once you get more familiar with Python the content of the comment should change from what the code is doing to why it is doing it.
To actually write a comment prepend it with the #
character. So if you
write
# In the following example you should only see
# ``Now you see me.``, and
# ``Are you still seing me?`` in the terminal.
print('Now you see me.')
# print("Now you don't.")
print('Are you still seeing me?') # Inline comments are nice as well!
into a file and execute it you should only see
Now you see me.
Are you still seeing me?
Summary¶
- You can use comments by prepending
#
- Comments are useful to explain why you did something the way you did it
Exercises¶
- If you encounter some more complicated code-blocks in the future comment them. Admittedly this is not directly an exercise, but a sincere request.
- Really, comment your code for others and your future self. You will forget why you coded something the way you did. Still not really an exercise.
Containers¶
At one point you may want to keep objects like strings or numbers in another object to show that they are strongly related. These objects that contain other objects are called container. Python offers a fair amount of useful containers out of the box, some of which are introduced in the following.
Lists¶
If you need a flexible container that can take in about anything (even itself)
and you need to keep an order to the contained objects you should use a
list
. An empty list is generated by either
>>> list()
[]
or
>>> []
[]
Or you generate a non-empty list
>>> list([1, 2, 3])
[1, 2, 3]
or
>>> [1, 2, 3]
[1, 2, 3]
Accessing values for the list is done like this:
>>> x = [1, 2, 3]
>>> print(x[1])
2
Note
As you can see in the example above specifying x[1]
did not give you
the first element of the list, but the second. This is due to Python
starting indexing at 0!
With this you can also change values in a list
>>> x = [1, 2, 3]
>>> x[1] = 4
>>> print(x)
[1, 4, 3]
Negative indices can be used to retrieve element from the end:
>>> x = [1, 2, 3, 4, 5]
>>> print(x[-1])
5
>>> print(x[-2])
4
You can also create slices of the list using the following notation:
list[start:end:step]
Its usage is best made clear by some examples:
>>> x = ['zero', 'one', 'two', 'three', 'four']
>>> print(x[2:])
['two', 'three', 'four']
>>> print(x[:3])
['zero', 'one', 'two']
>>> print(x[1:3])
['one', 'two']
>>> print(x[::2])
['zero', 'two', 'four']
>>> print(x[::-1])
['four', 'three', 'two', 'one', 'zero']
>>> print(x[-2:])
['three', 'four']
You can also append values to a list
>>> x = [1, 2, 3]
>>> print(x)
[1, 2, 3]
>>> x.append(4)
>>> print(x)
[1, 2, 3, 4]
or insert values at specified positions
>>> x = [1, 2, 3]
>>> print(x)
[1, 2, 3]
>>> x.insert(0, -1)
>>> print(x)
[-1, 1, 2, 3]
>>> x.insert(1, 0)
>>> print(x)
[-1, 0, 1, 2, 3]
A list can hold anything, even other lists
>>> [1, 'two', ['three', 4]]
[1, 'two', ['three', 4]]
Tuples¶
If you are sure that you do not want to modify the data container after its
generation and want to keep everything in the order you specified the
tuple
container is the container of choice. They may be initialized by
>>> tuple([1, 2, 3])
(1, 2, 3)
>>> (1, 2, 3)
(1, 2, 3)
Accessing single elements is done like in lists:
>>> x = (1, 2, 3)
>>> print(x[1])
2
And when you try to modify them an error is raised:
>>> x = (1, 2, 3)
>>> x[1] = 4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Sets¶
If you do not want duplicates in your container and you do not care about their
order you should use a set
. They are initialized via
>>> set() # This one is empty
set()
>>> set([1, 2, 2, 3, 3, 'three'])
{1, 2, 3, 'three'}
>>> {1, 2, 2, 3, 3, 'three'}
{1, 'three', 2, 3}
and elements can be added like this:
>>> x = {1, 2, 3}
>>> print(x)
{1, 2, 3}
>>> x.add('four')
>>> print(x)
{1, 2, 'four', 3}
Sets
furthermore support a lot of the operators you are familiar
with from the set in mathematics. For further information see the Python
documentation.
Dictionaries¶
If you do not want to keep the order of things but would rather like to use
keys to access the objects in the container the dict
has got you
covered. Dicts
are initialized via
>>> dict()
{}
>>> {}
{}
>>> x = {'One': 1, 'two': 2, 'THREE': ['A', 'list', 'of', 'strings']}
{'THREE': ['A', 'list', 'of', 'strings'], 'One': 1, 'two': 2}
and their values may be accessed like this:
>>> x = {'One': 1, 'two': 2, 'THREE': ['A', 'list', 'of', 'strings']}
>>> x['One']
1
>>> x['THREE']
['A', 'list', 'of', 'strings']
As you can see there are no restrictions regarding the type of the value that
each key is pointing to. Keys on the other hand have to be hashable.
In practice the keys are usually ints
or strings
.
If you want to retrieve something that does not exist an error is raised:
>>> x = {'One': 1, 'two': 2, 'THREE': ['A', 'list', 'of', 'strings']}
>>> x['four']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'four'
A way of providing a fallback in case a key does not exist in the dictionary is
the dict.get()
method:
>>> x = {'One': 1, 'two': 2, 'THREE': ['A', 'list', 'of', 'strings']}
>>> x.get('two', 'fallback')
2
>>> x.get('four', 'fallback')
'fallback'
Common operations¶
Length¶
Sometimes you are interested in the amount of objects that a container holds.
This information can be accessed using the len()
function:
>>> x = [1, 2, 3]
>>> len(x)
3
>>> x = (1, 2)
>>> len(x)
2
>>> x = set([1, 2, 2, 3, 3, 'three'])
>>> len(x)
4
>>> x = {'One': 1, 'two': 2, 'THREE': ['A', 'list', 'of', 'strings']}
>>> len(x)
3
Membership check¶
>>> x = [1, 2, 3]
>>> 2 in x
True
Note
While the behavior is clear for lists
,
tuples
and sets
, the in
checks for the
keys of a dictionary
.
>>> x = {
... 'one': 'two',
... 3: 4
... }
>>> 'one' in x
True
>>> 'two' in x
False
>>> 3 not in x
False
>>> 4 not in x
True
Due to the nature of the implementation member checks are especially fast in
sets
and dictionaries
Flow control¶
“Would you tell me, please, which way I ought to go from here?”“That depends a good deal on where you want to get to,” said the Cat.“I don’t much care where—” said Alice.“Then it doesn’t matter which way you go,” said the Cat.“—so long as I get somewhere,” Alice added as an explanation.“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”—Alice’s Adventures in Wonderland
Up until now the code war rather… underwhelming. You only executed a sequence of commands each and every time. But with the introduction of the control flow of a program a whole new world of programs reveals itself. It is up to your experience and creativity to use the control flow tools to structure your code in a way that complex tasks can be handled.
Boolean expressions¶
Knowing boolean expressions is essential for the control flow of a program. Python handles this rather intuitively. The most important boolean operations are
== : | equal to |
---|---|
!= : | unequal to |
< : | less than |
<= : | less than or equal to |
> : | greater than |
>= : | greater than or equal to |
>>> 1 == 1
True
>>> 1 + 1 != 2
False
>>> 5*2 >= 11
False
Warning
Beware of float
comparisons, as the way computers represent
floating-point numbers internally might lead to some seemingly weird
behavior.
>>> 0.1 + 0.2 == 0.3
False
This is certainly unexpected behavior. To deal with this shortcoming
version 3.5 of Python introduced the isclose
function in its
math
module.
>>> import math
>>> math.isclose(0.1 + 0.2, 0.3)
True
If you are working with an older version there are still convenience
solutions. The numpy
package that is introduced later on in this
tutorial also provides the means for float
comparisons.
In many cases a simple boolean expression is not sufficient to correctly
specify the control flow of a program. Sometimes a code block should only be
executed when a condition is not met, when several conditions are met all at
once or when at least one of several conditions is met. For this Python hast
the keywords not
, and
, and or
, respectively. Their behavior is
outlined in the following tables.
A | not A |
---|---|
True | False |
False | True |
A | B | A and B |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | False |
A | B | A or B |
---|---|---|
True | True | True |
True | False | True |
False | True | True |
False | False | False |
Combining boolean expressions using these keywords is an essential skill for programming.
>>> i = 3
>>> i < 4 and i >= 0
True
>>> i < 4 and i > 3
False
>>> i < 4 and not i > 3
True
>>> i >= 0 or i > 3
True
You can furthermore use brackets to specify the order of the evaluation of the subexpressions like you would in equations.
Note
In Python False
is not the only thing that is “false” in Python.
False
, None
, numeric zero of all types, and empty strings
and containers are all interpreted as false.
The most important construct using boolean expressions is introduced in the following.
The if
statement¶
A construct every programming language provides is the if statement. The basic structure is as follows:
if <expression>:
<code block>
The result is that the code block is only executed when expression is
True
.
Note
Test
>>> x = 0
>>> y = 1
>>> if x > 0:
... print('Larger than 0')
...
>>> if y > 0:
... print('Larger than 0')
...
Larger than 0
The expression can also include a negation using Boolean operations:
>>> x = 0
>>> y = 1
>>> if not x > 0:
... print('Not larger than 0')
...
Not larger than 0
>>> if not y > 0:
... print('Not larger than 0')
...
If you want to cover both cases you can also use the else keyword:
>>> x = 1
>>> if x < 0:
... print('Negative')
... else:
... print('Positive')
...
Positive
But as you can see this does not cover all the cases. What if x
is 0? For
this we have to use elif:
>>> x = 0
>>> if x < 0:
... print('Negative')
... elif x == 0:
... print('Zero')
... else:
... print('Positive')
...
Zero
And you can add as many elifs as you want.
The while
loop¶
Sometimes it is necessary to perform a routine until a certain condition is met. This is achieved using a while loop.
>>> x = 0
>>> while x < 5:
... print(x)
... x += 1
...
0
1
2
3
4
Assume you want to exit the while loop when a certain condition is met. This is possible with the break statement.
>>> x = 0
>>> while x < 5:
... if x == 3:
... break
... print(x)
... x += 1
...
0
1
2
Attention
Although while loops are a common building stone in every programming language I advise you to avoid using them whenever possible. It happens quite easily that the criterion for exiting the loop is never reached and your program gets stuck performing the same task more often than you intended. In many cases a while loop can be substituted with a for loop.
The for
loop¶
In a lot of cases you just want to work on all the elements of a container one at a time. This is easily achieved with for loops.
>>> x = [1, 2, 3]
>>> for i in x:
... print(i)
...
1
2
3
Here i
takes on the value of the elements of x
one after the other.
This allows you to work with i
inside of this for loop. After
all elements have been visited you automatically exit the loop. A more
sophisticated example might be to store the squared values of another list in
a new list.
>>> x = [1, 2, 3]
>>> x_squared = []
>>> for value in x:
... x_squared.append(value**2)
...
>>> print(x_squared)
[1, 4, 9]
range
¶
A shortcut to loop over integers is given as the range()
function.
>>> for i in range(3):
... print(i)
...
0
1
2
>>> for i in range(3, 6):
... print(i)
...
3
4
5
>>> for i in range(3, 12, 3):
... print(i)
...
3
6
9
enumerate
¶
Sometimes you also want to track where you currently are in your iteration. For
example you want to know what the current state of your program is, but
printing the value you are operating on each single time is kind of too much.
Then you could use enumerate()
like this:
>>> results = []
>>> for i, value in enumerate(range(100,900)):
... if i % 200 == 0:
... print('Current step:', i, '-- Value:', value)
... results.append(i**2 % 19)
...
Current step: 0 -- Value: 100
Current step: 200 -- Value: 300
Current step: 400 -- Value: 500
Current step: 600 -- Value: 700
As you can see we now have comma-separated variables i
and value
. i
get the current index we are in whereas value
holds the actual object of
the container.
zip
¶
Another common task is that you have to loop over several lists at the same
time. Use the :func:zip
function for this:
>>> fruits = ['banana', 'orange', 'cherry']
>>> colors = ['yellow', 'orange', 'red']
>>> for fruit, color in zip(fruits, colors):
... print('The color of', fruit, 'is', color)
...
The color of banana is yellow
The color of orange is orange
The color of cherry is red
Note
zip()
stops the for loop as soon as one list
is
empty:
>>> fruits = ['banana', 'orange', 'cherry', 'apple', 'lemon']
>>> colors = ['yellow', 'orange', 'red']
>>> for fruit, color in zip(fruits, colors):
... print('The color of', fruit, 'is', color)
...
The color of banana is yellow
The color of orange is orange
The color of cherry is red
Errors¶
As Python is a dynamic language it can never be garantueed that the input of your functions is what you want it to be. Take as an example a function whose purpose is the computation of the sum of the digits in a number. What if by accident someone passes a string as argument to this function? In some cases it is hence a good idea to check the input of a function for its sanity. If the input does not hold this test you may raise and error like this:
>>> raise ValueError('The input was wrong')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The input was wrong
Summary¶
- if statements are a way to add a branch into your code.
- while loops should be avoided whenever possible.
- for loops are used to work on items of a container.
range()
is used to ad-hoc generate containers holding integers.enumerate()
is used to keep track of the current index in a for loop.zip()
is used to loop over several containers at once.- raise is used to raise an error when something is fishy.
Functions¶
Introduction¶
Some tasks in your code may be performed a lot of times. Say you want to say “Hello” not only to the world but to a lot of people. You would have to write
print('Hello, {name}!')
an awful lot of times. It would be way shorter if we were able to write
greet('{name}')
and it would just print Hello, {name}!
. OK, it is not that shorter… But
imagine you want to find the roots of some functions. There is way more code
involved and having a nice little shortcut to execute all that code without
writing it over and over again would be nice, wouldn’t it? This is in the
spirit of the DRY principle! As a nice side effect you only have to change the
routine in one place of your code instead of all of them if you find a bug or
want to use some more sophisticated routine.
But even if you do only use a code snippet once in your program it may be more declarative to give it a short name and hide the implementation somewhere else. This also helps you to divide and conquer larger problems!
The nice thing is that you already know a function: print()
!
You used it like this:
print('Hello, World!')
Let’s take a closer look at the parts that make up a function.
Components of a function¶
Name¶
Most functions have names [1]. The name of print()
is – never
would have guessed – print. Printing the function without calling it reveals
that it is a built-in function:
>>> print(print)
<built-in function print>
There are a lot more Built-in Functions, among them, for example,
float()
.
Positional arguments¶
float()
is a function that lets you convert a number or a
string that represents a number to a floating point number. Its interface is
defined as
-
float
(x)
where x
is its argument.
To be more precise it is its positional argument.
By calling float()
and passing an integer as argument the
result is the corresponding float:
>>> float(1)
1.0
>>> float(-2)
-2.0
You can also pass strings that represent numbers to float()
:
>>> float('1')
1.0
>>> float('-2')
-2.0
>>> float('1.500')
1.5
>>> float('1e-2')
0.01
>>> float('+1E6')
1000000.0
A similar function also exists for integer. It is the function
int()
.
Keyword arguments¶
int()
is a function that lets you convert a number or a string
that represents a number to an integer. Its interface is defined as
-
int
(x, base=10)
where x
is its positional argument and base
is its keyword
argument. While positional arguments must be passed to a function keyword
arguments are optional and provide default values. The default value of
base
is 10
.
When passing floats to int()
everything after the decimal point
is dropped:
>>> int(1.0)
1
>>> int(-2.0)
-2
>>> int(1.3)
1
>>> int(1.8)
1
>>> int(-1.3)
-1
>>> int(-1.8)
-1
When passing strings to int()
the base
keyword argument is
used to indicate how the number should be interpreted in terms of which base it
is given in.
>>> int('1')
1
>>> int('-2')
-2
>>> int('101010')
101010
>>> int('101010', base=2)
42
You do not have to write out the keyword arguments, they will be interpreted in the same order as they are in the interface:
>>> int('101010', 2)
42
But you can not convert strings that represent floating point numbers to integers like this:
>>> int('1.0')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.0'
For this you would have to chain int()
and
float()
:
>>> int(float('1.0'))
1
Defining a function¶
A function is defined by starting with the def
keyword. Subsequently you
provide the name of the function and its arguments in paranthesis. You can then
use it like the functions you already used.
>>> def greet(name):
... print('Hello', name)
...
>>> greet('World')
Hello World
>>> greet('you')
Hello you
>>> def add_reciprocal(a, b):
... return 1/a + 1/b
>>> add_reciprocal(4, 8)
0.375
To provide keyword arguments you let the argument have a default value by assigning a value to it:
>>> def name_and_favorite_food(name, favorite_food='pizza'):
... return name + "'s favorite food is " + favorite_food + '.'
...
>>> name_and_favorite_food('Dominik')
"Dominik's favorite food is pizza."
>>> name_and_favorite_food('Stefan', 'kimchi')
"Stefan's favorite food is kimchi."
If you execute them in the interactive Python interpreter the value returned by
a function is automatically displayed if you do not assign it to a variable. It
is important to know that the return
keyword is used to make the value
accessible to the outer scope, that is outside of the function. See the
following example, where we first do not have a return statement:
>>> def is_positive_without_return(x):
... if x >= 0:
... print(True)
... else:
... print(False)
...
>>> is_positive_without_return(1)
True
>>> is_positive_without_return(-1)
False
>>> a = is_positive_without_return(1)
True
>>> print(a)
None
As you can see the variable a
has no value at all. If you want a
to
receive the result of the function you have to return it, not print it:
>>> def is_positive_without_return(x):
... if x >= 0:
... return True
... else:
... return False
...
>>> is_positive_without_return(1)
True
>>> is_positive_without_return(-1)
False
>>> a = is_positive_without_return(1)
>>> print(a)
True
Functions as function arguments¶
You are able to pass almost anything to a function—even functions itself! This is especially useful if you want to do data fitting or root finding of mathematical functions.
def format_heading(text):
return '\n' + text + '\n' + '='*len(text) + '\n'
def prettify(sections, header_formatter):
# Assume that `sections` is a list of dictionaries with the keys
# ``heading`` for the heading text and ``content`` for the content
# of the section.
# Assume that the `header_formatter` is a function that takes a string
# as argument and formats it in a way that is befitting for a heading.
text = ''
for section in sections:
heading = section['heading']
content = section['content']
text += format_heading(heading) + content + '\n'
# Before returning it we make shure that all surrounding whitespace is
# gone.
return text.strip()
secs = [
{
'heading': 'Introduction',
'content': 'In this section we introduce some smart method to teach Python.'
},
{
'heading': 'Results',
'content': 'More than 42 % of participants in this study learned Python.'
}
]
pretty_text = prettify(secs, header_formatter=format_heading)
print(pretty_text)
So as you can see the argument header_formatter
is just treated as if it
was a function. If you run this script the output will be:
Introduction
============
In this section we introduce some smart method to teach Python.
Results
=======
More than 42 % of participants in this study learned Python.
Summary¶
- Functions can make your life easier by streamlining repeated tasks or giving a name to complex programming logic.
- Functions may have two different kinds of arguments, positional arguments that must be given to the function and keyword arguments which provide default values.
- Functions are defined using the
def
keyword.- Functions may return values by using the
return
keyword.
Exercises¶
Heaviside step function¶
The Heaviside step function can be defined as
Write a function that implements the Heaviside step function following the given definition:
-
heaviside_step_function
(n)¶ Return the value of the Heaviside step function of n.
Parameters: n (float) – The number of which the Heaviside step function should be computed. Returns: The value of the Heaviside step function of n. Return type: int
Start by downloading the
exercise template
and editing
this file. You can run tests via
$ python heaviside_step_function.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Sum of multiples of 3 and 5¶
Note
The following problem was inspired by Problem 1 at Project Euler.
If you list all the natural numbers below 10 that are multiples of 3 or 5, you get 3, 5, 6, and 9. The sum of these multiples is 23. Define a function according to the following definition:
-
sum_of_multiples_of_3_and_5
(n)¶ Return the sum of all multiples of 3 and 5 below n.
Parameters: n (int) – The number up to which the sum of multiples of 3 and 5 is computed. Returns: The sum of all multiples of 3 and 5 up to n. Return type: int
Start by downloading the
exercise template
and
editing this file. You can run tests via
$ python sum_of_multiples_of_3_and_5.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Even Fibonacci numbers¶
Note
The following problem was inspired by Problem 2 at Project Euler.
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 0 and 1, the first ten terms will be
0, 1, 1, 2, 3, 5, 8, 13, 21, 34.
Define a function according to the following definition:
-
even_fibonacci_numbers
(n)¶ Return the sum of all even Fibonacci numbers below n.
Parameters: n (int) – The number up to which the sum of Fibonacci numbers is computed. Returns: The sum of all Fibonacci numbers up to n. Return type: int
Start by downloading the
exercise template
and
editing this file. You can run tests via
$ python even_fibonacci_numbers.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Prime factorization¶
The prime factors of 360 are
Write a function that returns a dictionary whose first keys correspond to the prime factor and the value to the multiplicity of this prime factor. For example, given 360 the function should return:
{
2: 3,
3: 2,
5: 1
}
-
prime_factorization
(n)¶ Return the prime factorization of n.
Parameters: n (int) – The number for which the prime factorization should be computed. Returns: List of tuples containing the prime factors and multiplicities of n. Return type: dict[int, int]
Start by downloading the
exercise template
and editing
this file. You can run tests via
$ python prime_factorization.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Newton’s method (1D)¶
Newton’s method is a rather popular iterative root finding algorithm. Starting at an initial guess \(x_0\) it tries to find better and better approximations of the root of a function \(f(x)\). For this it uses the first derivative \(f'(x)\) of the function. The process
is repeated until a value \(f(x_n)\) is reached that is within a predefined tolerance to zero. For further information see the Wikipedia page.
The way it is supposed to work is as follows:
>>> def f(x):
... return x**2 - 2
...
>>> def df_dx(x):
... return 2*x
...
>>> newtons_method_1d(f, df_dx, x0=4, tol=1e-8)
1.4142135623730951
Hence, implement a function following the given definition:
-
newtons_method_1d
(f, df_dx, x0, tol)¶ Return the root of f within tol by using Newton’s method.
Parameters: Returns: The root of f within a tolerance of tol.
Return type:
Start by downloading the
exercise template
and
editing this file. You can run tests via
$ python newtons_method_1d.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Footnotes
[1] | Lambdas are the exception to the rule as they define anonymous functions. |
Strings¶
You have worked with them for a while now: strings
. In principle
strings are containers of characters in a certain order. As such they are
somewhat similar to lists
<list>`, and also support, e.g., indexing.
>>> my_first_string = 'Hello, World!'
>>> my_first_string[7:]
'World!'
>>> my_first_string[::-1]
'!dlroW ,olleH'
They offer a large amount of ways to work with text, of which a few selected ones are outlined in the following.
split
¶
Working with a whole string at once is needlessly complicated in many cases.
Hence the Python standard library offers a way to take a string
apart: str.split()
>>> some_string = 'My first string'
>>> some_string.split()
['My', 'first', 'string']
By default it splits to an arbitrary amount of whitespace
>>> some_string = 'My second string'
>>> some_string.split()
['My', 'second', 'string']
Note
In Python whitespace is everything you can not directly see like spaces,
tabs (\t
in strings), and newlines (\n
in strings).
but you can also specify another string which is used to split the other string apart:
>>> some_string = '0.1, 0.2, 0.3, 0.4, 0.5'
>>> some_string.split(',')
['0.1', ' 0.2', ' 0.3', ' 0.4', ' 0.5']
The string that is used to split with is consumed in the process. So to get rid of the additional whitespace you might get the idea to add it to the string used for the splitting
>>> some_string = '0.1, 0.2, 0.3, 0.4, 0.5'
>>> some_string.split(', ')
['0.1', '0.2', '0.3', '0.4', '0.5']
Eventually this might lead to problems if the string format is not strictly kept:
>>> some_string = '0.1, 0.2,0.3, 0.4, 0.5'
>>> some_string.split(', ')
['0.1', ' 0.2,0.3', '0.4', ' 0.5']
But Python’s got you covered…
strip
¶
To get rid of unwanted whitespace around a string you can use the
str.strip()
method:
>>> some_string = '\n\n So much surrounding whitespace\n\n'
>>> print(some_string)
So much surrounding whitespace
>>> some_string.strip()
'So much surrounding whitespace'
You can also get rid of something else than whitespace by specifying the characters as an argument:
>>> more_string = '...Python...'
>>> more_string.strip('.')
'Python'
Note
The characters you specify as an argument are not seen as a string but as a collection of characters you want to strip of the string:
>>> incantation = 'abracadabra'
>>> incantation.strip('bra')
'cad'
Summary¶
- Use
str.split()
to split astring
at a substring. - Use
str.strip()
to get rid of excess characters at the edges, especially whitespace.
Exercises¶
Split and strip¶
From time to time you may want to split a string by a delimiting character like
,
, but the whitespace is all over the place.
>>> some_string = '0.1, 0.2,0.3, 0.4, 0.5'
>>> some_string.split(',')
['0.1', ' 0.2', '0.3', ' 0.4', ' 0.5']
>>> some_string.split(', ')
['0.1', ' 0.2,0.3', '0.4', ' 0.5']
So a pure splitting operation does not really do the trick. Write a function that improves this behavior for this case, so that you can use it as follows:
>>> strip_and_split(some_string, ',')
['0.1', '0.2', '0.3', '0.4', '0.5']
-
split_and_strip
(string, delimiter)¶ Return a list of stripped strings after splitting string by delimiter.
Parameters: Returns: The list of strings that are stripped after splitting by delimiter.
Return type:
Start by downloading the
exercise template
and editing
this file. You can run tests via
$ python split_and_strip.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
Is palindrome string¶
A palindrome is a word, phrase, number or sequence of characters which reads
the same backward as forward. Examples for words are radar, level, and
racecar. Write a function that checks whether a strings
is a
palindrome or not.
-
is_palindrome_string
(string)¶ Return whether string is a palindrome, ignoring the case.
Parameters: string (str) – The string to check. Returns: A boolean representing whether string is a palindrome or not. Return type: bool
Start by downloading the
exercise template
and editing
this file. You can run tests via
$ python is_palindrome_string.py test
to check whether you got a correct solution. You can also take a look at
one possible solution
.
File IO¶
Usually the result of your program is important not only now but in the future as well. For this it is a good practice to store the results in file so that you can work with them later on if necessary. This is called file input and output, or in short: file IO.
Reading from a file¶
The best way to open a file in Python is by using
the with statement. This automatically opens the file, keeps a
lock on it so that no one is able to modify it while you are working with it,
and closes it afterwards. Interacting with the file is then done by working
with the variable that you specify after as
. This is called a
file object. To get the complete content of a file use its read()
method like this:
>>> with open('csv_file.txt', 'r') as f:
... file_content = f.read()
...
>>> print(file_content)
1, 2, 3
4, 5, 6
7, 8, 9
where the first parameter is the path to the file and the second parameter is
the so-called file mode. r
is for read-only.
Note
In some tutorials and also production code you may find something along the lines of this to interact with a file:
>>> f = open('csv_file.txt', 'r')
>>> file_content = f.read()
>>> f.close()
>>> print(file_content)
1, 2, 3
4, 5, 6
7, 8, 9
This is kind of the old style of working with files when
the with statement did not exist yet. It has the huge
downside that you have to take care about closing the file. And if an error
is raised between open
and close
it is not closed at all.
The with statement takes care of this for you even in the
case of an exception and is subsequently the preferred way.
You can also iterate over f
as if it is some form of container:
>>> with open('csv_file.txt', 'r') as f:
... for i, line in enumerate(f):
... print('Line', i, '--', line)
...
Line 0 -- 1, 2, 3
Line 1 -- 4, 5, 6
Line 2 -- 7, 8, 9
As you can see you get some additional white lines. The reason for this is that
each line still contains its newline character \n
in the end. To this one
print()
adds an additional one by default so you end up with an empty
line. To circumvent this use the strip()
method of the line string:
>>> with open('csv_file.txt', 'r') as f:
... for i, line in enumerate(f):
... stripped_line = line.strip()
... print('Line', i, '--', stripped_line)
...
Line 0 -- 1, 2, 3
Line 1 -- 4, 5, 6
Line 2 -- 7, 8, 9
Writing to a file¶
To write to a file you have to open it first, this time with w
as file
opening mode, to indicate that you want to write to the file. Then you can
use the write()
method of the file object to write to the file:
>>> with open('my_first_file.txt', 'w') as f:
... f.write('This is smart.')
... f.write('This is even smarter.')
...
14
21
Now the content of your file would be
This is smart.This is even smarter.
Which is not nicely formatted. So you have to take care that you add the
newline character \n
and spaces accordingly:
>>> with open('my_first_file.txt', 'w') as f:
... f.write('This is smart.\n')
... f.write('This is even smarter.\n')
...
15
22
Subsequently the content of your file would be
This is smart.
This is even smarter.
Summary¶
- You can open a file using the
open()
function in awith
statement.- To merely read from a file open it with the filemode
r
and use theread()
method of the file object.- To write from a file open it with the filemode
w
and use thewrite
method of the file object.
Classes¶
Especially for larger code bases the rather mathematical way of programming may become slightly convoluted. A paradigm that tries to alleviate this is object-oriented programming. Let us just dive right in with the definition of a rectangle class. Each rectangle has a length and a width, from which one can compute its area and perimeter. Here is the definition:
class Rectangle:
"""A rectangle that is described by its length and width."""
def __init__(self, length, width):
self.length = length
self.width = width
def area(self):
"""Return the area of the rectangle."""
return self.length*self.width
def perimeter(self):
"""Return the perimeter of the rectangle."""
return 2*(self.length + self.width)
Let us go over this step by step. The definition of a class starts with the
keyword class
followed by the name of the class. Within the class you can
define methods. They are like functions that are attached to the class. What
is slightly peculiar is, that all take self
as their first argument—
we will come to that in a second. The first method is __init__
.
def __init__(self, length, width):
self.length = length
self.width = width
As you use classes you instantiate objects of that class. During the
instantiation you can provide some initial arguments to the class to
customize the resulting object using the special __init__
method. In this
example a rectangle object can be initialized with values for its length
\(l\) and for its width \(w\). These values are then stored in the
instance as attributes. The reference to the instance is the self
argument, that each method has as its first argument. So by attaching
information to self
, each distinct object has its own state.
Now we can exploit this in other methods. The area \(A\) of a rectangle is defined as
In the code this is described by the following definition:
def area(self):
"""Return the area of the rectangle."""
return self.length*self.width
Where a method is defined, that takes the length of the rectangle
self.length
, multiplies it with self.width
and then returns it. If we
did not use self.length
but just length
or used width
instead of
self.width
we would not have accessed the values we stored in this
rectangle during its initialization, but some global values instead. So to make
sure that we only use the attributes of our specific rectangle we access them
from self
.
The method used to get the perimeter \(P\), which, for a rectangle, is defined as
follows the same theme:
def perimeter(self):
"""Return the perimeter of the rectangle."""
return 2*(self.length + self.width)
In the following the usage of this class is shown.
first_rectangle = Rectangle(length=2, width=3)
print('Information about the first rectangle')
print('Length:', first_rectangle.length)
print('Width:', first_rectangle.width)
print('Area:', first_rectangle.area())
print('Perimeter:', first_rectangle.perimeter())
First an object of the class Rectangle
is instantiated with the length
argument set to 2
and the width
argument set to 3
. The name of our
first Rectangle
object is first_rectangle
. Subsequently the attributes
length
and width
of the object first_rectangle
can be accessed via
first_rectangle.length
and first_rectangle.width
, respectively. One
could say that what ever has been self
in the class definition now is
replaced by the name of the object. In the case of the methods the self
argument is implicitly supplied by calling the method from the object. So it is
sufficient to use first_rectangle.area()
, and not
first_rectangle.area(self)
or first_rectangle.area(first_rectangle)
—
both of which would be wrong. The output of the above code is
Information about the first rectangle
Length: 2
Width: 3
Area: 6
Perimeter: 10
If another rectangle is instantiated with different values the information
changes accordingly. So in the case of a second_rectangle
second_rectangle = Rectangle(length=5, width=7)
print('Information about the second rectangle')
print('Length:', second_rectangle.length)
print('Width:', second_rectangle.width)
print('Area:', second_rectangle.area())
print('Perimeter:', second_rectangle.perimeter())
the output would be
Information about the second rectangle
Length: 5
Width: 7
Area: 35
Perimeter: 24
Inheritance¶
A square is a special case of a rectangle, i.e., a rectangle with equal sides. As the computation of the geometrical properties remains the same, one option of initializing a square could be
square = Rectangle(length=2, width=2)
But maybe you want to be more explicit when initializing squares. This is where
inheritance kicks in. Let us take a look at the definition of a Square
class that inherits from the previously defined Rectangle
class:
class Square(Rectangle):
"""A square that is described by its side length."""
def __init__(self, side_length):
super().__init__(length=side_length, width=side_length)
As opposed to the Rectangle
class the name of the Square
class is
followed by parenthesis containing Rectangle
. This tells Python that the
Rectangle
class is the superclass of Square
, i.e., Square
inherits from Rectangle
. To inherit means that, if not otherwise defined,
Square
has the exact same method definitions as its superclass
Rectangle
.
But as the __init__
method of Rectangle
takes the arguments length
and width
, which is not required for the definition of a square, we can
simplify it. Now it takes only one argument side_length
. If we stored it as
we did in the Rectangle
class, i.e., as
def __init__(self, side_length):
self.side_length = side_length
The methods that Square
inherits from Rectangle
would fail to be
callable, as they access the attributes length
and width
, which would
not be defined if we took this definition of the __init__
method. Instead
we could do this:
def __init__(self, side_length):
self.length = side_length
self.width = side_length
So now the definition looks awfully similar to the one of the Rectangle
class. A bit too similar maybe, and we do not want to repeat ourselves.
Another side-effect is that, should the __init__
method of the
Rectangle
implement some more code, it would have to be copied to
Square
as well. As this is error-prone there is a way to leverage the
method of a superclass within the child class, and this is done using the
super()
function. If used within a method definition followed by calling
a method it will resolve to the first parent class that implements a method
with this name and call it for the current object. So by implementing it via
def __init__(self, side_length):
super().__init__(length=side_length, width=side_length)
we tell Python to call the __init__
method of the superclass of Square
and pass the side_length for the length
and width
.
Using the class can now be done like this:
first_square = Square(side_length=2)
print('Information about the first square')
print('Length:', first_square.length)
print('Width:', first_square.width)
print('Area:', first_square.area())
print('Perimeter:', first_square.perimeter())
The respective output:
Information about the first square
Length: 2
Width: 2
Area: 4
Perimeter: 8
Type checking¶
Checking whether an object is of a certain type, or is a child of a certain
type, is done by using the isinstance()
function. The first argument is
the object whose type should be checked, the second argument is the class for
which to check.
def what_is_it(object):
if isinstance(object, Rectangle):
print('It is a rectangle.')
if isinstance(object, Square):
print('It is a square.')
If we use this simple function on our geometrical objects you can see how it works.
what_is_it(first_rectangle)
As first_rectangle
is a Rectangle
, but not a Square
, the output is:
It is a rectangle.
Now for the first_square
:
what_is_it(first_square)
As Square
is a specialization of Rectangle
you can also see that it
identifies as such in addition to being a Square
.
It is a rectangle.
It is a square.
Special methods¶
Some behavior of Python classes is implemented in terms of so-called Special method names. An example of how they may be used can be seen in the following:
import math
class FreeVector:
"""A vector that is not bound by an initial or terminal point."""
def __init__(self, vector):
self.vector = tuple(vector)
@property
def magnitude(self):
return math.sqrt(math.fsum(v**2 for v in self.vector))
@property
def direction(self):
magnitude = self.magnitude
return tuple(v/magnitude for v in self.vector)
def __repr__(self):
return '{self.__class__.__name__}(vector={self.vector!r})'.format(
self=self)
def __str__(self):
return str(self.vector)
def __eq__(self, other):
if (isinstance(other, FreeVector) and
all(math.isclose(a, b) for a, b in zip(
other.vector, self.vector))):
return True
else:
return False
def __neq__(self, other):
return not self.__eq__(self, other)
def __add__(self, other):
if not isinstance(other, FreeVector):
return NotImplemented
return tuple(a + b for a, b in zip(self.vector, other.vector))
def __sub__(self, other):
if not isinstance(other, FreeVector):
return NotImplemented
return tuple(a - b for a, b in zip(self.vector, other.vector))
The usage may be as follows:
>>> a = FreeVector((1, 2, 3))
>>> a
FreeVector(vector=(1, 2, 3))
>>> str(a)
'(1, 2, 3)'
>>> b = FreeVector((1, 2, 3))
>>> c = FreeVector((4, 5, 6))
>>> a == b
True
>>> a == c
False
>>> a + c
(5, 7, 9)
>>> c - a
(3, 3, 3)
Exercises¶
- Copy the definition of the
Rectangle
class and extend it by adding a methodaspect_ratio
which returns the ratio of its length to its width. - Define a
Circle
class with the radius \(r\) as defining attribute. Implement thearea
andperimeter
class accordingly. - Read and work on the book “Building Skills in Object-Oriented Design” to understand the process of object-oriented design.
NumPy¶
NumPy is a third-party package for Python that provides utilities for numerical programming with Python. It is commonly imported via
import numpy as np
which maps the NumPy package to the name np
. So every time you see some
code with the np
prefix you may assume that NumPy is imported even though
it might not be stated explicitly.
Array¶
The NumPy array
is the underlying mechanism that makes
NumPy so convenient and efficient.
Creating arrays¶
A NumPy array
is easily initialized via
>>> np.array([0, 1, 2]) # 1D array of integers
array([0, 1, 2])
>>> np.array([0.0, 1.0, 2.0]) # 1D array of floats
array([ 0., 1., 2.])
>>> np.array([[0, 1], [2, 3]]) # 2D array of integers
array([[0, 1],
[2, 3]])
So by nesting lists of numbers you are able to construct multi-dimensional
arrays. But there are also other methods to initialize
arrays
:
-
>>> np.zeros(3) array([ 0., 0., 0.]) >>> np.zeros([2, 3]) array([[ 0., 0., 0.], [ 0., 0., 0.]])
-
>>> np.ones(3) array([ 1., 1., 1.]) >>> 3 * np.ones(3) array([ 3., 3., 3.]) >>> np.ones([2, 3]) array([[ 1., 1., 1.], [ 1., 1., 1.]])
-
>>> np.eye(2) array([[ 1., 0.], [ 0., 1.]]) >>> -2 * np.eye(2) array([[-2., -0.], [-0., -2.]])
numpy.arange()
: This one should feel rather familiar to therange
of regular Python. But where the latter is only able to deal with integers, the implementation of NumPy can also work with floats.>>> np.arange(5) array([0, 1, 2, 3, 4]) >>> np.arange(3, 7) array([3, 4, 5, 6]) >>> np.arange(2, 4, 0.5) array([ 2. , 2.5, 3. , 3.5])
numpy.linspace()
: If you need evenly spaced samples this way of initializing them is preferred tonumpy.arange()
due to the latter introducing slight roundoff errors which is caused by the implementation.>>> np.linspace(0, 4, 5) array([ 0., 1., 2., 3., 4.]) >>> np.linspace(0.3, 0.7, 5) array([ 0.3, 0.4, 0.5, 0.6, 0.7]) >>> np.linspace(0.3, 0.7, 4, endpoint=False) array([ 0.3, 0.4, 0.5, 0.6])
Array attributes¶
The arrays
also provide some information about
themselves which can be accessed by its attributes.
Number of dimensions¶
The ndim
attribute of an array is the amount of
dimensions of the array.
>>> x = np.array([0.0, 1.0, 2.0])
>>> x.ndim
1
>>> x = np.array([[0, 1, 2], [3, 4, 5]])
>>> x.ndim
2
Accessing data¶
Similarly to lists
and tuples
data is accessed
via referring to the indices:
>>> x = np.array([[0, 1, 2],
... [3, 4, 5]])
>>> x[0, 0]
0
>>> x[0, 1]
1
>>> x[1, 0]
3
>>> x[1, 2]
5
Slicing¶
Slicing refers to extracting partial data from arrays. This is very efficient as nothing is copied in memory.
>>> x = np.array([[0, 1, 2, 3, 4],
... [5, 6, 7, 8, 9],
... [10, 11, 12, 13, 14],
... [15, 16, 17, 18, 19],
... [20, 21, 22, 23, 24]])
>>> x[0, :]
array([0, 1, 2, 3, 4])
>>> x[1, :]
array([5, 6, 7, 8, 9])
>>> x[:, 1]
array([ 1, 6, 11, 16, 21])
>>> x[:3, :3]
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
>>> x[2:, 2:]
array([[12, 13, 14],
[17, 18, 19],
[22, 23, 24]])
Note
Slices of an array share the memory of the original array. Hence all the changes you do to a slice are also represented in the original array:
>>> x = np.array([[0, 1, 2, 3, 4],
... [5, 6, 7, 8, 9],
... [10, 11, 12, 13, 14],
... [15, 16, 17, 18, 19],
... [20, 21, 22, 23, 24]])
>>> print(x)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
>>> x_slice = x[1:-1, 1:-1]
>>> print(x_slice)
[[ 6 7 8]
[11 12 13]
[16 17 18]]
>>> x_slice[1, 1] = 888
>>> print(x_slice)
[[ 6 7 8]
[ 11 888 13]
[ 16 17 18]]
>>> print(x)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[ 10 11 888 13 14]
[ 15 16 17 18 19]
[ 20 21 22 23 24]]
Math¶
The goal of NumPy is to provide functions and classes that make numerical
computations easier. For an extensive overview see the section
Linear algebra (numpy.linalg). of the NumPy documentation. Some of them require the
part linalg
after numpy
. In combination with
import numpy as np
the functions and classes may be used via
np.linalg.<function>
The functions that require this import are indicated by this usage.
Componentwise math operations¶
NumPy arrays
support the typical math operations
introduced in Basics and they are executed componentwise. Here are
a few examples:
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([6, 5, 8, 9, 10])
>>> z = 2.0
>>> x + z
array([ 3., 4., 5., 6., 7.])
>>> x + y
array([ 7, 7, 11, 13, 15])
>>> y % z
array([ 0., 1., 0., 1., 0.])
>>> y / x
array([ 6. , 2.5 , 2.66666667, 2.25 , 2. ])
Dot¶
If you need a scalar product for 1D arrays
or matrix
multiplication for 2D arrays
the function
numpy.dot()
is the function of choice:
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([6, 5, 8, 9, 10])
>>> np.dot(x, y)
126
>>> x.dot(y) # alternative syntax
126
>>> x = np.array([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]])
>>> y = np.array([[10, 11, 12],
... [13, 14, 15],
... [16, 17, 18]])
>>> z = np.array([[1/19, 1/20, 1/21],
... [1/22, 1/23, 1/24],
... [1/25, 1/26, 1/27]])
>>> np.dot(x, np.dot(y, z)) # this is a bit convoluted
array([[ 12.35196172, 11.80535117, 11.30555556],
[ 29.63712919, 28.32591973, 27.12698413],
[ 46.92229665, 44.84648829, 42.9484127 ]])
>>> x.dot(y).dot(z) # better
array([[ 12.35196172, 11.80535117, 11.30555556],
[ 29.63712919, 28.32591973, 27.12698413],
[ 46.92229665, 44.84648829, 42.9484127 ]])
Eigenvalues¶
The function numpy.linalg.eig()
may be used to compute the eigenvalues
and eigenvectors of a square array:
>>> x = np.array([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]])
>>> np.linalg.eig(x)
(array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15]), array([[-0.23197069, -0.78583024, 0.40824829],
[-0.52532209, -0.08675134, -0.81649658],
[-0.8186735 , 0.61232756, 0.40824829]]))
The output is a bit messy, but ss you can read in the documentation of
numpy.linalg.eig()
the function actually returns two things: an array
holding the eigenvalues and an array holding the eigenvectors as columns:
>>> eigenvalues, eigenvectors = np.linalg.eig(x)
>>> for i in range(eigenvalues.size):
... print(eigenvalues[i], eigenvectors[:, i])
...
16.1168439698 [-0.23197069 -0.52532209 -0.8186735 ]
-1.11684396981 [-0.78583024 -0.08675134 0.61232756]
-1.30367772647e-15 [ 0.40824829 -0.81649658 0.40824829]
Norm¶
Computing the norm of a vector is a fairly common task, so it seems obvious
that NumPy also provides a function for this: numpy.linalg.norm()
.
>>> x = np.array([3, 4])
>>> np.linalg.norm(x)
5.0
Besided the regularly used \(\ell^2\)-norm this function also offers other
norms. For further information use the documentation of
numpy.linalg.norm()
.
Determinant¶
Knowing the determinant of an array may tell you whether it is singular or, in the case of a \(3x3\) array, tell you the volume of the rhomboid that is spanned by the vectors composing the array.
>>> x = np.array([[1, 2, 3],
... [4, 5, 4],
... [3, 2, 1]])
>>> np.linalg.det(x)
-7.9999999999999982
This should be \(-8\) so it is an interesting way to see that the determinant computed by NumPy is computed numerically and not analytically.
Exercises¶
Gaussian Elimination (Eye Variant)¶
Solving systems of linear equations is one of the basic tasks in numerical mathematics—hence it is also one of the basic tasks in computational materials science. A system of linear equations like
may also be described via
where
The solution algorithm makes use of the augmented matrix form
The procedure is then to first get this augmented matrix form into triangle form and subsequently form the identity matrix on the left hand side. The right hand side then is the solution. For further information see Gaussian elimination
Start with the following template, complete it, and test it:
def gaussian_elimination_eye(A, b):
Matplotlib¶
“The purpose of visualization is insight, not pictures.”
—Ben Shneiderman
Visualization is one of the most important means for the interpretation of the results of scientific computations. The package that is mostly used for this task is Matplotlib.
The matplotlib
package is typically imported via
import matplotlib.pyplot as plt
And subsequently all plotting commands are called using the prefix plt
.
Rather than exercises that want you to get some results you are shown simple examples for the most common types of plots.
Note
Producing scientific plots that manage to support your chain of arguments in a presentation or publication can not be overestimated. To a certain degree it could be considered an art, but there are some best practices, e.g., for picking colormaps.
Line plots¶
The most common plot you see in scientific publications may well be a so-called
line plot. The command for line plots is plot()
and
using it like this
import matplotlib.pyplot as plt
import numpy as np
# Generate data for the plot
x = np.linspace(0, 1, 21)
y0 = np.sin(2*np.pi*x)
y1 = np.cos(2*np.pi*x)
# Generate the plot
fig, ax = plt.subplots()
ax.plot(x, y0)
ax.plot(x, y1)
plt.show(fig)
results in a line plot with both the sine and cosine:

Scatter plots¶
When your data does not have a sequence but is still characterized by data
points a so-called scatter plot is the plot of choice. The command is
scatter()
and when used like this
import matplotlib.pyplot as plt
import numpy as np
# Generate data for the plot
r = np.random.RandomState(42)
x = r.random_sample(10)
y0 = r.random_sample(10)
y1 = r.random_sample(10)
# Generate the plot
fig, ax = plt.subplots()
ax.scatter(x, y0)
ax.scatter(x, y1)
plt.show(fig)
it results in a plot showing the different data point groups:

Pcolormesh plots¶
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage.filters import gaussian_filter
# Generate data for the plot
x = np.linspace(0, 1, 51)
y = np.linspace(0, 1, 51)
r = np.random.RandomState(42)
z = gaussian_filter(r.random_sample([50, 50]), sigma=5, mode='wrap')
z -= np.min(z)
z /= np.max(z)
# Generate the plot
fig, ax = plt.subplots()
cmap = ax.pcolormesh(x, y, z)
fig.colorbar(cmap)
plt.show(fig)

Contour plots¶
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage.filters import gaussian_filter
# Generate data for the plot
x = np.linspace(0, 1, 50)
y = np.linspace(0, 1, 50)
r = np.random.RandomState(42)
z = gaussian_filter(r.random_sample([50, 50]), sigma=5, mode='wrap')
z -= np.min(z)
z /= np.max(z)
# Generate the plot
fig, ax = plt.subplots()
cmap = ax.contour(x, y, z)
fig.colorbar(cmap)
plt.show(fig)

Contourf plots¶
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage.filters import gaussian_filter
# Generate data for the plot
x = np.linspace(0, 1, 50)
y = np.linspace(0, 1, 50)
r = np.random.RandomState(42)
z = gaussian_filter(r.random_sample([50, 50]), sigma=5, mode='wrap')
z -= np.min(z)
z /= np.max(z)
# Generate the plot
fig, ax = plt.subplots()
cmap = plt.contourf(x, y, z)
fig.colorbar(cmap)
plt.show(fig)

SciPy¶
SciPy is a collection of modules that offer high-level interfaces to algorithms that are common in the scientific community, like linear algebra, optimization, integration, interpolation, etc. Covering all the possibilities in a tutorial is almost impossible, so to see what SciPy can do refer to its documentation.
Fitting curves¶
The routine used for fitting curves is part of the scipy.optimize
module
and is called scipy.optimize.curve_fit()
. So first said module has to be
imported.
>>> import scipy.optimize
The function that you want to fit to your data has to be defined with the x
values as first argument and all parameters as subsequent arguments.
>>> def parabola(x, a, b, c):
... return a*x**2 + b*x + c
...
For test purposes the data is generated using said function with known parameters.
>>> params = [-0.1, 0.5, 1.2]
>>> x = np.linspace(-5, 5, 31)
>>> y = parabola(x, params[0], params[1], params[2])
>>> plt.plot(x, y, label='analytical')
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')
<matplotlib.legend.Legend object at ...>
>>> plt.show()

Then some small deviations are introduced as fitting a function to data generated by itself is kind of boring.
>>> r = np.random.RandomState(42)
>>> y_with_errors = y + r.uniform(-1, 1, y.size)
>>> plt.plot(x, y_with_errors, label='sample')
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')
<matplotlib.legend.Legend object at ...>
>>> plt.show()

Now the fitting routine can be called.
>>> fit_params, pcov = scipy.optimize.curve_fit(parabola, x, y_with_errors)
It returns two results, the parameters that resulted from the fit as well as the covariance matrix which may be used to compute some form of quality scale for the fit. The actual data for the fit may be compared to the real parameters:
>>> for param, fit_param in zip(params, fit_params):
... print(param, fit_param)
...
-0.1 -0.0906682795944
0.5 0.472361903203
1.2 1.00514576223
>>> y_fit = parabola(x, *fit_params)
>>> plt.plot(x, y_fit, label='fit')
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend(loc='lower right')
<matplotlib.legend.Legend object at ...>
>>> plt.show()

As you can see the fit is rather off for the third parameter c
. A look at
the covariance matrix also shows this:
>>> print(pcov)
[[ 1.64209005e-04 1.75357845e-12 -1.45963560e-03]
[ 1.75357845e-12 1.16405938e-03 -1.73642112e-11]
[ -1.45963560e-03 -1.73642112e-11 2.33217333e-02]]
But to get a more meaningful scale for the quality of the fit the documentation recommends the following:
>>> print(np.sqrt(np.diag(pcov)))
[ 0.01281441 0.03411831 0.15271455]
So the fit for a
is quite well whereas the quality c
is the worst of
all the parameters.
Note
This routine works by iteratively varying the parameters and checking
whether the fit got better or worse. To help the routine find the best fit
it is hence a good idea to give it a good starting point. this can be done
using the p0
argument of curve_fit()
. In some
cases this is even necessary.
Exercises¶
Fitting a frequency measurement¶
Use the data provided here
and compute
the amplitude \(A\), the frequency \(f\), and the phase offset
\(\varphi\) of the measurement.
Recommended steps
Read the data from the file.
Plot the data to get a feeling for it.
Think of a function that may be well suited for your data, and define it. In this case
\[f(t) = A \sin (2 \pi f \cdot t - \varphi) + b\]Try to fit it to the data. Are there any problems? How could you solve them?
Find a good initial guess by taking the data into account. It may be useful to smoothen the data first to reduce the noise. The
scipy.signal
module provides a variety of filters to do this. Use the smoothed data to get estimates for- the amplitude \(A\)
- the frequency \(f\)
- the phase offset \(\varphi\)
- the amplitude offset \(b\)
Use the initial guess to fit the function to the raw data again.
Store the results in a file.
File formats¶
Ideally, a file format is
- self-documented, i.e., it offers metadata that allows the user to infer what data is contained within the file.
- easily readable by a multitude of programming languages, as not everyone we collaborate with shares our affection to Python.
- dense, i.e., it should not take up a lot of space.
- fault tolerant, i.e., in case of a faulty switched bit on the hard-drive the data should be recoverable.
- flexible in the shape of the data that is to be stored.
In the following a selection of file formats is introduced.
Comma-separated values¶
Filename extension: | |
---|---|
.csv |
We may know comma-separated values (CSV) file mostly by working with people who use spreadsheet software like Microsoft Excel. As the native file format of such software might be cumbersome to read, we kindly ask for a CSV export of those spreadsheets.
Typically a CSV file is a plain text format which contains one record per line with several delimiter separated fields. The sequence of the fields is the same for every records.
Note that CSV files are not standardized, and as such a lot of different
flavors exist. The delimiter, for example, does not necessary have to be a
comma, it could also be a simple space. If the delimiter is part of a field
value, for example, in a string, an escape character is required, which depends
on the flavor of the CSV implementation. Comments are also treated differently
in each implementation, with some allowing comments in the header via #
,
and others not supporting them at all
Hence, CSV files may be used for very simple files, but may proof unsuitable for complex data.
Let us take a look at a CSV file that is the result of an atomistic simulation which is stored in a file:
with open('md_result.csv', 'r') as f:
print(f.read())
# Atom types:
# - He: Helium
# - Ar: Argon
# Potentials:
# - He: Lennard-Jones potential with epsilon/k_B = 10.22 K, sigma = 256 pm
# - Ar: Lennard-Jones potential with epsilon/k_B = 120 K, sigma = 341 pm
# Simulation box size: 100 µm x 200 µm x 300 µm
# Periodic boundary conditions in all directions
# Step: 0
# Time: 0 s
# All quantities are given in SI units
atom_id type x-position y-position z-position x-velocity y-velocity z-velocity
0 He 5.7222e-07 4.8811e-09 2.0415e-07 -2.9245e+01 1.0045e+02 1.2828e+02
1 He 9.7710e-07 3.6371e-07 4.7311e-07 -1.9926e+02 2.3275e+02 -5.3438e+02
2 Ar 6.4989e-07 6.7873e-07 9.5000e-07 -1.5592e+00 -3.7876e+02 8.4091e+01
3 Ar 5.9024e-08 3.7138e-07 7.3455e-08 3.4282e+02 1.5682e+02 -3.8991e+01
4 He 7.6746e-07 8.3017e-08 4.8520e-07 -3.0450e+01 -3.7975e+02 -3.3632e+02
5 Ar 1.7226e-07 4.6023e-07 4.7356e-08 -3.1151e+02 -4.2939e+02 -6.9474e+02
6 Ar 9.6394e-07 7.2845e-07 8.8623e-07 -8.2636e+01 4.5098e+01 -1.0626e+01
7 He 5.4450e-07 4.6373e-07 6.2270e-07 1.5889e+02 2.5858e+02 -1.5150e+02
8 He 7.9322e-07 9.4700e-07 3.5194e-08 -1.9703e+02 1.5674e+02 -1.8520e+02
9 Ar 2.7797e-07 1.6487e-07 8.2403e-07 -3.8650e+01 -6.9632e+02 2.1642e+02
10 He 1.1842e-07 6.3244e-07 5.0958e-07 -1.4963e+02 4.2288e+02 -7.6309e+01
11 Ar 2.0359e-07 8.3369e-07 9.6348e-07 4.8457e+02 -2.6741e+02 -3.5254e+02
12 He 5.1019e-07 2.2470e-07 2.3846e-08 -2.3192e+02 -9.9510e+01 3.2770e+01
13 Ar 3.5383e-07 8.4581e-07 7.2340e-07 -3.0395e+02 4.7316e+01 2.2253e+02
14 He 3.8515e-07 2.8940e-07 5.6028e-07 2.3308e+02 2.5418e+02 4.2983e+02
15 He 1.5842e-07 9.8225e-07 5.7859e-07 1.9963e+02 2.0311e+02 -4.2560e+02
16 He 3.6831e-07 7.6520e-07 2.9884e-07 6.6341e+01 2.2232e+02 -9.7653e+01
17 He 2.8696e-07 1.5129e-07 6.4060e-07 9.0358e+01 -6.7459e+01 -6.4782e+01
18 He 1.0325e-07 9.9012e-07 3.4381e-07 7.1108e+01 1.1060e+01 1.5912e+01
19 Ar 4.3929e-07 7.5363e-07 9.9974e-07 2.3919e+02 1.7383e+02 3.3529e+02
Note that the first row that is not a comment holds the field names. This will
be important for later. Using the csv
from the Python standard library
we can read it in nicely:
import csv
number_of_rows_to_skip = 12
with open('md_result.csv', 'r', newline='') as f:
# skip the first rows
for _ in range(number_of_rows_to_skip):
next(f)
csv_reader = csv.reader(f, delimiter=' ')
for row in csv_reader:
print(row)
Which then results in the following output:
['0', 'He', '5.7222e-07', '4.8811e-09', '2.0415e-07', '-2.9245e+01', '1.0045e+02', '1.2828e+02']
['1', 'He', '9.7710e-07', '3.6371e-07', '4.7311e-07', '-1.9926e+02', '2.3275e+02', '-5.3438e+02']
['2', 'Ar', '6.4989e-07', '6.7873e-07', '9.5000e-07', '-1.5592e+00', '-3.7876e+02', '8.4091e+01']
['3', 'Ar', '5.9024e-08', '3.7138e-07', '7.3455e-08', '3.4282e+02', '1.5682e+02', '-3.8991e+01']
['4', 'He', '7.6746e-07', '8.3017e-08', '4.8520e-07', '-3.0450e+01', '-3.7975e+02', '-3.3632e+02']
['5', 'Ar', '1.7226e-07', '4.6023e-07', '4.7356e-08', '-3.1151e+02', '-4.2939e+02', '-6.9474e+02']
['6', 'Ar', '9.6394e-07', '7.2845e-07', '8.8623e-07', '-8.2636e+01', '4.5098e+01', '-1.0626e+01']
['7', 'He', '5.4450e-07', '4.6373e-07', '6.2270e-07', '1.5889e+02', '2.5858e+02', '-1.5150e+02']
['8', 'He', '7.9322e-07', '9.4700e-07', '3.5194e-08', '-1.9703e+02', '1.5674e+02', '-1.8520e+02']
['9', 'Ar', '2.7797e-07', '1.6487e-07', '8.2403e-07', '-3.8650e+01', '-6.9632e+02', '2.1642e+02']
['10', 'He', '1.1842e-07', '6.3244e-07', '5.0958e-07', '-1.4963e+02', '4.2288e+02', '-7.6309e+01']
['11', 'Ar', '2.0359e-07', '8.3369e-07', '9.6348e-07', '4.8457e+02', '-2.6741e+02', '-3.5254e+02']
['12', 'He', '5.1019e-07', '2.2470e-07', '2.3846e-08', '-2.3192e+02', '-9.9510e+01', '3.2770e+01']
['13', 'Ar', '3.5383e-07', '8.4581e-07', '7.2340e-07', '-3.0395e+02', '4.7316e+01', '2.2253e+02']
['14', 'He', '3.8515e-07', '2.8940e-07', '5.6028e-07', '2.3308e+02', '2.5418e+02', '4.2983e+02']
['15', 'He', '1.5842e-07', '9.8225e-07', '5.7859e-07', '1.9963e+02', '2.0311e+02', '-4.2560e+02']
['16', 'He', '3.6831e-07', '7.6520e-07', '2.9884e-07', '6.6341e+01', '2.2232e+02', '-9.7653e+01']
['17', 'He', '2.8696e-07', '1.5129e-07', '6.4060e-07', '9.0358e+01', '-6.7459e+01', '-6.4782e+01']
['18', 'He', '1.0325e-07', '9.9012e-07', '3.4381e-07', '7.1108e+01', '1.1060e+01', '1.5912e+01']
['19', 'Ar', '4.3929e-07', '7.5363e-07', '9.9974e-07', '2.3919e+02', '1.7383e+02', '3.3529e+02']
But as you can see all the numbers are read in as strings. This is due to CSV files not preserving the type information. A quick hack might be the following:
import csv
number_of_rows_to_skip = 12
possible_types = (int, float, str)
with open('md_result.csv', 'r', newline='') as f:
# skip the first rows
for _ in range(number_of_rows_to_skip):
next(f)
csv_reader = csv.reader(f, delimiter=' ')
for row in csv_reader:
for i, entry in enumerate(row):
for possible_type in possible_types:
try:
entry = possible_type(entry)
except ValueError:
continue
except:
raise
else:
row[i] = entry
break
print(row)
Here we define an order of types to check for, in this example we first check whether the entry can be cast to an integer, then to a float, and then to a string. If a casting operation succeeds, we set the entry of the row to the new value and exit the loop that checks for the types. Now the output is closer to what we would like.
[0, 'He', 5.7222e-07, 4.8811e-09, 2.0415e-07, -29.245, 100.45, 128.28]
[1, 'He', 9.771e-07, 3.6371e-07, 4.7311e-07, -199.26, 232.75, -534.38]
[2, 'Ar', 6.4989e-07, 6.7873e-07, 9.5e-07, -1.5592, -378.76, 84.091]
[3, 'Ar', 5.9024e-08, 3.7138e-07, 7.3455e-08, 342.82, 156.82, -38.991]
[4, 'He', 7.6746e-07, 8.3017e-08, 4.852e-07, -30.45, -379.75, -336.32]
[5, 'Ar', 1.7226e-07, 4.6023e-07, 4.7356e-08, -311.51, -429.39, -694.74]
[6, 'Ar', 9.6394e-07, 7.2845e-07, 8.8623e-07, -82.636, 45.098, -10.626]
[7, 'He', 5.445e-07, 4.6373e-07, 6.227e-07, 158.89, 258.58, -151.5]
[8, 'He', 7.9322e-07, 9.47e-07, 3.5194e-08, -197.03, 156.74, -185.2]
[9, 'Ar', 2.7797e-07, 1.6487e-07, 8.2403e-07, -38.65, -696.32, 216.42]
[10, 'He', 1.1842e-07, 6.3244e-07, 5.0958e-07, -149.63, 422.88, -76.309]
[11, 'Ar', 2.0359e-07, 8.3369e-07, 9.6348e-07, 484.57, -267.41, -352.54]
[12, 'He', 5.1019e-07, 2.247e-07, 2.3846e-08, -231.92, -99.51, 32.77]
[13, 'Ar', 3.5383e-07, 8.4581e-07, 7.234e-07, -303.95, 47.316, 222.53]
[14, 'He', 3.8515e-07, 2.894e-07, 5.6028e-07, 233.08, 254.18, 429.83]
[15, 'He', 1.5842e-07, 9.8225e-07, 5.7859e-07, 199.63, 203.11, -425.6]
[16, 'He', 3.6831e-07, 7.652e-07, 2.9884e-07, 66.341, 222.32, -97.653]
[17, 'He', 2.8696e-07, 1.5129e-07, 6.406e-07, 90.358, -67.459, -64.782]
[18, 'He', 1.0325e-07, 9.9012e-07, 3.4381e-07, 71.108, 11.06, 15.912]
[19, 'Ar', 4.3929e-07, 7.5363e-07, 9.9974e-07, 239.19, 173.83, 335.29]
But programming with this still requires you to know exactly which field number
corresponds to which entry. And maybe your format may differ from file to file,
so that your hardcoded indices lead to wrong results. It would be better if we
could somehow access the fields by names, e.g., row['id']
to get the id of
the record. This is where csv.DictReader
comes in.
>>> import csv
>>> number_of_rows_to_skip = 11
>>> with open('md_result.csv', 'r', newline='') as f:
... # skip the first rows
... for _ in range(number_of_rows_to_skip):
... next(f)
...
... csv_reader = csv.DictReader(f, delimiter=' ')
... for row in csv_reader:
... print(row)
...
OrderedDict([('atom_id', '0'), ('type', 'He'), ('x-position', '5.7222e-07'), ('y-position', '4.8811e-09'), ('z-position', '2.0415e-07'), ('x-velocity', '-2.9245e+01'), ('y-velocity', '1.0045e+02'), ('z-velocity', '1.2828e+02')])
OrderedDict([('atom_id', '1'), ('type', 'He'), ('x-position', '9.7710e-07'), ('y-position', '3.6371e-07'), ('z-position', '4.7311e-07'), ('x-velocity', '-1.9926e+02'), ('y-velocity', '2.3275e+02'), ('z-velocity', '-5.3438e+02')])
OrderedDict([('atom_id', '2'), ('type', 'Ar'), ('x-position', '6.4989e-07'), ('y-position', '6.7873e-07'), ('z-position', '9.5000e-07'), ('x-velocity', '-1.5592e+00'), ('y-velocity', '-3.7876e+02'), ('z-velocity', '8.4091e+01')])
OrderedDict([('atom_id', '3'), ('type', 'Ar'), ('x-position', '5.9024e-08'), ('y-position', '3.7138e-07'), ('z-position', '7.3455e-08'), ('x-velocity', '3.4282e+02'), ('y-velocity', '1.5682e+02'), ('z-velocity', '-3.8991e+01')])
OrderedDict([('atom_id', '4'), ('type', 'He'), ('x-position', '7.6746e-07'), ('y-position', '8.3017e-08'), ('z-position', '4.8520e-07'), ('x-velocity', '-3.0450e+01'), ('y-velocity', '-3.7975e+02'), ('z-velocity', '-3.3632e+02')])
OrderedDict([('atom_id', '5'), ('type', 'Ar'), ('x-position', '1.7226e-07'), ('y-position', '4.6023e-07'), ('z-position', '4.7356e-08'), ('x-velocity', '-3.1151e+02'), ('y-velocity', '-4.2939e+02'), ('z-velocity', '-6.9474e+02')])
OrderedDict([('atom_id', '6'), ('type', 'Ar'), ('x-position', '9.6394e-07'), ('y-position', '7.2845e-07'), ('z-position', '8.8623e-07'), ('x-velocity', '-8.2636e+01'), ('y-velocity', '4.5098e+01'), ('z-velocity', '-1.0626e+01')])
OrderedDict([('atom_id', '7'), ('type', 'He'), ('x-position', '5.4450e-07'), ('y-position', '4.6373e-07'), ('z-position', '6.2270e-07'), ('x-velocity', '1.5889e+02'), ('y-velocity', '2.5858e+02'), ('z-velocity', '-1.5150e+02')])
OrderedDict([('atom_id', '8'), ('type', 'He'), ('x-position', '7.9322e-07'), ('y-position', '9.4700e-07'), ('z-position', '3.5194e-08'), ('x-velocity', '-1.9703e+02'), ('y-velocity', '1.5674e+02'), ('z-velocity', '-1.8520e+02')])
OrderedDict([('atom_id', '9'), ('type', 'Ar'), ('x-position', '2.7797e-07'), ('y-position', '1.6487e-07'), ('z-position', '8.2403e-07'), ('x-velocity', '-3.8650e+01'), ('y-velocity', '-6.9632e+02'), ('z-velocity', '2.1642e+02')])
OrderedDict([('atom_id', '10'), ('type', 'He'), ('x-position', '1.1842e-07'), ('y-position', '6.3244e-07'), ('z-position', '5.0958e-07'), ('x-velocity', '-1.4963e+02'), ('y-velocity', '4.2288e+02'), ('z-velocity', '-7.6309e+01')])
OrderedDict([('atom_id', '11'), ('type', 'Ar'), ('x-position', '2.0359e-07'), ('y-position', '8.3369e-07'), ('z-position', '9.6348e-07'), ('x-velocity', '4.8457e+02'), ('y-velocity', '-2.6741e+02'), ('z-velocity', '-3.5254e+02')])
OrderedDict([('atom_id', '12'), ('type', 'He'), ('x-position', '5.1019e-07'), ('y-position', '2.2470e-07'), ('z-position', '2.3846e-08'), ('x-velocity', '-2.3192e+02'), ('y-velocity', '-9.9510e+01'), ('z-velocity', '3.2770e+01')])
OrderedDict([('atom_id', '13'), ('type', 'Ar'), ('x-position', '3.5383e-07'), ('y-position', '8.4581e-07'), ('z-position', '7.2340e-07'), ('x-velocity', '-3.0395e+02'), ('y-velocity', '4.7316e+01'), ('z-velocity', '2.2253e+02')])
OrderedDict([('atom_id', '14'), ('type', 'He'), ('x-position', '3.8515e-07'), ('y-position', '2.8940e-07'), ('z-position', '5.6028e-07'), ('x-velocity', '2.3308e+02'), ('y-velocity', '2.5418e+02'), ('z-velocity', '4.2983e+02')])
OrderedDict([('atom_id', '15'), ('type', 'He'), ('x-position', '1.5842e-07'), ('y-position', '9.8225e-07'), ('z-position', '5.7859e-07'), ('x-velocity', '1.9963e+02'), ('y-velocity', '2.0311e+02'), ('z-velocity', '-4.2560e+02')])
OrderedDict([('atom_id', '16'), ('type', 'He'), ('x-position', '3.6831e-07'), ('y-position', '7.6520e-07'), ('z-position', '2.9884e-07'), ('x-velocity', '6.6341e+01'), ('y-velocity', '2.2232e+02'), ('z-velocity', '-9.7653e+01')])
OrderedDict([('atom_id', '17'), ('type', 'He'), ('x-position', '2.8696e-07'), ('y-position', '1.5129e-07'), ('z-position', '6.4060e-07'), ('x-velocity', '9.0358e+01'), ('y-velocity', '-6.7459e+01'), ('z-velocity', '-6.4782e+01')])
OrderedDict([('atom_id', '18'), ('type', 'He'), ('x-position', '1.0325e-07'), ('y-position', '9.9012e-07'), ('z-position', '3.4381e-07'), ('x-velocity', '7.1108e+01'), ('y-velocity', '1.1060e+01'), ('z-velocity', '1.5912e+01')])
OrderedDict([('atom_id', '19'), ('type', 'Ar'), ('x-position', '4.3929e-07'), ('y-position', '7.5363e-07'), ('z-position', '9.9974e-07'), ('x-velocity', '2.3919e+02'), ('y-velocity', '1.7383e+02'), ('z-velocity', '3.3529e+02')])
Note
If you are not using at least Python 3.6 the DictReader
returns regular dict
instead of its ordered variant,
OrderedDict
.
Now that the fields are in an OrderedDict
, the routine to
cast the field entries is slightly different:
>>> number_of_rows_to_skip = 11
>>> with open('md_result.csv', 'r', newline='') as f:
... # skip the first rows
... for _ in range(number_of_rows_to_skip):
... next(f)
...
... csv_reader = csv.DictReader(f, delimiter=' ')
... for row in csv_reader:
... for key, entry in row.items():
... for possible_type in possible_types:
... try:
... entry = possible_type(entry)
... except ValueError:
... continue
... except:
... raise
... else:
... row[key] = entry
... break
... print(row)
...
OrderedDict([('atom_id', 0), ('type', 'He'), ('x-position', 5.7222e-07), ('y-position', 4.8811e-09), ('z-position', 2.0415e-07), ('x-velocity', -29.245), ('y-velocity', 100.45), ('z-velocity', 128.28)])
OrderedDict([('atom_id', 1), ('type', 'He'), ('x-position', 9.771e-07), ('y-position', 3.6371e-07), ('z-position', 4.7311e-07), ('x-velocity', -199.26), ('y-velocity', 232.75), ('z-velocity', -534.38)])
OrderedDict([('atom_id', 2), ('type', 'Ar'), ('x-position', 6.4989e-07), ('y-position', 6.7873e-07), ('z-position', 9.5e-07), ('x-velocity', -1.5592), ('y-velocity', -378.76), ('z-velocity', 84.091)])
OrderedDict([('atom_id', 3), ('type', 'Ar'), ('x-position', 5.9024e-08), ('y-position', 3.7138e-07), ('z-position', 7.3455e-08), ('x-velocity', 342.82), ('y-velocity', 156.82), ('z-velocity', -38.991)])
OrderedDict([('atom_id', 4), ('type', 'He'), ('x-position', 7.6746e-07), ('y-position', 8.3017e-08), ('z-position', 4.852e-07), ('x-velocity', -30.45), ('y-velocity', -379.75), ('z-velocity', -336.32)])
OrderedDict([('atom_id', 5), ('type', 'Ar'), ('x-position', 1.7226e-07), ('y-position', 4.6023e-07), ('z-position', 4.7356e-08), ('x-velocity', -311.51), ('y-velocity', -429.39), ('z-velocity', -694.74)])
OrderedDict([('atom_id', 6), ('type', 'Ar'), ('x-position', 9.6394e-07), ('y-position', 7.2845e-07), ('z-position', 8.8623e-07), ('x-velocity', -82.636), ('y-velocity', 45.098), ('z-velocity', -10.626)])
OrderedDict([('atom_id', 7), ('type', 'He'), ('x-position', 5.445e-07), ('y-position', 4.6373e-07), ('z-position', 6.227e-07), ('x-velocity', 158.89), ('y-velocity', 258.58), ('z-velocity', -151.5)])
OrderedDict([('atom_id', 8), ('type', 'He'), ('x-position', 7.9322e-07), ('y-position', 9.47e-07), ('z-position', 3.5194e-08), ('x-velocity', -197.03), ('y-velocity', 156.74), ('z-velocity', -185.2)])
OrderedDict([('atom_id', 9), ('type', 'Ar'), ('x-position', 2.7797e-07), ('y-position', 1.6487e-07), ('z-position', 8.2403e-07), ('x-velocity', -38.65), ('y-velocity', -696.32), ('z-velocity', 216.42)])
OrderedDict([('atom_id', 10), ('type', 'He'), ('x-position', 1.1842e-07), ('y-position', 6.3244e-07), ('z-position', 5.0958e-07), ('x-velocity', -149.63), ('y-velocity', 422.88), ('z-velocity', -76.309)])
OrderedDict([('atom_id', 11), ('type', 'Ar'), ('x-position', 2.0359e-07), ('y-position', 8.3369e-07), ('z-position', 9.6348e-07), ('x-velocity', 484.57), ('y-velocity', -267.41), ('z-velocity', -352.54)])
OrderedDict([('atom_id', 12), ('type', 'He'), ('x-position', 5.1019e-07), ('y-position', 2.247e-07), ('z-position', 2.3846e-08), ('x-velocity', -231.92), ('y-velocity', -99.51), ('z-velocity', 32.77)])
OrderedDict([('atom_id', 13), ('type', 'Ar'), ('x-position', 3.5383e-07), ('y-position', 8.4581e-07), ('z-position', 7.234e-07), ('x-velocity', -303.95), ('y-velocity', 47.316), ('z-velocity', 222.53)])
OrderedDict([('atom_id', 14), ('type', 'He'), ('x-position', 3.8515e-07), ('y-position', 2.894e-07), ('z-position', 5.6028e-07), ('x-velocity', 233.08), ('y-velocity', 254.18), ('z-velocity', 429.83)])
OrderedDict([('atom_id', 15), ('type', 'He'), ('x-position', 1.5842e-07), ('y-position', 9.8225e-07), ('z-position', 5.7859e-07), ('x-velocity', 199.63), ('y-velocity', 203.11), ('z-velocity', -425.6)])
OrderedDict([('atom_id', 16), ('type', 'He'), ('x-position', 3.6831e-07), ('y-position', 7.652e-07), ('z-position', 2.9884e-07), ('x-velocity', 66.341), ('y-velocity', 222.32), ('z-velocity', -97.653)])
OrderedDict([('atom_id', 17), ('type', 'He'), ('x-position', 2.8696e-07), ('y-position', 1.5129e-07), ('z-position', 6.406e-07), ('x-velocity', 90.358), ('y-velocity', -67.459), ('z-velocity', -64.782)])
OrderedDict([('atom_id', 18), ('type', 'He'), ('x-position', 1.0325e-07), ('y-position', 9.9012e-07), ('z-position', 3.4381e-07), ('x-velocity', 71.108), ('y-velocity', 11.06), ('z-velocity', 15.912)])
OrderedDict([('atom_id', 19), ('type', 'Ar'), ('x-position', 4.3929e-07), ('y-position', 7.5363e-07), ('z-position', 9.9974e-07), ('x-velocity', 239.19), ('y-velocity', 173.83), ('z-velocity', 335.29)])
As long as the field names are consistent across files you can write code that needs less maintenance.
Another way of reading CSV files is by using the loadtxt()
function of NumPy. By specifying the data type as you would for
Structured arrays the type conversion is done for you, while retaining
the dictionary like behavior. You can also specify a comment character that
should be ignored and the amount of rows to skip:
csv_dtype = [
('atom_id', np.int32),
('type', np.string_, 2),
('position', np.float64, 3),
('velocity', np.float64, 3)
]
with open('md_result.csv', 'r') as f:
md_data = np.loadtxt(f, dtype=csv_dtype, skiprows=12)
print(md_data)
[ ( 0, b'He', [ 5.72220000e-07, 4.88110000e-09, 2.04150000e-07], [ -29.245 , 100.45 , 128.28 ])
( 1, b'He', [ 9.77100000e-07, 3.63710000e-07, 4.73110000e-07], [-199.26 , 232.75 , -534.38 ])
( 2, b'Ar', [ 6.49890000e-07, 6.78730000e-07, 9.50000000e-07], [ -1.5592, -378.76 , 84.091 ])
( 3, b'Ar', [ 5.90240000e-08, 3.71380000e-07, 7.34550000e-08], [ 342.82 , 156.82 , -38.991 ])
( 4, b'He', [ 7.67460000e-07, 8.30170000e-08, 4.85200000e-07], [ -30.45 , -379.75 , -336.32 ])
( 5, b'Ar', [ 1.72260000e-07, 4.60230000e-07, 4.73560000e-08], [-311.51 , -429.39 , -694.74 ])
( 6, b'Ar', [ 9.63940000e-07, 7.28450000e-07, 8.86230000e-07], [ -82.636 , 45.098 , -10.626 ])
( 7, b'He', [ 5.44500000e-07, 4.63730000e-07, 6.22700000e-07], [ 158.89 , 258.58 , -151.5 ])
( 8, b'He', [ 7.93220000e-07, 9.47000000e-07, 3.51940000e-08], [-197.03 , 156.74 , -185.2 ])
( 9, b'Ar', [ 2.77970000e-07, 1.64870000e-07, 8.24030000e-07], [ -38.65 , -696.32 , 216.42 ])
(10, b'He', [ 1.18420000e-07, 6.32440000e-07, 5.09580000e-07], [-149.63 , 422.88 , -76.309 ])
(11, b'Ar', [ 2.03590000e-07, 8.33690000e-07, 9.63480000e-07], [ 484.57 , -267.41 , -352.54 ])
(12, b'He', [ 5.10190000e-07, 2.24700000e-07, 2.38460000e-08], [-231.92 , -99.51 , 32.77 ])
(13, b'Ar', [ 3.53830000e-07, 8.45810000e-07, 7.23400000e-07], [-303.95 , 47.316 , 222.53 ])
(14, b'He', [ 3.85150000e-07, 2.89400000e-07, 5.60280000e-07], [ 233.08 , 254.18 , 429.83 ])
(15, b'He', [ 1.58420000e-07, 9.82250000e-07, 5.78590000e-07], [ 199.63 , 203.11 , -425.6 ])
(16, b'He', [ 3.68310000e-07, 7.65200000e-07, 2.98840000e-07], [ 66.341 , 222.32 , -97.653 ])
(17, b'He', [ 2.86960000e-07, 1.51290000e-07, 6.40600000e-07], [ 90.358 , -67.459 , -64.782 ])
(18, b'He', [ 1.03250000e-07, 9.90120000e-07, 3.43810000e-07], [ 71.108 , 11.06 , 15.912 ])
(19, b'Ar', [ 4.39290000e-07, 7.53630000e-07, 9.99740000e-07], [ 239.19 , 173.83 , 335.29 ])]
So this makes it really convenient to work with, e.g., the velocity may easily be computed like this:
print(np.linalg.norm(md_data['velocity'], axis=1))
With the output
[ 165.53317168 615.98627785 387.98565049 378.99652094 508.18147093
874.11550713 94.73885146 339.20775124 312.54965765 730.20064455
455.01614782 656.20167982 254.48575481 379.66301803 551.07856763
512.09488281 251.72091508 130.04611632 73.70117372 447.04374406]
JavaScript Object Notation¶
Filename extension: | |
---|---|
.json |
The JavaScript Object Notation (JSON) format was derived from JavaScript. It is, however, a language-independent format with many programming languages offering support for simple generation and parsing of JSON files, among them Python.
>>> with open('md_result.json', 'r') as f:
... print(f.read())
...
{
"potentials": {
"Ar": {
"type": "Lennard-Jones",
"parameters": {
"sigma": {
"unit": "pm",
"magnitude": 341
},
"epsilon/k_B": {
"unit": "K",
"magnitude": 120
}
}
},
"He": {
"type": "Lennard-Jones",
"parameters": {
"sigma": {
"unit": "pm",
"magnitude": "256"
},
"epsilon/k_B": {
"unit": "K",
"magnitude": "10.22"
}
}
}
},
"atoms": [
{
"type": "He",
"velocity": [
-29.245,
100.45,
128.28
],
"atom_id": 0,
"position": [
5.7222e-07,
4.8811e-09,
2.0415e-07
]
},
{
"type": "He",
"velocity": [
-199.26,
232.75,
-534.38
],
"atom_id": 1,
"position": [
9.771e-07,
3.6371e-07,
4.7311e-07
]
},
{
"type": "Ar",
"velocity": [
-1.5592,
-378.76,
84.091
],
"atom_id": 2,
"position": [
6.4989e-07,
6.7873e-07,
9.5e-07
]
},
{
"type": "Ar",
"velocity": [
342.82,
156.82,
-38.991
],
"atom_id": 3,
"position": [
5.9024e-08,
3.7138e-07,
7.3455e-08
]
},
{
"type": "He",
"velocity": [
-30.45,
-379.75,
-336.32
],
"atom_id": 4,
"position": [
7.6746e-07,
8.3017e-08,
4.852e-07
]
},
{
"type": "Ar",
"velocity": [
-311.51,
-429.39,
-694.74
],
"atom_id": 5,
"position": [
1.7226e-07,
4.6023e-07,
4.7356e-08
]
},
{
"type": "Ar",
"velocity": [
-82.636,
45.098,
-10.626
],
"atom_id": 6,
"position": [
9.6394e-07,
7.2845e-07,
8.8623e-07
]
},
{
"type": "He",
"velocity": [
158.89,
258.58,
-151.5
],
"atom_id": 7,
"position": [
5.445e-07,
4.6373e-07,
6.227e-07
]
},
{
"type": "He",
"velocity": [
-197.03,
156.74,
-185.2
],
"atom_id": 8,
"position": [
7.9322e-07,
9.47e-07,
3.5194e-08
]
},
{
"type": "Ar",
"velocity": [
-38.65,
-696.32,
216.42
],
"atom_id": 9,
"position": [
2.7797e-07,
1.6487e-07,
8.2403e-07
]
},
{
"type": "He",
"velocity": [
-149.63,
422.88,
-76.309
],
"atom_id": 10,
"position": [
1.1842e-07,
6.3244e-07,
5.0958e-07
]
},
{
"type": "Ar",
"velocity": [
484.57,
-267.41,
-352.54
],
"atom_id": 11,
"position": [
2.0359e-07,
8.3369e-07,
9.6348e-07
]
},
{
"type": "He",
"velocity": [
-231.92,
-99.51,
32.77
],
"atom_id": 12,
"position": [
5.1019e-07,
2.247e-07,
2.3846e-08
]
},
{
"type": "Ar",
"velocity": [
-303.95,
47.316,
222.53
],
"atom_id": 13,
"position": [
3.5383e-07,
8.4581e-07,
7.234e-07
]
},
{
"type": "He",
"velocity": [
233.08,
254.18,
429.83
],
"atom_id": 14,
"position": [
3.8515e-07,
2.894e-07,
5.6028e-07
]
},
{
"type": "He",
"velocity": [
199.63,
203.11,
-425.6
],
"atom_id": 15,
"position": [
1.5842e-07,
9.8225e-07,
5.7859e-07
]
},
{
"type": "He",
"velocity": [
66.341,
222.32,
-97.653
],
"atom_id": 16,
"position": [
3.6831e-07,
7.652e-07,
2.9884e-07
]
},
{
"type": "He",
"velocity": [
90.358,
-67.459,
-64.782
],
"atom_id": 17,
"position": [
2.8696e-07,
1.5129e-07,
6.406e-07
]
},
{
"type": "He",
"velocity": [
71.108,
11.06,
15.912
],
"atom_id": 18,
"position": [
1.0325e-07,
9.9012e-07,
3.4381e-07
]
},
{
"type": "Ar",
"velocity": [
239.19,
173.83,
335.29
],
"atom_id": 19,
"position": [
4.3929e-07,
7.5363e-07,
9.9974e-07
]
}
],
"step": 0,
"boundary_conditions": [
"periodic",
"periodic",
"periodic"
],
"system_size": [
{
"unit": "µm",
"magnitude": 100
},
{
"unit": "µm",
"magnitude": 200
},
{
"unit": "µm",
"magnitude": 300
}
],
"time": {
"unit": "s",
"magnitude": 0
},
"atom_types": {
"Ar": "Argon",
"He": "Helium"
}
}
Which can easily be read with the json
module of the Python standard
library:
>>> import json
>>> with open('md_result.json', 'r') as f:
... md_data = json.load(f)
...
>>> print(md_data)
{'potentials': {'He': {'parameters': {'sigma': {'magnitude': '256', 'unit': 'pm'}, 'epsilon/k_B': {'magnitude': '10.22', 'unit': 'K'}}, 'type': 'Lennard-Jones'}, 'Ar': {'parameters': {'sigma': {'magnitude': 341, 'unit': 'pm'}, 'epsilon/k_B': {'magnitude': 120, 'unit': 'K'}}, 'type': 'Lennard-Jones'}}, 'time': {'magnitude': 0, 'unit': 's'}, 'boundary_conditions': ['periodic', 'periodic', 'periodic'], 'atom_types': {'He': 'Helium', 'Ar': 'Argon'}, 'system_size': [{'magnitude': 100, 'unit': 'µm'}, {'magnitude': 200, 'unit': 'µm'}, {'magnitude': 300, 'unit': 'µm'}], 'atoms': [{'position': [5.7222e-07, 4.8811e-09, 2.0415e-07], 'atom_id': 0, 'type': 'He', 'velocity': [-29.245, 100.45, 128.28]}, {'position': [9.771e-07, 3.6371e-07, 4.7311e-07], 'atom_id': 1, 'type': 'He', 'velocity': [-199.26, 232.75, -534.38]}, {'position': [6.4989e-07, 6.7873e-07, 9.5e-07], 'atom_id': 2, 'type': 'Ar', 'velocity': [-1.5592, -378.76, 84.091]}, {'position': [5.9024e-08, 3.7138e-07, 7.3455e-08], 'atom_id': 3, 'type': 'Ar', 'velocity': [342.82, 156.82, -38.991]}, {'position': [7.6746e-07, 8.3017e-08, 4.852e-07], 'atom_id': 4, 'type': 'He', 'velocity': [-30.45, -379.75, -336.32]}, {'position': [1.7226e-07, 4.6023e-07, 4.7356e-08], 'atom_id': 5, 'type': 'Ar', 'velocity': [-311.51, -429.39, -694.74]}, {'position': [9.6394e-07, 7.2845e-07, 8.8623e-07], 'atom_id': 6, 'type': 'Ar', 'velocity': [-82.636, 45.098, -10.626]}, {'position': [5.445e-07, 4.6373e-07, 6.227e-07], 'atom_id': 7, 'type': 'He', 'velocity': [158.89, 258.58, -151.5]}, {'position': [7.9322e-07, 9.47e-07, 3.5194e-08], 'atom_id': 8, 'type': 'He', 'velocity': [-197.03, 156.74, -185.2]}, {'position': [2.7797e-07, 1.6487e-07, 8.2403e-07], 'atom_id': 9, 'type': 'Ar', 'velocity': [-38.65, -696.32, 216.42]}, {'position': [1.1842e-07, 6.3244e-07, 5.0958e-07], 'atom_id': 10, 'type': 'He', 'velocity': [-149.63, 422.88, -76.309]}, {'position': [2.0359e-07, 8.3369e-07, 9.6348e-07], 'atom_id': 11, 'type': 'Ar', 'velocity': [484.57, -267.41, -352.54]}, {'position': [5.1019e-07, 2.247e-07, 2.3846e-08], 'atom_id': 12, 'type': 'He', 'velocity': [-231.92, -99.51, 32.77]}, {'position': [3.5383e-07, 8.4581e-07, 7.234e-07], 'atom_id': 13, 'type': 'Ar', 'velocity': [-303.95, 47.316, 222.53]}, {'position': [3.8515e-07, 2.894e-07, 5.6028e-07], 'atom_id': 14, 'type': 'He', 'velocity': [233.08, 254.18, 429.83]}, {'position': [1.5842e-07, 9.8225e-07, 5.7859e-07], 'atom_id': 15, 'type': 'He', 'velocity': [199.63, 203.11, -425.6]}, {'position': [3.6831e-07, 7.652e-07, 2.9884e-07], 'atom_id': 16, 'type': 'He', 'velocity': [66.341, 222.32, -97.653]}, {'position': [2.8696e-07, 1.5129e-07, 6.406e-07], 'atom_id': 17, 'type': 'He', 'velocity': [90.358, -67.459, -64.782]}, {'position': [1.0325e-07, 9.9012e-07, 3.4381e-07], 'atom_id': 18, 'type': 'He', 'velocity': [71.108, 11.06, 15.912]}, {'position': [4.3929e-07, 7.5363e-07, 9.9974e-07], 'atom_id': 19, 'type': 'Ar', 'velocity': [239.19, 173.83, 335.29]}], 'step': 0}
So parsing a JSON file results in a composition of dictionaries, lists, strings, integers, and floats.
Writing JSON files is equally simple:
>>> with open('my_md_result.json', 'w') as f:
... json.dump(md_data, f, indent=2)
...
with the indent
keyword resulting in the file not written in a very
condensed manner, but with newlines and an indentation of 2. Note that the
following Python types can be converted out-of-the-box:
Python | JSON |
---|---|
dict |
object |
list , tuple |
array |
str |
string |
int , float ,
int - & float -derived
Enum s |
number |
True |
true |
False |
false |
None |
null |
Extensible Markup Language¶
Filename extension: | |
---|---|
.xml |
The Extensible Markup Language (XML) format was developed by the World Wide Web
Consortium and is supported by a lot of programming languages. Python offers
support for XML via the xml
package.
The short introduction of the xml.etree.ElementTree over at Python is well written and recommended as a starting point.
Hierarchical Data Format 5¶
filename extension: | |
---|---|
.h5 , .hdf5 |
The file structure of HDF5 includes two major types of objects:
- Datasets
- Multidimensional arrays of a homogeneous type.
- Groups
- Container structures which can hold datasets or other groups.
In that way we end up with a data format that somewhat resembles a filesystem.
Python support for HDF5 is due to the h5py package, which can be installed via
pip install h5py
and be used after importing it via
import h5py
File objects¶
To open an HDF5 file the recommended way is to use
with h5py.File('my_file.h5', 'r') as f:
# interact with the file object
where the first argument is the path to the HDF5 file, and the second one is the mode in which to open the file. Available modes are
r : | readonly, file must exist |
---|---|
r+ : | read/write, file must exist |
w : | create file, truncate if exists |
w- or x : | create file, fail if exists |
a : | read/write if exists, create otherwise |
with the default filemode being a
.
Danger
Similarly to open()
there is another way of opening an HDF5 file:
f = h5py.File('my_file.h5', 'r')
# interact with the file object
f.close()
This has the disadvantage that you have to take care of closing the file yourself. As the binary nature of HDF5 files essentially means that one corrupted bit may lead to loss of all the data in the file. We therefore strongly recommend the previously mentioned way with the context manager.
Every file object is also an HDF5 group object representing the root group of the file.
Groups¶
The hierarchical organization of the HDF5 file format is achieved by groups.
Similarly to directories on your regular filesystem they help you with the
organization of your data within an HDF5 file. As mentioned before the file
object returned by the File
initialization. Creating a new group is done by
calling the create_group()
method of an HDF5 Group
object:
with h5py.File('my_file.h5', 'w') as f:
print('Name of group:', f.name)
nested_group = f.create_group('nested group')
print('Name of nested group:', nested_group.name)
recursively_created_group = f.create_group('/this/is/deep')
print(
'Name of recursively created group:',
recursively_created_group.name)
Which results in the following output:
Name of group: /
Name of nested group: /nested group
Name of recursively created group: /this/is/deep
Working with files may also be compared to working with dict
objects,
as they offer the indexing syntax to traverse groups, support iteration and the
keys()
, values()
and items()
syntax:
with h5py.File('my_file.h5', 'w') as f:
f.create_group('/favorite/group/one')
f.create_group('/favorite/group/two')
f.create_group('/favorite/group/three')
group = f['/favorite/group']
for subgroup_name in sorted(group):
print(subgroup_name)
subgroup_one = group['one']
print(subgroup_one.name)
for subgroup_name, subgroup in sorted(group.items()):
print(subgroup_name, subgroup.name)
one
three
two
/favorite/group/one
one /favorite/group/one
three /favorite/group/three
two /favorite/group/two
Datasets¶
Creating a dataset is done via calling the create_dataset
method on an HDF5
group. Although we can create empty datasets in HDF5 and fill them with data
later on, we much more commonly just want to store data that has been worked on
in a NumPy array. This can be done like this:
my_array = np.array(5*5*3*3*3).reshape(5, 5, 3, 3, 3)
with h5py.File('my_file.h5', 'w') as f:
group = f.create_group('my_group')
dset = group.create_dataset('my_dataset', data=my_array)
with h5py.File('my_file.h5', 'r') as f:
retrieved_array = np.array(f['my_group/my_dataset'])
print(np.allclose(my_array, retrieved_array))
Which results in the ouptut True
. Handling the datatypes is done
automatically both for dumping and loading the data—both for regular and
structured arrays.
As the data we store tends to get quite large we can leverage the compression
options HDF5 offers. H5Py makes keeps this simple and enables compression when
a compression
keyword is supplied, followed by a number in the range of 1
to 9 indicating the cpu-time/compression trade-off, from “least compression” to
“densest compression.”
my_array = np.array(5*5*3*3*3).reshape(5, 5, 3, 3, 3)
with h5py.File('my_file.h5', 'w') as f:
group = f.create_group('my_group')
dset = group.create_dataset('my_dataset', data=my_array, compression=9)
Attributes¶
Each group and dataset may have attributes. Attributes are metadata that can be
assigned in a dictionary-like fashion using the attributes
attribute of the
group or dataset object.
md_dtype = [
('atom_id', np.int32),
('type', np.string_, 2),
('position', np.float64, 3),
('velocity', np.float64, 3)
]
md_data = np.array(
[
(0, 'He', (5.7222e-07, 4.8811e-09, 2.0415e-07), (-29.245, 100.45, 128.28)),
(1, 'He', (9.7710e-07, 3.6371e-07, 4.7311e-07), (-199.26, 232.75, -534.38)),
(2, 'Ar', (6.4989e-07, 6.7873e-07, 9.5000e-07), (-1.5592, -378.76, 84.091)),
(3, 'Ar', (5.9024e-08, 3.7138e-07, 7.3455e-08), (342.82, 156.82, -38.991)),
(4, 'He', (7.6746e-07, 8.3017e-08, 4.8520e-07), (-30.45, -379.75, -336.32)),
(5, 'Ar', (1.7226e-07, 4.6023e-07, 4.7356e-08), (-311.51, -429.39, -694.74)),
(6, 'Ar', (9.6394e-07, 7.2845e-07, 8.8623e-07), (-82.636, 45.098, -10.626)),
(7, 'He', (5.4450e-07, 4.6373e-07, 6.2270e-07), (158.89, 258.58, -151.5)),
(8, 'He', (7.9322e-07, 9.4700e-07, 3.5194e-08), (-197.03, 156.74, -185.2)),
(9, 'Ar', (2.7797e-07, 1.6487e-07, 8.2403e-07), (-38.65, -696.32, 216.42)),
(10, 'He', (1.1842e-07, 6.3244e-07, 5.0958e-07), (-149.63, 422.88, -76.309)),
(11, 'Ar', (2.0359e-07, 8.3369e-07, 9.6348e-07), (484.57, -267.41, -352.54)),
(12, 'He', (5.1019e-07, 2.2470e-07, 2.3846e-08), (-231.92, -99.51, 32.77)),
(13, 'Ar', (3.5383e-07, 8.4581e-07, 7.2340e-07), (-303.95, 47.316, 222.53)),
(14, 'He', (3.8515e-07, 2.8940e-07, 5.6028e-07), (233.08, 254.18, 429.83)),
(15, 'He', (1.5842e-07, 9.8225e-07, 5.7859e-07), (199.63, 203.11, -425.6)),
(16, 'He', (3.6831e-07, 7.6520e-07, 2.9884e-07), (66.341, 222.32, -97.653)),
(17, 'He', (2.8696e-07, 1.5129e-07, 6.4060e-07), (90.358, -67.459, -64.782)),
(18, 'He', (1.0325e-07, 9.9012e-07, 3.4381e-07), (71.108, 11.06, 15.912)),
(19, 'Ar', (4.3929e-07, 7.5363e-07, 9.9974e-07), (239.19, 173.83, 335.29))
],
dtype=md_dtype)
with h5py.File('md_results.h5') as f:
# Generic information regarding the file/the simulation
f.attrs['units'] = 'All quantities are in SI units.'
f.attrs['atom-types'] = json.dumps(
{
'He': 'Helium',
'Ar': 'Argon'
})
f.attrs['He potential'] = json.dumps(
{
'type': 'Lennard-Jones',
'parameters': {
'epsilon/k_B': 10.22,
'sigma': 256e-12
}
})
f.attrs['Ar potential'] = json.dumps(
{
'type': 'Lennard-Jones',
'parameters': {
'epsilon/k_B': 120,
'sigma': 341e-12
}
})
f.attrs['system size'] = np.array([100e-6, 200e-6, 300e-6])
f.attrs['boundary conditions'] = json.dumps(
['periodic', 'periodic', 'periodic'])
# Information specific to this dataset
dset = f.create_dataset('0000', data=md_data, compression=9)
dset.attrs['step'] = 0
dset.attrs['time'] = 0
HDFView¶
The HDF Group offers a tool for brwosing and editing HDF5 files: HDFView. Often it can be installed via the package manager of our operating system, and subsequently be executed via